A Comparison of Code Structure in Open Source Machine Learning Repositories
Master's thesis
Status | open |
Advisor | Thomas Weber |
Professor | Prof. Dr. H. Hußmann |
Task
Machine Learning is a new paradigm for writing software in which no single, predefined algorithm dictates the behaviour; instead, data determines and changes how the software operates. Naturally, a different development paradigm means that we write different code.
GitHub, as a major platform for open source projects, offers a diverse range of software repositories, both traditional ones and ones that use Machine Learning. In this thesis, you will analyze publicly available repositories to determine how exactly Machine Learning code differs from more traditional code.
You will:
- Perform a literature review
- Collect and classify a sample of GitHub repositories that represent Machine Learning and traditional software development
- Compare which parts of the underlying programming language are used more or less often in each
- Summarize your findings in a thesis and present them to an audience
- (Optional) Co-write a research paper
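
To make the comparison of language usage more concrete, one possible approach is to parse each source file into a syntax tree and count how often each kind of node occurs. The following is only a minimal sketch under stated assumptions: it assumes the analyzed repositories are written in Python (the task does not prescribe a language), it uses the standard library `ast` module, and the function names and the two toy snippets are hypothetical and invented for illustration.

```python
import ast
from collections import Counter


def node_type_counts(source: str) -> Counter:
    """Count how often each AST node type occurs in one Python source string."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def relative_frequencies(counts: Counter) -> dict:
    """Normalize raw counts so that samples of different size are comparable."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def frequency_difference(ml_sources, traditional_sources) -> dict:
    """Per node type: relative frequency in ML code minus that in traditional code."""
    ml = Counter()
    for src in ml_sources:
        ml.update(node_type_counts(src))
    trad = Counter()
    for src in traditional_sources:
        trad.update(node_type_counts(src))
    ml_freq = relative_frequencies(ml)
    trad_freq = relative_frequencies(trad)
    names = set(ml_freq) | set(trad_freq)
    return {n: ml_freq.get(n, 0.0) - trad_freq.get(n, 0.0) for n in names}


# Toy inputs, invented for illustration only (not real repository data).
ml_example = "import numpy as np\nx = np.zeros((3, 3))\ny = x @ x.T\n"
traditional_example = (
    "def greet(name):\n"
    "    if name:\n"
    "        return 'Hello ' + name\n"
    "    return 'Hello'\n"
)

diff = frequency_difference([ml_example], [traditional_example])
# Print the node types that are most over-represented in the ML sample.
for name, delta in sorted(diff.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name}: {delta:+.3f}")
```

A positive value means the node type is relatively more frequent in the Machine Learning sample, a negative value that it is more frequent in the traditional sample. A real study would need a much larger, carefully classified sample and a statistical test rather than a raw difference.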
You need:
- Strong communication skills in English
- Good knowledge of data processing / extraction
- General knowledge of formal languages, parsing, etc.