Static Differences between Traditional and Machine Learning Code
bachelor thesis
Status | in progress |
Student | Amanuel Ghebreweldi |
Advisor | Thomas Weber |
Professor | Prof. Dr. Sven Mayer |
Task
Machine learning is a new paradigm for writing software in which behaviour is not dictated by one explicitly defined algorithm; instead, data determines and changes how the software operates. Naturally, a different development paradigm means that we write different code.
In this thesis, you will analyze publicly available repositories to determine exactly how machine learning code differs from more traditional code.
This thesis builds on prior work by Simmons et al., who performed a similar analysis with respect to coding standards. As part of this thesis, you will select additional metrics that may be interesting for this data set.
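To illustrate what a static code metric can look like, the following minimal sketch counts a few structural properties of one source file with Python's standard-library `ast` module. It assumes that the analyzed repositories contain Python code; the function name `static_metrics` and the chosen metrics are hypothetical examples, not the metrics prescribed for this thesis or those used by Simmons et al.

```python
import ast


def static_metrics(source: str) -> dict:
    """Compute a few illustrative static metrics for one Python source string.

    Hypothetical helper: the metric selection here is only an example.
    """
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    functions = [n for n in nodes
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    classes = [n for n in nodes if isinstance(n, ast.ClassDef)]
    imports = [n for n in nodes if isinstance(n, (ast.Import, ast.ImportFrom))]

    # Function length in source lines, from the "def" line to the last body line.
    # end_lineno is available on AST nodes from Python 3.8 onwards.
    lengths = [f.end_lineno - f.lineno + 1 for f in functions]

    return {
        "functions": len(functions),
        "classes": len(classes),
        "import_statements": len(imports),
        "mean_function_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


# Made-up example input, not taken from any analyzed repository.
example = '''
import numpy as np

def scale(x, factor=2):
    y = x * factor
    return y
'''

print(static_metrics(example))
```

Because the code is only parsed and never executed, the same function could be applied to every file of a cloned repository without installing that repository's dependencies.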
In the 20 weeks of the thesis, you will roughly follow this procedure:
- Week 1-3:
Familiarize yourself with the existing literature and the paper that gives a list of roughly 1000 repositories with machine learning and 1000 without. Document the scope of the study and the code metrics that will be analyzed. Set up the necessary tools and environments for crawling and analyzing the GitHub repositories.
- Week 4-5:
Update the existing code for the new metrics and repositories.
- Week 6:
Crawl and collect data from the 1000 machine learning repositories. Perform static analysis on the collected data and extract the code metrics.
- Week 7:
Crawl and collect data from the 1000 non-machine learning repositories. Perform static analysis on the collected data and extract the code metrics.
- MILESTONE 1: All data is collected.
- Week 8-9:
Document the infrastructure and setup for crawling the repositories.
- Week 10-11:
Clean and preprocess the collected data. Prepare the data for statistical analysis.
- Week 12-14:
Perform statistical analysis on the collected data. Identify patterns and differences between the machine learning and non-machine learning repositories.
- MILESTONE 2: Analysis is completed.
- Week 15-16:
Finalize the literature review, describing the background and related work in the field of static analysis and machine learning. Finalize the methodology section, describing the data collection and analysis process.
- Week 17-18:
Write the results section, describing the findings from the statistical analysis. Write the discussion section, interpreting the results and discussing their implications.
- Week 19:
Write the conclusion section, summarizing the main contributions of the thesis.
- MILESTONE 3: Each section of the thesis has acceptable text.
- Week 20:
Finalize the thesis document and submit it for evaluation. Prepare for the thesis defense.
- MILESTONE 4: Thesis complete
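
For the comparison step in weeks 12-14, one possible way to quantify how strongly a metric differs between the two groups of repositories is a non-parametric effect size such as Cliff's delta. The sketch below is an assumption about how this could be done, not the analysis method prescribed for the thesis; the function name `cliffs_delta` and the sample values are hypothetical placeholders, not collected data.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y).

    Ranges from -1.0 (every x is smaller than every y) to
    +1.0 (every x is larger than every y); 0.0 means no tendency.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Placeholder values for one metric per repository (made up for illustration).
ml_repos = [3.0, 5.0, 8.0]
non_ml_repos = [2.0, 4.0, 6.0]

delta = cliffs_delta(ml_repos, non_ml_repos)
print(f"Cliff's delta: {delta:.3f}")
```

The pairwise loop is quadratic in the sample sizes, which is acceptable for roughly 1000 values per group. An effect size like this would typically be reported alongside a significance test rather than instead of one.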