SAF-DL: Semantic Analysis Framework for Deep Learning Open Source Projects
Title from PDF of title page viewed June 20, 2018
Thesis advisor: Yugyung Lee
Vita
Includes bibliographical references (pages 74-76)
Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2018
There are a lot of open source projects available on the internet. Specifically, due to the
increasing interest in Deep Learning (DL), the number of DL open source projects has also
increased. This project is motivated by the opportunity to use existing projects either to develop a new,
innovative project or to create a more refined version. In addition, these projects can be used to
guide software developers or students to perform effective programming in their DL projects.
The challenge is how to analyze the functionalities or features that are described in the source
code of these projects. The semantics of the source code in these projects are hard to
understand because the dependencies are deeply intertwined. As the complexity and scale of the
projects grow, manually analyzing the workflow or semantics of these open source projects
does not scale.
This thesis proposes a semantic analytics framework, called SAF-DL, that
aims (i) to analyze the sequences of operations in a given open source project and build a
graph model, known as a call graph, (ii) to cluster similar functional paths in the call graphs
using Machine Learning algorithms, (iii) to find the abstractions (clusters) of the function
flows, (iv) to identify the semantics of the function flows, and (v) to discover the workflow by
analyzing the dependencies or similarity between the functional paths and between projects.
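As a rough illustration of aims (i) and (ii), the sketch below builds a small call graph from Python source and lists its functional paths. It is not the SAF-DL implementation: it assumes only the standard-library `ast` module, the toy `SOURCE` string and the helper names `build_call_graph`, `call_paths`, and `jaccard` are hypothetical, and a simple Jaccard overlap stands in for the Machine Learning clustering the thesis describes.

```python
import ast

# Hypothetical toy project; not taken from the thesis.
SOURCE = '''
def load_data():
    return [1, 2, 3]

def build_model():
    return "model"

def fit(model, data):
    return model

def train():
    data = load_data()
    model = build_model()
    return fit(model, data)

def evaluate():
    data = load_data()
    return len(data)
'''


def build_call_graph(source):
    """Map each top-level function to the sorted list of local functions it calls."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    graph = {}
    for func in functions:
        callees = set()
        for node in ast.walk(func):
            # Only direct calls by plain name, e.g. load_data(); attribute
            # calls such as obj.method() are ignored in this sketch.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    callees.add(node.func.id)
        graph[func.name] = sorted(callees)
    return graph


def call_paths(graph):
    """List paths from entry points (never called) down to leaf functions."""
    called = {c for callees in graph.values() for c in callees}
    roots = sorted(name for name in graph if name not in called)
    paths = []

    def walk(node, path):
        path = path + [node]
        # Skip functions already on the path so recursion cannot loop forever.
        nexts = [c for c in graph[node] if c not in path]
        if not nexts:
            paths.append(tuple(path))
        for callee in nexts:
            walk(callee, path)

    for root in roots:
        walk(root, [])
    return paths


def jaccard(path_a, path_b):
    """Overlap between two paths, treated as sets of function names."""
    a, b = set(path_a), set(path_b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    graph = build_call_graph(SOURCE)
    for path in call_paths(graph):
        print(" -> ".join(path))
```

Paths that share many functions score closer to 1 under `jaccard`, so a pairwise similarity matrix over the paths could be passed to a clustering algorithm to group similar function flows.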
The SAF-DL pipeline, which transforms source code into the semantics of the workflow model,
was designed with Machine Learning and NLP techniques. In this thesis,
Python/TensorFlow/Keras-based open source projects hosted on GitHub are analyzed. A comparative
analysis of models was used to evaluate the effectiveness of discovery of code abstraction and
workflow in the SAF-DL framework. The SAF-DL framework was implemented in Python
with Scikit-learn and tested using three open source projects. This thesis has demonstrated that the
SAF-DL framework can be used in various applications such as search or retrieval of open
source projects, source code to source code plagiarism detection, and automatic code or test
case generation.
Introduction -- Background and related work -- Proposed framework -- Results and evaluation -- Applications -- Conclusion and future work