3,884 research outputs found
HCPC: Human centric program comprehension by grouping static execution scenarios
New members of a software team can struggle to locate user requirements if proper software engineering principles are not practiced. Reading through code, finding relevant methods, classes and files take a significant portion of software development time. Many times developers have to fix issues in code written by others. Having a good tool support for this code browsing activity can reduce human effort and increase overall developers' productivity. To help program comprehension activities, building an abstract code summary of a software system from the call graph is an active research area. A call graph is a visual representation of caller-callee relationships between different methods of a software project. Call graphs can be difficult to comprehend for a larger code-base. The motivation is to extract the essence from the call graph by finding execution scenarios from a call graph and then cluster them together by concentrating the information in the code-base. Later, different techniques are applied to label nodes in the abstract code summary tree. In this thesis, we focus on static call graphs for creating an abstract code summary tree as it clusters all possible program scenarios and groups similar scenarios together. Previous work on static call graph clusters execution paths and uses only one information retrieval technique without any feedback from developers. First, to advance existing work, we introduced new information retrieval techniques alongside human-involved evaluation. We found that developers prefer node labels generated by terms in method names with TFIDF (term frequency-inverse document frequency). Second, from our observation, we introduced two new types of information (text description using comments and execution patterns) for abstraction nodes to provide better overview. Finally, we introduced an interactive software tool which can be used to browse the code-base in a guided way by targeting specific units of the source code. In the user study, we found developers can use our tool to overview a project alongside finding help for doing particular jobs such as locating relevant files and understanding relevant domain knowledge
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Analysis of Software Binaries for Reengineering-Driven Product Line Architecture\^aAn Industrial Case Study
This paper describes a method for the recovering of software architectures
from a set of similar (but unrelated) software products in binary form. One
intention is to drive refactoring into software product lines and combine
architecture recovery with run time binary analysis and existing clustering
methods. Using our runtime binary analysis, we create graphs that capture the
dependencies between different software parts. These are clustered into smaller
component graphs, that group software parts with high interactions into larger
entities. The component graphs serve as a basis for further software product
line work. In this paper, we concentrate on the analysis part of the method and
the graph clustering. We apply the graph clustering method to a real
application in the context of automation / robot configuration software tools.Comment: In Proceedings FMSPLE 2015, arXiv:1504.0301
Improving visual representations of code
This work was done in 1997 at the Centre for Software Maintenance at the University of DurhamThe contents of this paper describe the work carried out by the Visual Research Group in the Centre for Software Maintenance at the University of Durham.Publisher PD
Graph layout using subgraph isomorphisms
Today, graphs are used for many things. In engineering, graphs are used to design circuits in very large scale integration. In computer science, graphs are used in the representation of the structure of software. They show information such as the flow of data through the program (known as the data flow graph [1]) or the information about the calling sequence of programs (known as the call graph [145]). These graphs consist of many classes of graphs and may occupy a large area and involve a large number of vertices and edges. The manual layout of graphs is a tedious and error prone task. Algorithms for graph layout exist but tend to only produce a 'good' layout when they are applied to specific classes of small graphs. In this thesis, research is presented into a new automatic graph layout technique. Within many graphs, common structures exist. These are structures that produce 'good' layouts that are instantly recognisable and, when combined, can be used to improve the layout of the graphs. In this thesis common structures are given that are present in call graphs. A method of using subgraph isomorphism to detect these common structures is also presented. The method is known as the ANHOF method. This method is implemented in the ANHOF system, and is used to improve the layout of call graphs. The resulting layouts are an improvement over layouts from other algorithms because these common structures are evident and the number of edge crossings, clusters and aspect ratio are improved
Software Analytics for Improving Program Comprehension
Title from PDF of title page viewed June 28, 2021Dissertation advisor: Yugyung LeeVitaIncludes bibliographical references (pages 122-143)Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2021Program comprehension is an essential part of software development and maintenance. Traditional methods of program comprehension, such as reviewing the codebase and documentation, are still challenging for understanding the software's overall structure and implementation. In recent years, software static analysis studies have emerged to facilitate program comprehensions, such as call graphs, which represent the system’s structure and its implementation as a directed graph. Furthermore, some studies focused on semantic enrichment of the software system problems using systematic learning analytics, including machine learning and NLP.
While call graphs can enhance the program comprehension process, they still face three main challenges: (1) complex call graphs can become very difficult to understand making call graphs much harder to visualize and interpret by a developer and thus increases the overhead in program comprehension; (2) they are often limited to a single level of granularity, such as function calls; and (3) there is a lack of the interpretation semantics about the graphs.
In this dissertation, we propose a novel framework, called CodEx, to facilitate and accelerate program comprehension. CodEx enables top-down and bottom-up analysis of the system's call graph and its execution paths for an enhanced program comprehension experience. Specifically, the proposed framework is designed to cope with the following techniques: multi-level graph abstraction using a coarsening technique, hierarchical clustering to represent the call graph into subgraphs (i.e., multi-levels of granularity), and interactive visual exploration of the graphs at different levels of abstraction. Moreover, we are also worked on building semantics of software systems using NLP and machine learning, including topic modeling, to interpret the meaning of the abstraction levels of the call graph.Introduction -- Multi-Level Call Graph for Program Comprehension -- Static Trace Clustering: Single-Level Approach -- Static Trace Clustering: Multi-Level Approach -- Topic Modeling for Cluster Analysis -- Visual Exploration of Software Clustered Traces -- Conclusion and Feature Work -- Appendi
Automated End-to-End Management of the Deep Learning Lifecycle
Title from PDF of title page viewed March 1, 2021Dissertation advisor: Yugyung LeeVitaIncludes bibliographical references ( page 115-125)Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2020Deep learning has improved the state-of-the-art results in an ever-growing number of domains. This success heavily relies on the development of deep learning models--an experimental, iterative process that produces tens to hundreds of models before arriving at a satisfactory result. While there has been a surge in the number of tools and frameworks that aim at facilitating deep learning, the process of managing the models and their artifacts is still surprisingly challenging and time-consuming. Existing model-management solutions are either tailored for commercial platforms or require significant code changes. Moreover, most of the existing solutions address a single phase of the modeling lifecycle, such as experiment monitoring, while ignoring other essential tasks, such as model sharing and deployment. In this dissertation, we present a software system to facilitate and accelerate the deep learning lifecycle, named ModelKB. ModelKB can \textit{automatically} manage the modeling lifecycle end-to-end, including (1) monitoring and tracking experiments; (2) visualizing, searching for, and comparing models and experiments; (3) deploying models locally and on the cloud; and (4) sharing and publishing trained models. Our system also provides a stepping-stone for enhanced reproducibility. ModelKB currently supports TensorFlow 2.0, Keras, and PyTorch, and it can be extended to other deep learning frameworks easily. A video demo is available at https://youtu.be/XWiJpSM_jvA.
Moreover, we study static call graphs to form a stepping-stone to facilitate the \textit{comprehension} of the overall lifecycle implementation (i.e., source code). Specifically, we introduce Code2Graph to facilitate the exploration and tracking of the implementation and its changes over time. Code2Graph is used to construct and visualize the call graph of a software codebase. We evaluate the functionality by analyzing and studying real software systems throughout their entire lifespan. The tool, evaluation results, and a video demo are available at https://goo.gl/8edZ64.
Finally, we demonstrate a software system that brings together the contributions mentioned above to build a robust, open-collaborative platform for deep learning applications in the health domain, named Medl.AI. Medl.AI enables researchers and healthcare professionals to easily and efficiently: explore, share, reuse, and discuss deep learning models specific to the medical domain. We present six illustrative deep learning medical applications using Medl.AI. We conduct an online survey to assess the feasibility and benefits of Medl.AI. The user study suggests that Medl.AI provides a promising solution to open collaborative research and applications. Our live website is currently available at http://medl.ai.Introduction -- Background and Challenges -- Automated Management of the Modeling Lifecycle -- Facilitating Program Comprehension -- Medl.AI: An application of MODELKB -- Conclusion
- …