3,884 research outputs found

    HCPC: Human centric program comprehension by grouping static execution scenarios

    New members of a software team can struggle to locate user requirements if proper software engineering principles are not practiced. Reading through code and finding relevant methods, classes and files take a significant portion of software development time. Developers often have to fix issues in code written by others. Good tool support for this code browsing activity can reduce human effort and increase developers' overall productivity. To help program comprehension activities, building an abstract code summary of a software system from the call graph is an active research area. A call graph is a visual representation of caller-callee relationships between different methods of a software project. Call graphs can be difficult to comprehend for a larger code-base. The motivation is to extract the essence of the call graph by finding execution scenarios in it and then clustering them together, concentrating the information in the code-base. Later, different techniques are applied to label nodes in the abstract code summary tree. In this thesis, we focus on static call graphs for creating an abstract code summary tree, as it clusters all possible program scenarios and groups similar scenarios together. Previous work on static call graphs clusters execution paths and uses only one information retrieval technique, without any feedback from developers. First, to advance existing work, we introduced new information retrieval techniques alongside human-involved evaluation. We found that developers prefer node labels generated from terms in method names with TF-IDF (term frequency-inverse document frequency). Second, based on our observations, we introduced two new types of information (text descriptions using comments, and execution patterns) for abstraction nodes to provide a better overview. Finally, we introduced an interactive software tool that can be used to browse the code-base in a guided way by targeting specific units of the source code. In the user study, we found that developers can use our tool to get an overview of a project as well as to find help with particular jobs such as locating relevant files and understanding relevant domain knowledge.

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering them. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers, representing a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that introducing contexts for identifiers improves the quality of the modularization of the software systems. Both context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that topics inferred by performing topic analysis on the contextual representations are more meaningful than those inferred from the plain representation of the documents. The proposed context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.

    Analysis of Software Binaries for Reengineering-Driven Product Line Architecture - An Industrial Case Study

    This paper describes a method for recovering software architectures from a set of similar (but unrelated) software products in binary form. One intention is to drive refactoring into software product lines and to combine architecture recovery with runtime binary analysis and existing clustering methods. Using our runtime binary analysis, we create graphs that capture the dependencies between different software parts. These are clustered into smaller component graphs that group software parts with high interactions into larger entities. The component graphs serve as a basis for further software product line work. In this paper, we concentrate on the analysis part of the method and the graph clustering. We apply the graph clustering method to a real application in the context of automation / robot configuration software tools. Comment: In Proceedings FMSPLE 2015, arXiv:1504.0301

    Improving visual representations of code

    This work was done in 1997 at the Centre for Software Maintenance at the University of Durham. The contents of this paper describe the work carried out by the Visual Research Group in the Centre for Software Maintenance at the University of Durham. Publisher PD

    Graph layout using subgraph isomorphisms

    Today, graphs are used for many things. In engineering, graphs are used to design circuits in very large scale integration. In computer science, graphs are used to represent the structure of software. They show information such as the flow of data through the program (known as the data flow graph [1]) or the calling sequence of programs (known as the call graph [145]). These graphs consist of many classes of graphs and may occupy a large area and involve a large number of vertices and edges. The manual layout of graphs is a tedious and error-prone task. Algorithms for graph layout exist but tend to produce a 'good' layout only when they are applied to specific classes of small graphs. In this thesis, research is presented into a new automatic graph layout technique. Within many graphs, common structures exist. These are structures that produce 'good' layouts that are instantly recognisable and, when combined, can be used to improve the layout of the graphs. In this thesis, common structures that are present in call graphs are given. A method of using subgraph isomorphism to detect these common structures is also presented. The method is known as the ANHOF method. This method is implemented in the ANHOF system, and is used to improve the layout of call graphs. The resulting layouts are an improvement over layouts from other algorithms because these common structures are evident and the number of edge crossings, clusters and aspect ratio are improved.

    Software Analytics for Improving Program Comprehension

    Title from PDF of title page viewed June 28, 2021. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 122-143). Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2021. Program comprehension is an essential part of software development and maintenance. Traditional methods of program comprehension, such as reviewing the codebase and documentation, still make it challenging to understand the software's overall structure and implementation. In recent years, static analysis techniques have emerged to facilitate program comprehension, such as call graphs, which represent the system's structure and its implementation as a directed graph. Furthermore, some studies have focused on semantic enrichment of software systems using systematic learning analytics, including machine learning and NLP. While call graphs can enhance the program comprehension process, they still face three main challenges: (1) complex call graphs can become very difficult for a developer to visualize and interpret, which increases the overhead of program comprehension; (2) they are often limited to a single level of granularity, such as function calls; and (3) the graphs lack interpretive semantics. In this dissertation, we propose a novel framework, called CodEx, to facilitate and accelerate program comprehension. CodEx enables top-down and bottom-up analysis of the system's call graph and its execution paths for an enhanced program comprehension experience. Specifically, the proposed framework is built on the following techniques: multi-level graph abstraction using a coarsening technique, hierarchical clustering to represent the call graph as subgraphs (i.e., multiple levels of granularity), and interactive visual exploration of the graphs at different levels of abstraction. Moreover, we also worked on building semantics of software systems using NLP and machine learning, including topic modeling, to interpret the meaning of the abstraction levels of the call graph. Introduction -- Multi-Level Call Graph for Program Comprehension -- Static Trace Clustering: Single-Level Approach -- Static Trace Clustering: Multi-Level Approach -- Topic Modeling for Cluster Analysis -- Visual Exploration of Software Clustered Traces -- Conclusion and Feature Work -- Appendi

    Automated End-to-End Management of the Deep Learning Lifecycle

    Title from PDF of title page viewed March 1, 2021. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 115-125). Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2020. Deep learning has improved the state-of-the-art results in an ever-growing number of domains. This success heavily relies on the development of deep learning models--an experimental, iterative process that produces tens to hundreds of models before arriving at a satisfactory result. While there has been a surge in the number of tools and frameworks that aim at facilitating deep learning, the process of managing the models and their artifacts is still surprisingly challenging and time-consuming. Existing model-management solutions are either tailored for commercial platforms or require significant code changes. Moreover, most of the existing solutions address a single phase of the modeling lifecycle, such as experiment monitoring, while ignoring other essential tasks, such as model sharing and deployment. In this dissertation, we present a software system to facilitate and accelerate the deep learning lifecycle, named ModelKB. ModelKB can automatically manage the modeling lifecycle end-to-end, including (1) monitoring and tracking experiments; (2) visualizing, searching for, and comparing models and experiments; (3) deploying models locally and on the cloud; and (4) sharing and publishing trained models. Our system also provides a stepping-stone for enhanced reproducibility. ModelKB currently supports TensorFlow 2.0, Keras, and PyTorch, and it can be extended to other deep learning frameworks easily. A video demo is available at https://youtu.be/XWiJpSM_jvA. Moreover, we study static call graphs as a stepping-stone to facilitate the comprehension of the overall lifecycle implementation (i.e., source code). Specifically, we introduce Code2Graph to facilitate the exploration and tracking of the implementation and its changes over time. Code2Graph is used to construct and visualize the call graph of a software codebase. We evaluate the functionality by analyzing and studying real software systems throughout their entire lifespan. The tool, evaluation results, and a video demo are available at https://goo.gl/8edZ64. Finally, we demonstrate a software system that brings together the contributions mentioned above to build a robust, open-collaborative platform for deep learning applications in the health domain, named Medl.AI. Medl.AI enables researchers and healthcare professionals to easily and efficiently explore, share, reuse, and discuss deep learning models specific to the medical domain. We present six illustrative deep learning medical applications using Medl.AI. We conduct an online survey to assess the feasibility and benefits of Medl.AI. The user study suggests that Medl.AI provides a promising solution to open collaborative research and applications. Our live website is currently available at http://medl.ai. Introduction -- Background and Challenges -- Automated Management of the Modeling Lifecycle -- Facilitating Program Comprehension -- Medl.AI: An application of MODELKB -- Conclusion