1,172 research outputs found

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering them. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that topics inferred by performing topic analysis on the contextual representations are more meaningful than those inferred from the plain representation of the documents. The proposed approach of introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.

    Reify Your Collection Queries for Modularity and Speed!

    Modularity and efficiency are often contradicting requirements, such that programmers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections often needs to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed. We propose SQuOpt, the Scala Query Optimizer, a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the Findbugs tool using SQuOpt, show average speedups of 12x with a maximum of 12800x, and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications. Comment: 20 pages
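
    The idea of reifying a collection query so that it can be analyzed before it runs can be illustrated with a minimal sketch. This is not SQuOpt's API (SQuOpt is a Scala library); the names `Source`, `Filter`, `Map`, `fuse_filters`, `count_traversals` and `run` are hypothetical and chosen only for illustration. The sketch assumes a query is a small tree of nodes and shows one optimization named in the abstract, the fusion of adjacent filter traversals.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Source:
    """Leaf node: wraps a concrete collection (hypothetical name)."""
    items: tuple


@dataclass(frozen=True)
class Filter:
    """Reified filter: stores the predicate instead of running it."""
    parent: Any
    pred: Callable[[Any], bool]


@dataclass(frozen=True)
class Map:
    """Reified map: stores the function instead of running it."""
    parent: Any
    fn: Callable[[Any], Any]


def fuse_filters(q):
    """Rewrite Filter(Filter(x, p), r) into one Filter(x, p and r)."""
    if isinstance(q, Source):
        return q
    parent = fuse_filters(q.parent)
    if isinstance(q, Filter) and isinstance(parent, Filter):
        inner, outer = parent.pred, q.pred
        return Filter(parent.parent, lambda x: inner(x) and outer(x))
    if isinstance(q, Filter):
        return Filter(parent, q.pred)
    return Map(parent, q.fn)


def count_traversals(q):
    """Count non-leaf nodes; each one is a pass over the data in run()."""
    return 0 if isinstance(q, Source) else 1 + count_traversals(q.parent)


def run(q):
    """Naive interpreter: one full traversal per Filter or Map node."""
    if isinstance(q, Source):
        return list(q.items)
    rows = run(q.parent)
    if isinstance(q, Filter):
        return [x for x in rows if q.pred(x)]
    return [q.fn(x) for x in rows]


# Build the query as data, optimize it, then execute it.
query = Map(
    Filter(
        Filter(Source(tuple(range(10))), lambda x: x % 2 == 0),
        lambda x: x > 2,
    ),
    lambda x: x * x,
)
optimized = fuse_filters(query)
```

    Because the query is a data structure rather than an eager chain of calls, the rewrite can inspect and restructure it; here the two filters collapse into one traversal while the result stays the same.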

    Applications of Multi-view Learning Approaches for Software Comprehension

    Program comprehension concerns the ability of an individual to build an understanding of an existing software system in order to extend or transform it. Software systems comprise data that are noisy and incomplete, which makes program understanding even more difficult. A software system consists of various views, including the module dependency graph, execution logs, evolutionary information and the vocabulary used in the source code, that collectively define the software system. Each of these views contains unique and complementary information; together, they can describe the data more accurately. In this paper, we investigate various techniques for combining different sources of information to improve the performance of a program comprehension task. We employ state-of-the-art techniques from learning to 1) find a suitable similarity function for each view, and 2) compare different multi-view learning techniques to decompose a software system into high-level units and give component-level recommendations for refactoring of the system, as well as cross-view source code search. The experiments conducted on 10 relatively large Java software systems show that by fusing knowledge from different views, we can guarantee a lower bound on the quality of the modularization and even improve upon it. We proceed by integrating different sources of information to give a set of high-level recommendations as to how to refactor the software system. Furthermore, we demonstrate how learning a joint subspace allows for performing cross-modal retrieval across views, yielding results that are more aligned with what the user intends by the query. The multi-view approaches outlined in this paper can be employed for addressing problems in software engineering that can be encoded in terms of a learning problem, such as software bug prediction and feature location.

    Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects

    Exaggeration or context changes can render maintainability experience into prejudice. For example, JavaScript is often seen as the least elegant language and hence as having the lowest maintainability. Such prejudice should not guide decisions without prior empirical validation. We formulated 10 hypotheses about maintainability based on prejudices and tested them on a large set of open-source projects (6,897 GitHub repositories, 402 million lines, 5 programming languages). We operationalize maintainability with five static analysis metrics. We found that JavaScript code is not worse than other code, Java code shows higher maintainability than C# code, and C code has longer methods than other code. The quality of interface documentation is better in Java code than in other code. Code developed by teams is not of higher maintainability, and large code bases are not of lower maintainability. Projects with high maintainability are not more popular or more often forked. Overall, most hypotheses are not supported by open-source data. Comment: 20 pages

    Assessing architectural evolution: A case study

    This is the post-print version of the Article. The official published version can be accessed from the link below - Copyright @ 2011 Springer. This paper proposes to use a historical perspective on generic laws, principles, and guidelines, like Lehman's software evolution laws and Martin's design principles, in order to achieve a multi-faceted process and structural assessment of a system's architectural evolution. We present a simple structural model with associated historical metrics and visualizations that could form part of an architect's dashboard. We perform such an assessment for the Eclipse SDK, as a case study of a large, complex, and long-lived system for which sustained effective architectural evolution is paramount. The twofold aim of checking generic principles on a well-known system is, on the one hand, to see whether there are certain lessons that could be learned for best practice of architectural evolution, and on the other hand, to get more insights about the applicability of such principles. We find that while the Eclipse SDK does follow several of the laws and principles, there are some deviations, and we discuss areas of architectural improvement and limitations of the assessment approach.

    A review and assessment of novice learning tools for problem solving and program development

    There is a great demand for the development of novice learning tools to supplement classroom instruction in the areas of problem solving and program development. Research in the areas of pedagogy, the psychology of programming, human-computer interaction, and cognition has provided valuable input to the development of new methodologies, paradigms, programming languages, and novice learning tools to answer this demand. Based on the cognitive needs of novices, it is possible to postulate a set of characteristics that should make up the components of an effective novice learning tool. This thesis will identify these characteristics and provide recommendations for the development of new learning tools. This will be accomplished with a review of the challenges that novices face, an in-depth discussion of modern learning tools and the challenges that they address, and the identification and discussion of the vital characteristics that constitute an effective learning tool, based on these tools and personal ideas.