1,172 research outputs found
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Reify Your Collection Queries for Modularity and Speed!
Modularity and efficiency are often contradicting requirements, such that
programers have to trade one for the other. We analyze this dilemma in the
context of programs operating on collections. Performance-critical code using
collections need often to be hand-optimized, leading to non-modular, brittle,
and redundant code. In principle, this dilemma could be avoided by automatic
collection-specific optimizations, such as fusion of collection traversals,
usage of indexing, or reordering of filters. Unfortunately, it is not obvious
how to encode such optimizations in terms of ordinary collection APIs, because
the program operating on the collections is not reified and hence cannot be
analyzed.
We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala
collections API that allows such analyses and optimizations to be defined and
executed within Scala, without relying on external tools or compiler
extensions. SQuOpt provides the same "look and feel" (syntax and static typing
guarantees) as the standard collections API. We evaluate SQuOpt by
re-implementing several code analyses of the Findbugs tool using SQuOpt, show
average speedups of 12x with a maximum of 12800x and hence demonstrate that
SQuOpt can reconcile modularity and efficiency in real-world applications.Comment: 20 page
Applications of Multi-view Learning Approaches for Software Comprehension
Program comprehension concerns the ability of an individual to make an
understanding of an existing software system to extend or transform it.
Software systems comprise of data that are noisy and missing, which makes
program understanding even more difficult. A software system consists of
various views including the module dependency graph, execution logs,
evolutionary information and the vocabulary used in the source code, that
collectively defines the software system. Each of these views contain unique
and complementary information; together which can more accurately describe the
data. In this paper, we investigate various techniques for combining different
sources of information to improve the performance of a program comprehension
task. We employ state-of-the-art techniques from learning to 1) find a suitable
similarity function for each view, and 2) compare different multi-view learning
techniques to decompose a software system into high-level units and give
component-level recommendations for refactoring of the system, as well as
cross-view source code search. The experiments conducted on 10 relatively large
Java software systems show that by fusing knowledge from different views, we
can guarantee a lower bound on the quality of the modularization and even
improve upon it. We proceed by integrating different sources of information to
give a set of high-level recommendations as to how to refactor the software
system. Furthermore, we demonstrate how learning a joint subspace allows for
performing cross-modal retrieval across views, yielding results that are more
aligned with what the user intends by the query. The multi-view approaches
outlined in this paper can be employed for addressing problems in software
engineering that can be encoded in terms of a learning problem, such as
software bug prediction and feature location
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can render maintainability experience into
prejudice. For example, JavaScript is often seen as least elegant language and
hence of lowest maintainability. Such prejudice should not guide decisions
without prior empirical validation. We formulated 10 hypotheses about
maintainability based on prejudices and test them in a large set of open-source
projects (6,897 GitHub repositories, 402 million lines, 5 programming
languages). We operationalize maintainability with five static analysis
metrics. We found that JavaScript code is not worse than other code, Java code
shows higher maintainability than C# code and C code has longer methods than
other code. The quality of interface documentation is better in Java code than
in other code. Code developed by teams is not of higher and large code bases
not of lower maintainability. Projects with high maintainability are not more
popular or more often forked. Overall, most hypotheses are not supported by
open-source data.Comment: 20 page
Assessing architectural evolution: A case study
This is the post-print version of the Article. The official published can be accessed from the link below - Copyright @ 2011 SpringerThis paper proposes to use a historical perspective on generic laws, principles,
and guidelines, like Lehmanās software evolution laws and Martinās design principles, in order to achieve a multi-faceted process and structural assessment of a systemās architectural evolution. We present a simple structural model with associated historical metrics and
visualizations that could form part of an architectās dashboard. We perform such an assessment for the Eclipse SDK, as a case study of a large, complex, and long-lived system for which sustained effective architectural evolution is paramount. The twofold aim of checking generic principles on a well-know system is, on the one hand,
to see whether there are certain lessons that could be learned for best practice of architectural evolution, and on the other hand to get more insights about the applicability of such principles. We find that while the Eclipse SDK does follow several of the laws and principles, there are some deviations, and we discuss areas of architectural improvement and limitations of the assessment approach
A review and assessment of novice learning tools for problem solving and program development
There is a great demand for the development of novice learning tools to supplement classroom instruction in the areas of problem solving and program development. Research in the area of pedagogy, the psychology of programming, human-computer interaction, and cognition have provided valuable input to the development of new methodologies, paradigms, programming languages, and novice learning tools to answer this demand.
Based on the cognitive needs of novices, it is possible to postulate a set of characteristics that should comprise the components an effective novice-learning tool. This thesis will discover these characteristics and provide recommendations for the development of new learning tools. This will be accomplished with a review of the challenges that novices face, an in-depth discussion on modem learning tools and the challenges that they address, and the identification and discussion of the vital characteristics that constitute an effective learning tool based on these tools and personal ideas
- ā¦