1,082 research outputs found
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Legacy Software Restructuring: Analyzing a Concrete Case
Software re-modularization is an old preoccupation of reverse engineering
research. The advantages of a well structured or modularized system are well
known. Yet after so much time and efforts, the field seems unable to come up
with solutions that make a clear difference in practice. Recently, some
researchers started to question whether some basic assumptions of the field
were not overrated. The main one consists in evaluating the
high-cohesion/low-coupling dogma with metrics of unknown relevance. In this
paper, we study a real structuring case (on the Eclipse platform) to try to
better understand if (some) existing metrics would have helped the software
engineers in the task. Results show that the cohesion and coupling metrics used
in the experiment did not behave as expected and would probably not have helped
the maintainers reach there goal. We also measured another possible
restructuring which is to decrease the number of cyclic dependencies between
modules. Again, the results did not meet expectations
Applications of Multi-view Learning Approaches for Software Comprehension
Program comprehension concerns the ability of an individual to make an
understanding of an existing software system to extend or transform it.
Software systems comprise of data that are noisy and missing, which makes
program understanding even more difficult. A software system consists of
various views including the module dependency graph, execution logs,
evolutionary information and the vocabulary used in the source code, that
collectively defines the software system. Each of these views contain unique
and complementary information; together which can more accurately describe the
data. In this paper, we investigate various techniques for combining different
sources of information to improve the performance of a program comprehension
task. We employ state-of-the-art techniques from learning to 1) find a suitable
similarity function for each view, and 2) compare different multi-view learning
techniques to decompose a software system into high-level units and give
component-level recommendations for refactoring of the system, as well as
cross-view source code search. The experiments conducted on 10 relatively large
Java software systems show that by fusing knowledge from different views, we
can guarantee a lower bound on the quality of the modularization and even
improve upon it. We proceed by integrating different sources of information to
give a set of high-level recommendations as to how to refactor the software
system. Furthermore, we demonstrate how learning a joint subspace allows for
performing cross-modal retrieval across views, yielding results that are more
aligned with what the user intends by the query. The multi-view approaches
outlined in this paper can be employed for addressing problems in software
engineering that can be encoded in terms of a learning problem, such as
software bug prediction and feature location
An Empirical Study of Cohesion and Coupling: Balancing Optimisation and Disruption
Search based software engineering has been extensively applied to the problem of finding improved modular structures that maximise cohesion and minimise coupling. However, there has, hitherto, been no longitudinal study of developers’ implementations, over a series of sequential releases. Moreover, results validating whether developers respect the fitness functions are scarce, and the potentially disruptive effect of search-based remodularisation is usually overlooked. We present an empirical study of 233 sequential releases of 10 different systems; the largest empirical study reported in the literature so far, and the first longitudinal study. Our results provide evidence that developers do, indeed, respect the fitness functions used to optimise cohesion/coupling (they are statistically significantly better than arbitrary choices with p << 0.01), yet they also leave considerable room for further improvement (cohesion/coupling can be improved by 25% on average). However, we also report that optimising the structure is highly disruptive (on average more than 57% of the structure must change), while our results reveal that developers tend to avoid such disruption. Therefore, we introduce and evaluate a multi-objective evolutionary approach that minimises disruption while maximising cohesion/coupling improvement. This allows developers to balance reticence to disrupt existing modular structure, against their competing need to improve cohesion and coupling. The multi-objective approach is able to find modular structures that improve the cohesion of developers’ implementations by 22.52%, while causing an acceptably low level of disruption (within that already tolerated by developers)
A multiple hill climbing approach to software module clustering
Automated software module clustering is important for maintenance of legacy systems written in a 'monolithic format' with inadequate module boundaries. Even where systems were originally designed with suitable module boundaries, structure tends to degrade as the system evolves, making re-modularization worthwhile. This paper focuses upon search-based approaches to the automated module clustering problem, where hitherto, the local search approach of hill climbing has been found to be most successful. In the paper we show that results from a set of multiple hill climbs can be combined to locate good 'building blocks' for subsequent searches. Building blocks are formed by identifying the common features in a selection of best hill climbs. This process reduces the search space, while simultaneously 'hard wiring' parts of the solution. The paper reports the results of an empirical study that show that the multiple hill climbing approach does indeed guide the search to higher peaks in subsequent executions. The paper also investigates the relationship between the improved results and the system size
- …