1,082 research outputs found

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Full text link
    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used for both deriving the topics that run through the system as well as their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that inferred topics through performing topic analysis on the contextual representations are more meaningful compared to the plain representation of the documents. The proposed approach in introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis

    Legacy Software Restructuring: Analyzing a Concrete Case

    Get PDF
    Software re-modularization is an old preoccupation of reverse engineering research. The advantages of a well structured or modularized system are well known. Yet after so much time and efforts, the field seems unable to come up with solutions that make a clear difference in practice. Recently, some researchers started to question whether some basic assumptions of the field were not overrated. The main one consists in evaluating the high-cohesion/low-coupling dogma with metrics of unknown relevance. In this paper, we study a real structuring case (on the Eclipse platform) to try to better understand if (some) existing metrics would have helped the software engineers in the task. Results show that the cohesion and coupling metrics used in the experiment did not behave as expected and would probably not have helped the maintainers reach there goal. We also measured another possible restructuring which is to decrease the number of cyclic dependencies between modules. Again, the results did not meet expectations

    Applications of Multi-view Learning Approaches for Software Comprehension

    Full text link
    Program comprehension concerns the ability of an individual to make an understanding of an existing software system to extend or transform it. Software systems comprise of data that are noisy and missing, which makes program understanding even more difficult. A software system consists of various views including the module dependency graph, execution logs, evolutionary information and the vocabulary used in the source code, that collectively defines the software system. Each of these views contain unique and complementary information; together which can more accurately describe the data. In this paper, we investigate various techniques for combining different sources of information to improve the performance of a program comprehension task. We employ state-of-the-art techniques from learning to 1) find a suitable similarity function for each view, and 2) compare different multi-view learning techniques to decompose a software system into high-level units and give component-level recommendations for refactoring of the system, as well as cross-view source code search. The experiments conducted on 10 relatively large Java software systems show that by fusing knowledge from different views, we can guarantee a lower bound on the quality of the modularization and even improve upon it. We proceed by integrating different sources of information to give a set of high-level recommendations as to how to refactor the software system. Furthermore, we demonstrate how learning a joint subspace allows for performing cross-modal retrieval across views, yielding results that are more aligned with what the user intends by the query. The multi-view approaches outlined in this paper can be employed for addressing problems in software engineering that can be encoded in terms of a learning problem, such as software bug prediction and feature location

    An Empirical Study of Cohesion and Coupling: Balancing Optimisation and Disruption

    Get PDF
    Search based software engineering has been extensively applied to the problem of finding improved modular structures that maximise cohesion and minimise coupling. However, there has, hitherto, been no longitudinal study of developers’ implementations, over a series of sequential releases. Moreover, results validating whether developers respect the fitness functions are scarce, and the potentially disruptive effect of search-based remodularisation is usually overlooked. We present an empirical study of 233 sequential releases of 10 different systems; the largest empirical study reported in the literature so far, and the first longitudinal study. Our results provide evidence that developers do, indeed, respect the fitness functions used to optimise cohesion/coupling (they are statistically significantly better than arbitrary choices with p << 0.01), yet they also leave considerable room for further improvement (cohesion/coupling can be improved by 25% on average). However, we also report that optimising the structure is highly disruptive (on average more than 57% of the structure must change), while our results reveal that developers tend to avoid such disruption. Therefore, we introduce and evaluate a multi-objective evolutionary approach that minimises disruption while maximising cohesion/coupling improvement. This allows developers to balance reticence to disrupt existing modular structure, against their competing need to improve cohesion and coupling. The multi-objective approach is able to find modular structures that improve the cohesion of developers’ implementations by 22.52%, while causing an acceptably low level of disruption (within that already tolerated by developers)

    A multiple hill climbing approach to software module clustering

    Get PDF
    Automated software module clustering is important for maintenance of legacy systems written in a 'monolithic format' with inadequate module boundaries. Even where systems were originally designed with suitable module boundaries, structure tends to degrade as the system evolves, making re-modularization worthwhile. This paper focuses upon search-based approaches to the automated module clustering problem, where hitherto, the local search approach of hill climbing has been found to be most successful. In the paper we show that results from a set of multiple hill climbs can be combined to locate good 'building blocks' for subsequent searches. Building blocks are formed by identifying the common features in a selection of best hill climbs. This process reduces the search space, while simultaneously 'hard wiring' parts of the solution. The paper reports the results of an empirical study that show that the multiple hill climbing approach does indeed guide the search to higher peaks in subsequent executions. The paper also investigates the relationship between the improved results and the system size
    • …
    corecore