    Restructuring source code identifiers

    In software engineering, maintenance accounts for 60% of the overall project lifecycle cost of any software product. Program comprehension is a substantial part of maintenance and evolution cost; thus, any advancement in maintenance, evolution, and program understanding can greatly reduce the total cost of ownership of a software product. Identifiers are an important source of information during program understanding and maintenance. Programmers often rely on identifiers to build their mental models of software artifacts, and poorly-chosen identifiers have been reported in the literature as misleading and as increasing program-comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults, and hence maintenance effort. We investigate this conjecture using a measure combining term entropy and term context-coverage to study whether certain terms increase the odds of methods being fault-prone. We compute the entropy and context-coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context-coverage are more fault-prone than others, and that the new measure is only partially correlated with size. We will build on this study and apply summarization techniques to extract linguistic information from methods and classes. Using this information, we will extract domain concepts from source code and propose linguistic-based refactorings.
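    The abstract does not give the exact formulas, but a minimal sketch of the underlying idea might look as follows, assuming term entropy is the Shannon entropy of a term's occurrence distribution over methods and context-coverage is approximated simply as the fraction of methods containing the term; the method names and term splits below are invented for illustration and are not taken from the study.

```python
import math
from collections import Counter, defaultdict

def term_measures(method_terms):
    """method_terms: dict mapping method name -> list of terms split from
    its identifiers. Returns per-term (entropy, context_coverage)."""
    # Count how often each term occurs in each method.
    occurrences = defaultdict(Counter)          # term -> Counter(method -> count)
    for method, terms in method_terms.items():
        for t in terms:
            occurrences[t][method] += 1

    n_methods = len(method_terms)
    measures = {}
    for term, per_method in occurrences.items():
        total = sum(per_method.values())
        # Shannon entropy of the term's distribution over methods:
        # high when the term is scattered evenly across many methods.
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in per_method.values())
        # Crude context-coverage proxy: fraction of methods using the term.
        coverage = len(per_method) / n_methods
        measures[term] = (entropy, coverage)
    return measures

if __name__ == "__main__":
    methods = {
        "getUserName": ["get", "user", "name"],
        "setUserName": ["set", "user", "name"],
        "parseConfig": ["parse", "config"],
        "getConfig":   ["get", "config"],
    }
    for term, (h, cov) in sorted(term_measures(methods).items()):
        print(f"{term:8s} entropy={h:.2f} coverage={cov:.2f}")
```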

    Studying the evolution of software through software clustering and concept analysis

    This thesis describes an investigation into the use of software clustering and concept analysis techniques for studying the evolution of software. These techniques produce representations of software systems by clustering similar entities of the system together. The software engineering community has used these techniques for a number of different purposes, but this is the first study to investigate their use for studying evolution. The representations produced by software clustering and concept analysis techniques can be used to trace changes to a software system over a number of different versions of the system. This information can be used by system maintainers to identify worrying evolutionary trends or to assess a proposed change by comparing it to the effects of an earlier, similar change. The work described here attempts to establish whether the use of software clustering and concept analysis techniques for studying the evolution of software is worth pursuing. Four techniques, chosen on the basis of an extensive literature survey of the field, have been used to create representations of versions of a test software system. These representations have been examined to assess whether any observations about the evolution of the system can be drawn from them. The results are positive, and it is thought that the evolution of software systems could be studied using these techniques.
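    To illustrate the kind of representation such techniques produce, here is a minimal, hypothetical sketch of agglomerative clustering over entity feature sets (concept analysis, the other family mentioned above, is not shown); the entity names, features, and similarity threshold are invented for the example and are not taken from the thesis.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity between two entities based on their shared features
    (e.g., the modules or data they reference)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def agglomerative(features, threshold=0.3):
    """Greedy single-linkage clustering: repeatedly merge the two most
    similar clusters until no pair exceeds the similarity threshold."""
    clusters = [({name}, set(feats)) for name, feats in features.items()]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]))
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break
        names_i, feats_i = clusters[i]
        names_j, feats_j = clusters[j]
        del clusters[j]                 # remove higher index first (j > i)
        del clusters[i]
        clusters.append((names_i | names_j, feats_i | feats_j))
    return [sorted(names) for names, _ in clusters]

if __name__ == "__main__":
    # Entities of one version, described by the modules they reference;
    # running this on successive versions lets the decompositions be compared.
    version_1 = {
        "Parser.c": {"lexer", "ast"},
        "Lexer.c":  {"ast", "io"},
        "Render.c": {"gfx", "io"},
        "Canvas.c": {"gfx"},
    }
    print(agglomerative(version_1))
```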

    Defining linguistic antipatterns towards the improvement of source code quality

    Previous studies have shown that the linguistic aspect of source code is a valuable source of information that can help improve program comprehension. The proposed research focuses on supporting quality improvement of source code by identifying, specifying, and studying common negative practices with respect to linguistic information (i.e., linguistic antipatterns). We expect the definition of linguistic antipatterns to increase awareness of the existence of such bad practices and to discourage their use. We also propose to study the relation between negative practices in linguistic information (i.e., linguistic antipatterns) and negative practices in structural information (i.e., design antipatterns) with respect to comprehension effort and fault/change proneness. We discuss the proposed methodology and some preliminary results.
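    The abstract does not enumerate the antipatterns studied, but a toy detector for a few commonly cited name/signature inconsistencies gives the flavour of what a linguistic-antipattern check might do; the checks and the example code model below are illustrative assumptions, not the catalogue proposed in this work.

```python
def find_linguistic_antipatterns(methods):
    """methods: list of (name, return_type, parameter_types) tuples taken
    from a parsed code model. Flags three illustrative name/signature
    inconsistencies; real catalogues define many more."""
    issues = []
    for name, return_type, params in methods:
        # "getX" suggests an accessor, so returning nothing is misleading.
        if name.startswith("get") and return_type == "void":
            issues.append((name, "'get' method does not return anything"))
        # "isX"/"hasX" suggests a predicate, so it should return a boolean.
        if name.startswith(("is", "has")) and return_type != "boolean":
            issues.append((name, "predicate-like name does not return a boolean"))
        # "setX" suggests a mutator, so it normally takes an argument.
        if name.startswith("set") and not params:
            issues.append((name, "'set' method takes no parameter"))
    return issues

if __name__ == "__main__":
    model = [
        ("getState",   "void",   ["int"]),
        ("isEnabled",  "String", []),
        ("setTimeout", "void",   []),
        ("getName",    "String", []),
    ]
    for method, problem in find_linguistic_antipatterns(model):
        print(f"{method}: {problem}")
```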

    Design Recovery and Data Mining: A Methodology That Identifies Data-Cohesive Subsystems Based on Mining Association Rules.

    Software maintenance is both a technical and an economic concern for organizations. Large software systems are difficult to maintain due to their intrinsic complexity, and their maintenance consumes between 50% and 90% of the cost of their complete life-cycle. An essential step in maintenance is reverse engineering, which focuses on understanding the system. This understanding is critical to avoid introducing undesired side effects during maintenance. The objective of this research is to investigate the potential of applying data mining to reverse engineering. The research was motivated by the following observations: (1) data mining can process large volumes of information, (2) data mining can elicit meaningful information without previous knowledge of the domain, (3) data mining can extract novel, non-trivial relationships from a data set, and (4) data mining is automatable. These features are used to help address the problem of understanding large legacy systems. This research produced a general method for applying data mining to reverse engineering, and a methodology for design recovery called Identification of Subsystems based on Associations (ISA). ISA uses association rules mined from a database view of the subject system to guide a clustering process that produces a data-cohesive hierarchical subsystem decomposition of the system. ISA promotes object-oriented principles because each identified subsystem consists of a set of data repositories and the code (i.e., programs) that manipulates them. ISA is an automatic, multi-step process that takes the source code of the subject system and multiple parameters as its input. ISA includes two representation models (text-based and graphic-based) to present the resulting subsystem decomposition. The automated environment RE-ISA implements the ISA methodology. RE-ISA was used to produce subsystem decompositions of real-world software systems. Results show that ISA can automatically produce data-cohesive subsystem decompositions without previous knowledge of the subject system, and that ISA always generates the same results when the same parameters are used. This research provides evidence that data mining is a beneficial tool for reverse engineering and provides a foundation for defining methodologies that combine data mining and software maintenance.
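    The abstract describes ISA's first step as mining association rules from a database view of the system; a minimal, hypothetical sketch of that step is shown below, using pairwise support and confidence over program-to-repository access sets. The program and repository names, the thresholds, and the restriction to pairwise rules are assumptions made for illustration, not ISA's actual parameters.

```python
from collections import defaultdict
from itertools import combinations

def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """transactions: dict program -> set of data repositories it accesses.
    Returns rules (A -> B) over repository pairs with their support and
    confidence, in the classic market-basket sense."""
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in transactions.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    rules = []
    for (a, b), cnt in pair_count.items():
        support = cnt / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = cnt / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

if __name__ == "__main__":
    # Which data repositories (tables/files) each program touches; strongly
    # associated repositories and their programs would seed one subsystem.
    access = {
        "payroll.c": {"EMPLOYEE", "SALARY"},
        "report.c":  {"EMPLOYEE", "SALARY", "DEPT"},
        "hiring.c":  {"EMPLOYEE", "DEPT"},
        "billing.c": {"INVOICE", "CUSTOMER"},
        "dunning.c": {"INVOICE", "CUSTOMER"},
    }
    for lhs, rhs, sup, conf in mine_pair_rules(access):
        print(f"{lhs} -> {rhs}  support={sup:.2f} confidence={conf:.2f}")
```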