
    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools, and with large-scale in-depth analyses of clones that inform clone management in devising effective techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones in the entire code base, our tool aids clone-aware development by allowing a focused search for clones of any code fragment of the developer's interest. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects. The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Once code clones are identified, their management often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to decide optimally what to refactor and what not to, especially when there are dependencies among clones and the objective is to minimize refactoring effort and risk while maximizing benefit. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for estimating the effort needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and an in-depth analysis of the present state of the art, we outline avenues for further research and development towards the versatile clone management system that we envision.
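
    The thesis names a suffix-tree-based k-difference hybrid algorithm as the basis of its near-miss clone detection. As a rough, hedged illustration of the k-difference criterion only, and not of the thesis's suffix-tree algorithm or its Eclipse plug-in, the sketch below treats two token sequences as near-miss clones when their edit distance is at most k. The token sequences, the function name within_k_differences, and the chosen k are invented for the example.

        # Minimal sketch of the k-difference idea behind near-miss clone matching:
        # two token sequences count as clones of each other if they differ by at
        # most k edit operations (insert, delete, substitute). This plain
        # dynamic-programming version only illustrates the matching criterion.

        def within_k_differences(tokens_a, tokens_b, k):
            """Return True if the edit distance between the token lists is <= k."""
            if abs(len(tokens_a) - len(tokens_b)) > k:
                return False  # the length gap alone already exceeds the edit budget
            prev = list(range(len(tokens_b) + 1))
            for i, ta in enumerate(tokens_a, start=1):
                curr = [i] + [0] * len(tokens_b)
                for j, tb in enumerate(tokens_b, start=1):
                    cost = 0 if ta == tb else 1
                    curr[j] = min(prev[j] + 1,         # delete from tokens_a
                                  curr[j - 1] + 1,     # insert into tokens_a
                                  prev[j - 1] + cost)  # match or substitute
                if min(curr) > k:
                    return False  # no alignment can get back within k edits
                prev = curr
            return prev[-1] <= k

        # Hypothetical fragments: a copied loop whose variable names were renamed.
        original = "for i in range ( n ) : total += prices [ i ]".split()
        copied   = "for j in range ( m ) : total += prices [ j ]".split()
        print(within_k_differences(original, copied, k=3))  # True: three substitutions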

    Search-based software engineering: Trends, techniques and applications

    © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below. In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of the literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied, and highlights gaps in the literature and avenues for further research. EPSRC and E
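
    As a toy, hedged illustration of what applying a search-based optimization algorithm to a software engineering problem looks like (the article itself is a survey and prescribes no particular algorithm), the sketch below hill-climbs on a miniature next-release-style problem: choose requirements that maximize stakeholder value within a cost budget. The requirements, values, costs, and function names are all invented.

        # Toy SBSE example: random-restart hill climbing on an invented instance of
        # the "next release problem" (pick requirements that maximize stakeholder
        # value without exceeding a cost budget).
        import random

        VALUES = [10, 6, 8, 4, 7, 3]   # stakeholder value per candidate requirement
        COSTS  = [ 5, 2, 6, 1, 4, 2]   # implementation cost per requirement
        BUDGET = 10

        def fitness(selection):
            cost = sum(c for c, chosen in zip(COSTS, selection) if chosen)
            if cost > BUDGET:
                return -1              # infeasible selections are rejected outright
            return sum(v for v, chosen in zip(VALUES, selection) if chosen)

        def hill_climb(restarts=20, seed=0):
            rng = random.Random(seed)
            best = None
            for _ in range(restarts):
                current = [rng.random() < 0.5 for _ in VALUES]
                improved = True
                while improved:
                    improved = False
                    for i in range(len(current)):   # explore the 1-flip neighborhood
                        neighbor = current[:]
                        neighbor[i] = not neighbor[i]
                        if fitness(neighbor) > fitness(current):
                            current, improved = neighbor, True
                if best is None or fitness(current) > fitness(best):
                    best = current
            return best, fitness(best)

        selection, value = hill_climb()
        print(selection, value)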

    A Mono- and Multi-objective Approach for Recommending Software Refactoring

    Software systems have become prevalent and important in our society, and there is a constant need for high-quality software. One of the most widely used techniques for improving software quality is refactoring, which improves the design structure of a program while preserving its external behavior. Refactoring promises, if applied well, to improve software readability, maintainability and extensibility while increasing the speed at which programmers can write and maintain their code. In general, refactoring can be performed at various levels, such as the requirements, design, or code level. In this thesis, we focus on the source code level, where automated refactoring recommendation is performed in two main steps: 1) detection of code fragments that need to be improved or fixed (e.g., code-smells), and 2) identification of refactoring solutions to achieve this goal. For the code-smell detection step, we translate regularities that can be found in examples of code-smells into detection rules; to this end, we use genetic programming to automatically generate detection rules from such examples. For the refactoring identification step, a search-based approach is used. The process aims at finding the optimal sequence of refactoring operations that improves software quality by minimizing the number of detected code-smells while prioritizing the most critical ones. In addition, we explore other objectives to optimize using a multi-objective approach: the code changes needed to apply the refactorings, semantics preservation, and consistency with the development change history. Reducing code changes allows us to keep as much of the initial design as possible, while semantics preservation ensures that the refactored program is semantically coherent and models the domain semantics correctly. We also use knowledge from historical code changes to suggest new refactorings in similar contexts. Furthermore, we introduce a novel multi-objective approach to improve software quality attributes (e.g., flexibility, maintainability), fix “bad” design practices (i.e., code-smells) and promote “good” design practices (i.e., design patterns).
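
    A hedged sketch of the kind of fitness evaluation such a search could use when judging a candidate refactoring sequence: unresolved code-smells (weighted by severity) and the size of the applied code changes are penalized. The thesis itself uses genetic programming for rule generation and true multi-objective search; the scalarized score, the weights, and the apply_refactorings stub below are illustrative assumptions only.

        # Simplified, scalarized stand-in for a search-based refactoring fitness:
        # fewer remaining smells, critical smells first, and fewer edited lines.

        def apply_refactorings(smells, sequence):
            """Stand-in for applying refactorings: each operation is assumed to
            name the smell instance it removes."""
            removed = {op["fixes"] for op in sequence}
            return [s for s in smells if s["id"] not in removed]

        def fitness(smells_before, sequence, w_smell=1.0, w_severity=2.0, w_change=0.1):
            smells_after = apply_refactorings(smells_before, sequence)
            remaining = len(smells_after)
            remaining_severity = sum(s["severity"] for s in smells_after)
            changes = sum(op["edited_lines"] for op in sequence)
            # Lower is better: unresolved smells and large edits are both penalized,
            # with high-severity smells weighted more heavily.
            return w_smell * remaining + w_severity * remaining_severity + w_change * changes

        smells = [{"id": "god_class_1", "severity": 3},
                  {"id": "long_method_7", "severity": 1}]
        candidate = [{"fixes": "god_class_1", "edited_lines": 40}]
        print(fitness(smells, candidate))   # 1*1 + 2*1 + 0.1*40 = 7.0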

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all of them for refactoring or tracking. In these circumstances, it is essential to identify the code clones that are particularly important for refactoring and tracking. However, no existing study has investigated this matter. Our research addresses it through five studies that identify important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing the evolutionary coupling of code clones we identify a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones, and we rank them by the strength of their evolutionary coupling. In our second study we further analyze the evolutionary coupling of code clones with the aim of assisting clone tracking. The purpose of clone tracking is to identify the co-change (i.e., changing together) candidates of code clones so as to ensure consistency of changes in the code-base. Our second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis of the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets: one contains the non-cross-boundary SPCP clones, which can be considered important for refactoring, and the other contains the cross-boundary SPCP clones, which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which clone types have a high tendency of experiencing bug-fixes; such clone types can be given high priority for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. The findings from this study can help prioritize clone types for management on the basis of their tendencies to experience late propagation, and we also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pinpointing the important clones to be managed, and thus considerably reducing clone management effort.
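
    A minimal, hedged sketch of how evolutionary coupling between clone fragments might be mined from commit history, in the spirit of the studies above: co-change support and confidence are computed for every pair of fragments that changed together, and pairs are ranked accordingly. The commit data, the metric definitions, and the ranking rule are invented for illustration and are not AMIC's actual algorithm.

        # Mine co-change support/confidence between clone fragments from commits.
        from collections import Counter
        from itertools import combinations

        commits = [              # each commit: the set of clone fragments it changed
            {"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}, {"A", "B"},
        ]

        change_count = Counter(f for commit in commits for f in commit)
        cochange_count = Counter()
        for commit in commits:
            for pair in combinations(sorted(commit), 2):
                cochange_count[pair] += 1

        couplings = []
        for (f1, f2), support in cochange_count.items():
            confidence = support / min(change_count[f1], change_count[f2])
            couplings.append(((f1, f2), support, round(confidence, 2)))

        # Rank candidate co-change pairs by support, then confidence.
        for pair, support, confidence in sorted(couplings, key=lambda x: (-x[1], -x[2])):
            print(pair, "support:", support, "confidence:", confidence)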

    Automatic and Safe Refactoring of Function-Level Code Clones

    Master's thesis, Department of Computer Science and Engineering, College of Engineering, Seoul National University Graduate School, February 2019 (advisor: Kwangkeun Yi). This thesis proposes a method to automatically find function-level code clone pairs in Java source code and fix them safely. The method consists of two major stages: (i) find duplicated code pairs in the complete source code using a code clone detector, and (ii) classify the detected clone pairs and fix them safely with an appropriate algorithm. Stage (i) uses the Deckard code clone detector. In stage (ii), clone pairs are divided into two classes: if the two functions have the same type, they are merged into a single function; if they do not, the shared duplicated code snippet is extracted. To demonstrate the effectiveness of this method, I designed and implemented a function-level clone refactoring tool, AutoRefactor, and evaluated it on real company product code and open source projects. Experimental results show that AutoRefactor can handle 3,976 LOC (30 clone pairs) out of 6,082 LOC (55 clone pairs) of function-level clones found within the same file. Contents: I. Introduction (motivation, approach, results, organization); II. Clone detection (types of clones, types of clone refactoring, definition of code duplication, Deckard); III. Clone-removal algorithms (same-type functions: function merging; different-type functions: function extraction); IV. Extensions of the algorithms (diversified merging strategies, benchmark application examples); V. Experimental results; VI. Related work and discussion; VII. Conclusion.
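
    The abstract describes a two-way split in the refactoring stage: clone pairs whose functions have the same type are merged into one function, otherwise the shared fragment is extracted. The sketch below mirrors only that decision; the real AutoRefactor operates on Java ASTs driven by Deckard's clone reports, and the JavaMethod structure and example methods here are assumptions made for illustration.

        # Decide between the two clone-removal strategies the abstract describes.
        from dataclasses import dataclass

        @dataclass
        class JavaMethod:
            name: str
            return_type: str
            param_types: tuple          # e.g. ("int", "String")

        def refactoring_strategy(m1: JavaMethod, m2: JavaMethod) -> str:
            same_type = (m1.return_type == m2.return_type
                         and m1.param_types == m2.param_types)
            return "merge functions" if same_type else "extract shared fragment"

        a = JavaMethod("formatUser",  "String", ("User",))
        b = JavaMethod("formatAdmin", "String", ("User",))
        c = JavaMethod("formatCount", "String", ("int",))
        print(refactoring_strategy(a, b))   # merge functions
        print(refactoring_strategy(a, c))   # extract shared fragment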

    Automated Improvement of Software Design by Search-Based Refactoring

    Software maintenance cost is estimated to be more than 70% of the total cost of a system, because of many factors, including new user requirements, the adoption of new technologies, and the quality of software systems. Of these factors, quality is the one we can control and continually improve to prevent degradation of performance and reduction of effectiveness (a.k.a. design decay). Moreover, to stay competitive, the software industry has shortened its release cycles to deliver new products and features faster, which puts more pressure on developer teams and accelerates the evolution of system design. One way to prevent design decay is the identification and correction of anti-patterns, which are indicators of poor design quality. To improve design quality and remove anti-patterns, developers perform small behavior-preserving transformations (a.k.a. refactoring). Manual refactoring is expensive, as it requires developers to (1) identify the code entities that need to be refactored; (2) generate refactoring operations for the classes identified in the previous step; and (3) find the correct order of application of the generated refactorings, to maximize the quality benefit and minimize conflicts. Hence, researchers and practitioners have formulated refactoring as an optimization problem and use search-based techniques to propose (semi-)automated approaches to solve it. In this dissertation, we propose several approaches that tackle major issues in existing refactoring tools, to assist developers in their maintenance and quality assurance activities. Our thesis is that automated refactoring can be improved by considering new dimensions: (1) the developer's task context, to prioritize the refactoring of relevant classes; (2) testing effort, to reduce the cost of testing after refactoring; (3) the identification of conflicts between refactoring operations, to reduce refactoring cost; and (4) energy efficiency, to improve the energy consumption of mobile applications after refactoring.
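
    A small, hedged sketch of step (3) above: ordering generated refactoring operations so that estimated quality gain is maximized while pairs of conflicting operations (where applying one disables the other) are not both scheduled. The greedy rule only illustrates the scheduling concern, not the dissertation's search-based formulation; the operation names, gains, and conflict pairs are invented.

        # Greedily schedule refactorings by estimated gain, skipping conflicts.

        operations = {
            "move_method_A":   5,    # estimated quality gain of each refactoring
            "extract_class_B": 8,
            "inline_method_C": 3,
            "pull_up_field_D": 4,
        }
        conflicts = {("extract_class_B", "inline_method_C"),  # B rewrites code C needs
                     ("move_method_A", "pull_up_field_D")}

        def greedy_schedule(ops, conflict_pairs):
            schedule = []
            for op in sorted(ops, key=ops.get, reverse=True):  # highest gain first
                clashes = any((op, s) in conflict_pairs or (s, op) in conflict_pairs
                              for s in schedule)
                if not clashes:
                    schedule.append(op)
            return schedule

        plan = greedy_schedule(operations, conflicts)
        print(plan, "total gain:", sum(operations[o] for o in plan))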

    Handling High-Level Model Changes Using Search Based Software Engineering

    Model-Driven Engineering (MDE) considers models as first-class artifacts during the software lifecycle. The number of available tools, techniques, and approaches for MDE is increasing as its use gains traction in driving quality and controlling cost in the evolution of large software systems. Software models, defined as code abstractions, are iteratively refined, restructured, and evolved for many reasons, such as fixing defects in the design, reflecting changes in requirements, and modifying a design to enhance existing features. In this work, we focus on four main problems related to the evolution of software models: 1) the detection of applied model changes, 2) merging parallel evolved models, 3) detection of design defects in the merged model, and 4) the recommendation of new changes to fix defects in software models. For the first contribution, an a-posteriori multi-objective change detection approach is proposed for evolved models. The changes are expressed in terms of atomic and composite refactoring operations. The majority of existing approaches detect atomic changes but do not adequately address composite changes, which mask atomic operations in intermediate models. For the second contribution, several approaches exist to construct a merged model by incorporating all non-conflicting operations of the evolved models. Conflicts arise when the application of one operation disables the applicability of another. The essence of the problem is to identify and prioritize conflicting operations based on importance and context, a gap in existing approaches. This work proposes a multi-objective formulation of model merging that aims to maximize the number of successfully applied merged operations. For the third and fourth contributions, the majority of existing work focuses on refactoring at the source code level and does not exploit the benefits of software design optimization at the model level. However, refactoring at the model level is inherently more challenging due to the difficulty of assessing the potential impact on the structural and behavioral features of the software system. This requires analysis of class and activity diagrams to appraise the overall system quality, feasibility, and inter-diagram consistency. This work focuses on designing, implementing, and evaluating a multi-objective refactoring framework for the detection and fixing of design defects in software models. Ph.D., Information Systems Engineering, College of Engineering and Computer Science, University of Michigan-Dearborn. http://deepblue.lib.umich.edu/bitstream/2027.42/136077/1/Usman Mansoor Final.pdf
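
    As a hedged illustration of the conflict notion behind the model-merging contribution, the sketch below flags a pair of operations from two parallel evolutions as conflicting when one deletes or renames a model element that the other still needs. The operation format, the disabled_by rule, and the example class-diagram edits are simplified assumptions, not the dissertation's multi-objective formulation.

        # Detect conflicting operation pairs between two parallel model evolutions.

        def disabled_by(op, other):
            """True if applying `other` makes `op` inapplicable in the merged model."""
            return other["kind"] in ("delete", "rename") and other["target"] in op["reads"]

        def find_conflicts(ops_a, ops_b):
            pairs = []
            for a in ops_a:
                for b in ops_b:
                    if disabled_by(a, b) or disabled_by(b, a):
                        pairs.append((a["name"], b["name"]))
            return pairs

        # Two parallel evolutions of the same class diagram (invented operations).
        ops_model_a = [{"name": "pullUpAttribute(owner)", "kind": "move",
                        "target": "Order.owner", "reads": {"Order.owner", "Customer"}}]
        ops_model_b = [{"name": "deleteClass(Customer)", "kind": "delete",
                        "target": "Customer", "reads": set()}]
        print(find_conflicts(ops_model_a, ops_model_b))
        # [('pullUpAttribute(owner)', 'deleteClass(Customer)')]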

    Detection and analysis of near-miss clone genealogies

    It is believed that identical or similar code fragments in source code, also known as code clones, have an impact on software maintenance. A clone genealogy shows how a group of clone fragments evolves along with the associated software system, and thus may provide important insights into the maintenance implications of those clone fragments. Considering the importance of studying the evolution of code clones, many studies have been conducted on this topic. However, after a decade of active research, there has been a marked lack of progress in understanding the evolution of near-miss software clones, especially those where statements have been added, deleted, or modified in the copied fragments. Given that software systems contain a significant number of near-miss clones, we believe that without studying their evolution one cannot have a complete picture of clone evolution. In this thesis, we have advanced the state of the art in clone evolution research in the context of both exact and near-miss software clones. First, we performed a large-scale empirical study to extend the existing knowledge about the evolution of exact and renamed clones, where identifiers have been modified in the copied fragments. Second, we developed a framework, gCad, that can automatically extract both exact and near-miss clone genealogies across multiple versions of a program and identify their change patterns reasonably fast while maintaining high precision and recall. Third, to gain a broader perspective on clone evolution, we extended gCad to calculate various evolutionary metrics and performed an in-depth empirical study on the evolution of both exact and near-miss clones in six open source software systems written in two different programming languages, with respect to five research questions. We discovered several interesting evolutionary phenomena of near-miss clones that either contradict previous findings or are new. Finally, we further improved gCad and investigated a wide range of attributes and metrics derived from both the clones themselves and their evolution histories, to identify the attributes that developers often rely on when removing clones in the real world. We believe that our new insights into the evolution of near-miss clones, and into how developers approach and remove duplication, will play an important role in understanding the maintenance implications of clones and will help design better clone management systems.
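
    A minimal, hedged sketch of the change-pattern classification a genealogy extractor such as gCad performs between consecutive versions of a clone pair: unchanged pairs are static, pairs that change but remain similar are consistently changed, and pairs that diverge are inconsistently changed. The similarity test (a difflib ratio over a fixed threshold) and the example fragments are placeholders, not gCad's actual mapping and comparison machinery.

        # Classify one genealogy step of a clone pair across two versions.
        import difflib

        def similar(a: str, b: str, threshold: float = 0.8) -> bool:
            return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

        def classify_step(old_a, old_b, new_a, new_b):
            a_changed, b_changed = old_a != new_a, old_b != new_b
            if not a_changed and not b_changed:
                return "static (no change)"
            if similar(new_a, new_b):
                return "consistent change (fragments remain similar)"
            return "inconsistent change (fragments diverged)"

        old_a = "total = price * qty\nreturn total"
        old_b = "total = price * qty\nreturn total"
        new_a = "total = price * qty * (1 - discount)\nreturn total"
        new_b = "total = price * qty * (1 - discount)\nreturn total"
        print(classify_step(old_a, old_b, new_a, new_b))   # consistent change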

    Visualization and analysis of software clones

    Code clones are identical or similar fragments of code in a software system. Simple copy-paste programming practices, reusing existing code fragments instead of implementing from scratch, and the limitations of both programming languages and developers are the primary reasons behind code cloning. Despite the maintenance implications of clones, it is not possible to conclude that cloning is harmful, because there are also benefits in using them (e.g., faster and independent development). As a result, researchers at least agree that clones need to be analyzed before aggressively refactoring them. Although a large number of state-of-the-art clone detectors are available today, handling raw clone data is challenging due to its textual nature and large volume. To address this issue, we propose a framework for large-scale clone analysis and develop a maintenance support environment, VisCad, based on that framework. To manage the large volume of clone data, VisCad employs the Visual Information Seeking Mantra: overview first, zoom and filter, then details-on-demand. With VisCad, users can analyze and identify distinctive code clones through a set of visualization techniques, metrics covering different clone relations, and data filtering operations. The loosely coupled architecture of VisCad allows users to work with any clone detection tool that reports the source coordinates of the found clones. This yields the opportunity to work with the clone detectors of choice, which is important because each clone detector has its own strengths and weaknesses. In addition, we extend the support for clone evolution analysis, which is important for understanding the cause and effect of changes at the clone level during the evolution of a software system. Such information can be used to make software maintenance decisions, such as when to refactor clones. We propose and implement a set of visualizations that allow users to analyze the evolution of clones from a coarse-grained to a fine-grained level. Finally, we use VisCad to extract both spatial and temporal clone data to predict changes to clones in a future release or revision of the software, which can be used to rank clone classes as another means of handling a large volume of clone data. We believe that VisCad makes clone comprehension easier, and that it can be used as a test-bed to further explore code cloning, which is necessary for building a successful clone management system.
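
    The abstract mentions combining spatial and temporal clone data to predict future changes and rank clone classes. The sketch below shows one simple way such a ranking could be computed, as a weighted score over invented per-class metrics (past change frequency, number of fragments, file dispersion); the metrics, weights, and class names are assumptions, not VisCad's actual prediction model.

        # Rank clone classes by a rough change-proneness score.

        clone_classes = {
            "CC-1": {"changes_last_10_revs": 6, "fragments": 4, "files": 3},
            "CC-2": {"changes_last_10_revs": 1, "fragments": 2, "files": 1},
            "CC-3": {"changes_last_10_revs": 4, "fragments": 7, "files": 5},
        }

        def change_proneness(metrics, w_temporal=0.6, w_size=0.25, w_spread=0.15):
            return (w_temporal * metrics["changes_last_10_revs"]
                    + w_size * metrics["fragments"]
                    + w_spread * metrics["files"])

        ranked = sorted(clone_classes, key=lambda c: change_proneness(clone_classes[c]),
                        reverse=True)
        for c in ranked:
            print(c, round(change_proneness(clone_classes[c]), 2))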