
    Understanding the Evolution of Code Clones in Software Systems

    Code cloning is a common practice in software development. It has positive aspects, such as accelerating the development process, and negative aspects, such as causing code bloat. After a decade of active research, it is clear that removing all clones from a software system is not desirable; it is therefore better to manage clones than to remove them. A software system can contain thousands of clones, which may serve multiple purposes. However, some clones cause unwanted management difficulties, and these should be refactored. Failing to manage clones can introduce inconsistencies into the code, which makes it prone to errors. Managing thousands of clones manually is a difficult task. A clone management system can help manage clones and reveal patterns in how clones evolve as a software system evolves. In this research, we propose a framework for constructing and visualizing clone genealogies annotated with change patterns (e.g., inconsistent changes), bug information, developer information, and several other important metrics. Based on the framework, we design and build an interactive prototype for a multi-touch surface (e.g., an iPad). The prototype uses a variety of techniques to support understanding clone genealogies, including: identifying and providing a compact overview of the clone genealogies along with their key characteristics; providing interactive navigation of genealogies, cloned source code, and the differences between clone fragments; providing the ability to filter and organize genealogies based on their properties; providing a feature for annotating clone fragments with comments to aid future review; and providing the ability to contact developers from within the system to find out more about specific clones.
To investigate the suitability of the framework and prototype for investigating and managing cloned code, we elicit feedback from practicing researchers and developers, and we conduct two empirical studies: a detailed investigation into the evolution of function clones and a detailed investigation into how clones contribute to bugs. In both empirical studies, we are able to use the prototype to quickly investigate the cloned source code and gain insights into clone use. We believe that the clone management system and the findings will play an important role in future studies and in managing code clones in software systems.
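A minimal sketch of how transitions in a clone genealogy might be labeled with change patterns such as inconsistent changes. It assumes each revision of a clone class is given as a mapping from a fragment id (assumed stable across revisions) to its normalized text, and that the fragments in a class are textually identical. The names `classify_change` and `genealogy` and the label set are illustrative only, not the prototype's implementation.

```python
# Hypothetical model: one revision of a clone class, mapping a fragment id
# (assumed stable across revisions) to the fragment's normalized source text.
Snapshot = dict[str, str]


def classify_change(before: Snapshot, after: Snapshot) -> str:
    """Label the transition of one clone class between two revisions."""
    if before.keys() != after.keys():
        return "membership"  # fragments were added to or removed from the class
    changed = {fid for fid in before if before[fid] != after[fid]}
    if not changed:
        return "same"
    # Consistent change: every fragment was edited and the fragments still
    # match each other (assumes identical, i.e. Type-1, clones).
    if changed == set(before) and len(set(after.values())) == 1:
        return "consistent"
    return "inconsistent"


def genealogy(history: list[Snapshot]) -> list[str]:
    """Label each consecutive pair of revisions in a clone class's history."""
    return [classify_change(a, b) for a, b in zip(history, history[1:])]


history = [
    {"f1": "x = a + b", "f2": "x = a + b"},
    {"f1": "x = a + b", "f2": "x = a + b"},
    {"f1": "x = a - b", "f2": "x = a - b"},
    {"f1": "x = a - b", "f2": "x = a - b + 1"},
]
print(genealogy(history))  # ['same', 'consistent', 'inconsistent']
```

A genealogy labeled this way can then be filtered by its properties, for example to surface only clone classes that ever received an inconsistent change.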

    Improving software engineering processes using machine learning and data mining techniques

    The availability of large amounts of data from software development has created an area of research called mining software repositories. Researchers mine data from software repositories both to improve understanding of software development and evolution and to empirically validate novel ideas and techniques. The large amount of data collected from software processes can then be leveraged for machine learning applications. Indeed, machine learning can have a large impact on software engineering, just as it has had in other fields, by helping developers and other actors involved in the software development process automate or improve parts of their work. Automation can make some phases of the development process not only less tedious or cheaper, but also more efficient and less error-prone. Moreover, employing machine learning can reduce the complexity of difficult problems, enabling engineers to focus on more interesting problems rather than the basics of development. The aim of this dissertation is to show how the development and use of machine learning and data mining techniques can support several software engineering phases, ranging from crash handling, to code review, to patch uplifting, to software ecosystem management. To validate our thesis, we conducted several studies tackling different problems in an industrial open-source context, focusing on the case of Mozilla.

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Code clones (identical or similar code fragments in a code base) have dual and contradictory impacts (i.e., both positive and negative) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad smell in a code base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code base may contain a huge number of code clones, and it is impractical to consider all of them for refactoring or tracking. In these circumstances, it is essential to identify the code clones that are particularly important for refactoring and tracking. However, no existing study has investigated this matter. Our research focuses on this problem, and we perform five studies on identifying important clones by analyzing clone evolution history. In our first study, we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from online SVN repositories. By analyzing the evolutionary coupling of code clones, we identify a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones SPCP clones, and we rank them by the strength of their evolutionary coupling. In our second study, we further analyze the evolutionary coupling of code clones with the aim of assisting clone tracking. The purpose of clone tracking is to identify the co-change (i.e., changing together) candidates of code clones to ensure that changes are applied consistently across the code base. Our second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling.
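Evolutionary coupling is commonly quantified with association-rule support and confidence over entities that change in the same commits. The sketch below ranks co-change candidates that way, assuming each commit has already been reduced to the set of clone fragment ids it modified. The function name `rank_coupling` and the `min_support` threshold are hypothetical, and the studies' exact coupling measure may differ.

```python
from collections import Counter
from itertools import combinations


def rank_coupling(commits: list[set[str]], min_support: int = 1) -> list[tuple[str, str, int, float]]:
    """Return rules (a, b, support, confidence) meaning "when a changes, b also
    changes", ranked by confidence, then support, then name."""
    changes = Counter()  # number of commits that touched each clone fragment
    co_changes = Counter()  # number of commits that touched each unordered pair
    for changed in commits:
        changes.update(changed)
        for a, b in combinations(sorted(changed), 2):
            co_changes[(a, b)] += 1
    rules = []
    for (a, b), support in co_changes.items():
        if support < min_support:
            continue
        # Confidence is directional: support(a, b) / support(a).
        rules.append((a, b, support, support / changes[a]))
        rules.append((b, a, support, support / changes[b]))
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


commits = [{"c1", "c2"}, {"c1", "c2"}, {"c1"}, {"c3"}, {"c2", "c3"}]
for rule in rank_coupling(commits):
    print(rule)
```

Here `c1` changes in three commits and co-changes with `c2` in two of them, so the rule `c1 -> c2` ranks first with support 2 and confidence 2/3; a tracking tool would suggest `c2` as a co-change candidate whenever `c1` is edited.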
In our third study, we perform a deeper analysis of the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings, we separate the SPCP clones into two disjoint subsets: the non-cross-boundary SPCP clones, which can be considered important for refactoring, and the cross-boundary SPCP clones, which should be considered important for tracking. In our fourth study, we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have a high tendency to experience bug-fixes. Such clone types can be given high priority for management (refactoring or tracking). In our last study, we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from this study can help us prioritize clone types for management on the basis of their tendency to experience late propagation. We also find that late propagation can be considerably reduced by managing the SPCP clones. On the basis of our studies, we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks them by their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pinpointing the important clones to be managed, thus considerably reducing clone management effort.
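Late propagation is usually described as clone fragments that diverge through an inconsistent change and are only later made consistent again. A minimal sketch that flags such spans for one clone pair, assuming the pair's normalized text is available at every revision and that "in sync" simply means identical text. `late_propagations` is a hypothetical name and is not part of AMIC.

```python
def late_propagations(history: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """Return (diverged_at, resynced_at) revision indices for each span in which
    a clone pair fell out of sync and was later brought back in sync."""
    in_sync = [left == right for left, right in history]
    spans = []
    diverged_at = None
    for rev in range(1, len(history)):
        if in_sync[rev - 1] and not in_sync[rev]:
            diverged_at = rev  # an inconsistent change split the pair
        elif not in_sync[rev - 1] and in_sync[rev] and diverged_at is not None:
            spans.append((diverged_at, rev))  # the change was propagated late
            diverged_at = None
    # A divergence that is never resolved is not counted as a late propagation.
    return spans


history = [("a", "a"), ("b", "a"), ("b", "a"), ("b", "b"), ("c", "b")]
print(late_propagations(history))  # [(1, 3)]
```

The gap between the two indices (here, two revisions) is the window during which the fragments were inconsistent, which is what makes the pattern risky.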

    Automated Refactoring in Software Automation Platforms

    Software Automation Platforms (SAPs) enable faster development and reduce the need to write code to construct applications. SAPs provide abstraction and automation, resulting in a low entry barrier that lets users with fewer programming skills become proficient developers. An unfortunate consequence of using SAPs is the production of code with higher technical debt, since such developers are less familiar with software development best practices. Hence, beyond providing a rapid software development environment, SAPs should aim to provide a simpler software construction and evolution pipeline. One simple example of such technical debt is the Unlimited Records anti-pattern, which occurs whenever queries are unbounded, i.e., the maximum number of records to be fetched is not explicitly limited. Limiting the number of records retrieved may, in many cases, improve application performance by reducing screen-loading time, making applications faster and more responsive, which is a top priority for developers. A second example is the Duplicated Code anti-pattern, which severely affects code readability and maintainability and can even cause bug propagation. To overcome these problems, we resort to automated refactoring, as it accelerates the refactoring process and provides provably correct modifications. This dissertation aims to study and develop a solution for automated refactoring in the context of OutSystems (an industry-leading SAP). This was carried out by implementing techniques for automatically refactoring a set of selected anti-patterns in OutSystems logic that are currently detected by the OutSystems technical debt monitoring tool.
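A minimal sketch of detecting and fixing the Unlimited Records anti-pattern on a simplified query model. The `Query` type and its `rows_displayed` field (the number of rows the consuming screen shows) are hypothetical and do not reflect the OutSystems model or the dissertation's implementation. Queries whose display size is unknown are left unbounded for manual review rather than given an arbitrary limit.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical, simplified model of a data query in a low-code app."""
    name: str
    max_records: Optional[int]  # None means unbounded: the anti-pattern
    rows_displayed: Optional[int]  # rows the consuming screen shows, if known


def find_unlimited(queries: list[Query]) -> list[Query]:
    """Detect queries whose result size is not explicitly limited."""
    return [q for q in queries if q.max_records is None]


def bound_queries(queries: list[Query]) -> list[Query]:
    """Limit each unbounded query to the rows its screen actually displays.
    Queries whose display size is unknown are returned unchanged."""
    result = []
    for q in queries:
        if q.max_records is None and q.rows_displayed is not None:
            q = replace(q, max_records=q.rows_displayed)
        result.append(q)
    return result


queries = [
    Query("GetOrders", max_records=None, rows_displayed=20),
    Query("GetUsers", max_records=50, rows_displayed=50),
    Query("GetLogs", max_records=None, rows_displayed=None),
]
fixed = bound_queries(queries)
print([q.name for q in find_unlimited(fixed)])  # ['GetLogs']
```

Using the display size as the limit keeps the rows the user sees unchanged, under the assumption that the screen never shows more than `rows_displayed` rows; that is the kind of behavior-preservation condition an automated refactoring has to establish before applying the change.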