2 research outputs found

    Stereo: editing clones refactored as code generators

    Clone detection is a largely mature technology able to detect many code duplications, also called clones, in software systems of practically any size. The classic approaches to clone management are either clone removal, which consists in refactoring clones into an available language abstraction, or clone tracking, which uses a so-called linked editor able to propagate changes between clone instances. However, past studies have shown that clone removal is not always feasible, due to the limited expressiveness of language abstractions, or not desirable, because of the abstraction overhead or the risks inherent in the refactoring. Linked editors, on the other hand, provide abstraction at no cost and no risk, but have issues of their own, such as limited expressiveness, scalability, and controllability. This paper presents a new approach in which clones are safely refactored as code generators, while the unmodified code is presented to maintainers with the same look-and-feel as in a linked editor. This solution has good expressiveness, scalability, and controllability properties. A prototype of such an editor is presented, along with a first application within an industrial project.
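
    As a rough illustration of the core idea, treating a clone family as a single code generator while maintainers keep working on the generated instances, the following Python sketch may help; it is hypothetical, and the template, names, and mechanism are illustrative assumptions rather than the Stereo tool or the paper's actual implementation.

    from string import Template

    # Hypothetical sketch: the template captures what the clone instances share;
    # the per-clone differences become explicit parameters of the generator.
    ACCESSOR_TEMPLATE = Template(
        'def get_$field(record):\n'
        '    """Return the \'$field\' field, or $default if it is missing."""\n'
        '    return record.get("$field", $default)\n'
    )

    def generate_accessor(field, default):
        """Generate one concrete clone instance from the shared template."""
        return ACCESSOR_TEMPLATE.substitute(field=field, default=repr(default))

    # Each call regenerates one of the former clones; a change to the template
    # (or to its parameters) propagates consistently to every instance, while a
    # Stereo-like editor would still present the generated code to maintainers.
    print(generate_accessor("name", ""))
    print(generate_accessor("age", 0))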

    LASSO – an observatorium for the dynamic selection, analysis and comparison of software

    Mining software repositories at the scale of 'big code' (i.e., big data) is a challenging activity. As well as finding a suitable software corpus and making it programmatically accessible through an index or database, researchers and practitioners have to establish an efficient analysis infrastructure and precisely define the metrics and data extraction approaches to be applied. Moreover, for analysis results to be generalisable, these tasks have to be applied at a large enough scale to have statistical significance, and if they are to be repeatable, the artefacts need to be carefully maintained and curated over time. Today, however, much of this work is still performed by human beings on a case-by-case basis, and the level of effort involved often has a significant negative impact on the generalisability and repeatability of studies, and thus on their overall scientific value. The general-purpose 'code mining' repositories and infrastructures that have emerged in recent years represent a significant step forward, because they automate many software mining tasks at an ultra-large scale and allow researchers and practitioners to focus on defining the questions they would like to explore at an abstract level. However, they are currently limited to static analysis and data extraction techniques, and thus cannot support (i.e., help automate) studies that involve the execution of software systems. This includes experimental validations of techniques and tools that hypothesise about the behaviour (i.e., semantics) of software, as well as data analysis and extraction techniques that aim to measure dynamic properties of software. This thesis introduces a platform called LASSO (Large-Scale Software Observatorium) that overcomes this limitation by automating the collection of dynamic (i.e., execution-based) information about software alongside static information. It features a single, ultra-large-scale corpus of executable software systems, created by amalgamating existing Open Source software repositories, and a dedicated DSL for defining abstract selection and analysis pipelines. Its key innovations are integrated capabilities for searching for and selecting software systems based on their exhibited behaviour, and an 'arena' that allows their responses to software tests to be compared in a purely data-driven way. We call the platform a 'software observatorium' since it is a place where the behaviour of large numbers of software systems can be observed, analysed and compared.
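
    The 'arena' mentioned above can likewise be illustrated with a small hypothetical sketch, written in plain Python rather than LASSO's own DSL (whose syntax is not reproduced here); the candidate names and functions are assumptions for illustration. Several candidate implementations receive the same test stimuli, and their recorded responses are then compared purely as data.

    def arena(candidates, stimuli):
        """Run every candidate on every stimulus and record its response."""
        responses = {}
        for name, impl in candidates.items():
            row = []
            for args in stimuli:
                try:
                    row.append(impl(*args))
                except Exception as exc:  # a failure is also an observable response
                    row.append("error: " + type(exc).__name__)
            responses[name] = row
        return responses

    # Two hypothetical candidates for the same informal 'clamp' behaviour.
    candidates = {
        "clamp_a": lambda x, lo, hi: max(lo, min(x, hi)),
        "clamp_b": lambda x, lo, hi: sorted((lo, x, hi))[1],
    }
    stimuli = [(5, 0, 10), (-3, 0, 10), (42, 0, 10)]
    for name, row in arena(candidates, stimuli).items():
        print(name, row)
    # Candidates whose response rows agree can be treated as behaviourally
    # equivalent with respect to these stimuli; diverging rows pinpoint where
    # the implementations differ.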