70 research outputs found

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    This report presents the detailed steps and results of a structured review of the code clone literature. The review investigates the evidence for the claim that code duplication has a negative effect on code changeability. The report contains only those details of the review that did not fit into the companion conference paper (Hordijk, Ponisio et al. 2009, "Harmfulness of Code Duplication: A Structured Review of the Evidence").

    A Large-scale Empirical Study of Java Language Feature Usage

    Programming languages evolve over time, adding new language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has had four editions, and a fifth is currently being drafted. While the addition of language features is driven by an assumed need in the community (often with direct requests for such features), there is little empirical evidence on how developers adopt these new features once they are released. In this paper, we analyze over 23k open-source Java projects representing over 7 million Java files, which when parsed contain over 14 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study yields interesting insights, such as: while all features are used, there are still millions more places where they could potentially be used; all features are used before their release; and features tend to be adopted by committers on an individual basis rather than as a team.

    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting, with or without minor modifications, is a common practice in software development for improving productivity. As a result, software systems often contain similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base, and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools and with large-scale, in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in for the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from computational biology. Instead of targeting all clones in the entire code base, our tool aids clone-aware development by allowing a focused search for clones of any code fragment the developer is interested in. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis comprises large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects.
    The outcomes of these studies reveal useful insights into the characteristics of evolving clones and their implications for clone management. Once code clones are identified, managing them often requires careful refactoring, which is addressed in the third phase of the thesis. Given a large number of clones, it is difficult to decide optimally what to refactor and what not to, especially when there are dependencies among clones and the objective is to minimize refactoring effort and risk while maximizing benefits. To this end, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for estimating the effort needed to refactor clones in source code. We evaluated our clone detector, scheduler, and effort model through comparative empirical studies and user studies. Finally, based on our experience and an in-depth analysis of the present state of the art, we outline avenues for further research and development towards the versatile clone management system that we envision.
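
    The k-difference criterion behind near-miss clone detection can be illustrated with a short sketch. The thesis uses a suffix-tree-based k-difference hybrid algorithm; the Python below omits the suffix tree entirely and compares two fragments directly, so it shows only the matching criterion, not the thesis's algorithm. Two token sequences are treated as a near-miss clone if at most k token insertions, deletions, or substitutions turn one into the other. The regex tokenizer and the names `tokenize`, `edit_distance`, and `is_k_difference_clone` are hypothetical.

```python
import re


def tokenize(code: str) -> list[str]:
    """Naive tokenizer (an assumption): words/numbers and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", code)


def edit_distance(a: list[str], b: list[str]) -> int:
    """Token-level Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1,          # delete ta
                            curr[j - 1] + 1,      # insert tb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_k_difference_clone(frag_a: str, frag_b: str, k: int) -> bool:
    """True if the two fragments differ by at most k token edits."""
    return edit_distance(tokenize(frag_a), tokenize(frag_b)) <= k


a = "total = price * qty; log(total);"
b = "sum = price * qty; log(sum);"
print(is_k_difference_clone(a, b, 2))  # True
print(is_k_difference_clone(a, b, 1))  # False
```

    The two fragments differ only by two token substitutions (`total` vs. `sum`), so they match with k = 2 but not with k = 1. Because this direct check costs O(n·m) per pair of fragments, comparing every pair in a large code base quickly becomes expensive, which is one motivation for index structures such as suffix trees.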

    Empirical Investigation of the Suitability of Code Clones for Demonstrating Redundancy as a Driver of the Evolution of Programming Concepts

    During software development, developers regularly create code clones by copying source code. This thesis presents an approach for automatically measuring this duplicated code with clone-detection tools across multiple versions of different software products. Using the histories of code clones, influences on the redundancy of this software are measured empirically. This provides a basis for demonstrating that the evolution of programming languages is driven predominantly by the reduction of redundancy.

    Table of contents:
    Abstract
    1 Introduction
      1.1 Problem Statement
      1.2 Objectives
      1.3 Approach
    2 Preliminaries
      2.1 Programming Concepts
        2.1.1 Definition
        2.1.2 Programming Concepts in Java
      2.2 Drivers of the Evolution of Programming Concepts
        2.2.1 Types of Drivers of Programming Concepts
        2.2.2 Reducing Redundancy in Software
          2.2.2.1 Types of Redundancy in Software
          2.2.2.2 Code Clones
          2.2.2.3 Consequences of Redundancy in Software
        2.2.3 Approaches to Demonstrating Redundancy Reduction as a Driver
      2.3 Selecting Software Repositories for the Analyses
        2.3.1 Types of Software Repositories
        2.3.2 Requirements for Software Repositories
    3 Data Collection Process for Analyzing Software for Clones
      3.1 Structure of the Data Collection Process
        3.1.1 Solution Approach
        3.1.2 Process Control
      3.2 Handling Versioning
        3.2.1 General
        3.2.2 Commit Filter
      3.3 Clone Detection
        3.3.1 Types and Representatives
        3.3.2 Tools Used in This Work
          3.3.2.1 Simian
          3.3.2.2 CCFinderX
        3.3.3 Runtime Problem and Possible Solutions
      3.4 Data Aggregation
    4 Evaluation of the Measurements
      4.1 Evaluation Procedure
      4.2 Examining Code Clone Histories
      4.3 Comparing Different Configurations
        4.3.1 Comparing Different Clone Detection Tools
        4.3.2 Comparing Different Commit Filters
        4.3.3 Comparing Different Detection Thresholds
      4.4 Investigating Various Points of Interest
    5 Retrospective
      5.1 Error Analysis
      5.2 Possible Extensions
      5.3 Concluding Remarks
    Appendix
      Literature Search Procedure
      Computer Configuration Used
      Example Files
        Example of Detailed Simian Output
        Example of Detailed CCFinderX Output
        Example of Aggregated Data
    List of Figures
    List of Tables
    List of Listings
    List of Abbreviations
    Bibliography
    Statutory Declaration
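
    The measurement idea, computing duplicated code for each version and reading the results as a clone history, can be sketched as follows. This is a minimal stand-in and not Simian or CCFinderX: it treats any run of `min_lines` consecutive non-blank, whitespace-stripped lines that occurs at least twice as a clone and reports the fraction of lines covered by such runs. The names `clone_coverage` and `clone_history`, the input format (one dict of file name to source text per version), and the `min_lines` threshold are assumptions; choosing which commits to measure (the thesis's commit filter) is left out.

```python
from collections import defaultdict


def clone_coverage(files: dict[str, str], min_lines: int = 3) -> float:
    """Fraction of non-blank lines (across all files) inside a window of
    `min_lines` consecutive normalized lines that occurs at least twice."""
    windows = defaultdict(list)  # window content -> [(file name, start index), ...]
    total_lines = 0
    for name, source in files.items():
        # Normalize: strip surrounding whitespace and drop blank lines.
        lines = [line.strip() for line in source.splitlines() if line.strip()]
        total_lines += len(lines)
        for start in range(len(lines) - min_lines + 1):
            windows[tuple(lines[start:start + min_lines])].append((name, start))

    cloned = set()  # (file name, line index) pairs covered by a duplicated window
    for occurrences in windows.values():
        if len(occurrences) > 1:
            for name, start in occurrences:
                cloned.update((name, i) for i in range(start, start + min_lines))
    return len(cloned) / total_lines if total_lines else 0.0


def clone_history(versions: list[dict[str, str]], min_lines: int = 3) -> list[float]:
    """Apply clone_coverage to each version snapshot, oldest first."""
    return [clone_coverage(files, min_lines) for files in versions]


base = "int a = 1;\nint b = 2;\nint c = a + b;\n"
v1 = {"A.java": base + "print(c);\n"}
v2 = {"A.java": base + "print(c);\n", "B.java": base + "return c;\n"}
print(clone_history([v1, v2]))  # [0.0, 0.75]
```

    With a three-line threshold, the second version adds a file that repeats the first three lines of `A.java`, so six of its eight lines lie in a duplicated window and coverage rises from 0% to 75%. Raising `min_lines` makes the measure stricter, which is the kind of sensitivity the thesis examines when it compares detection thresholds.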

    Bringing ultra-large-scale software repository mining to the masses with Boa

    Mining software repositories gives developers and researchers a chance to learn from previous development activities and apply that knowledge in the future. Ultra-large-scale open source repositories (e.g., SourceForge with 350,000+ projects, GitHub with 250,000+ projects, and Google Code with 250,000+ projects) provide an extremely large corpus on which to perform such mining tasks. This large corpus gives researchers the opportunity to test new mining techniques and empirically validate new approaches on real-world data. However, the barrier to entry is often extremely high. Researchers interested in mining must know a large number of techniques, languages, tools, etc., each of which is often complex. Moreover, mining at the scale described above adds further complexity and is often difficult to achieve. The Boa language and infrastructure were developed to solve these problems. We provide users with a domain-specific language tailored for software repository mining and allow them to submit queries via our web-based interface. These queries are then automatically parallelized and executed on a cluster, analyzing a dataset containing almost 700,000 projects, history information from millions of revisions, millions of Java source files, and billions of AST nodes. The language also provides an easy-to-comprehend visitor syntax that simplifies writing source code mining queries. The underlying infrastructure contains several optimizations, including query optimizations that make single queries faster and a fusion optimization that groups queries from multiple users into a single query. The latter optimization is important because Boa is intended to be a shared community resource. Finally, we show the potential benefit of Boa to the community by reproducing a previously published case study and performing a new case study on the adoption of Java language features.
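
    The visitor style of mining query described above can be illustrated outside Boa. The sketch below is not Boa syntax and does not run on Boa's Java ASTs; it is a Python analogue built on the standard `ast` module, in which a single `ast.NodeVisitor` traversal counts several language features at once. The feature selection is an arbitrary, hypothetical choice, and `FeatureCounter` and `count_features` are illustrative names.

```python
import ast
from collections import Counter


class FeatureCounter(ast.NodeVisitor):
    """Counts a few illustrative Python language features in one AST traversal."""

    # Hypothetical feature selection: AST node type -> feature label.
    FEATURES = {
        ast.Lambda: "lambda",
        ast.JoinedStr: "f-string",
        ast.With: "with-statement",
        ast.NamedExpr: "assignment-expression",  # the := operator, Python 3.8+
    }

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def generic_visit(self, node: ast.AST) -> None:
        # No visit_<Type> methods are defined, so every node arrives here.
        label = self.FEATURES.get(type(node))
        if label is not None:
            self.counts[label] += 1
        super().generic_visit(node)  # continue into child nodes


def count_features(sources: list[str]) -> Counter:
    """Parse each source string and accumulate feature counts in one visitor."""
    counter = FeatureCounter()
    for source in sources:
        counter.visit(ast.parse(source))
    return counter.counts


snippets = [
    "inc = lambda x: x + 1\nname = 'a'\nmsg = f'hi {name}'\n",
    "with open('p') as fh:\n    if (n := len(fh.read())) > 0:\n        pass\n",
]
print(count_features(snippets))
```

    The snippets are only parsed, never executed. Counting several features in one pass loosely echoes the idea of sharing one traversal among several queries, but it is not Boa's fusion optimization; running such counts over successive snapshots of a project would give an adoption curve of the kind the Java case study examines.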