18 research outputs found

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    Get PDF
    This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)

    A novel approach for Software Clone detection using Data Mining in Software

    Get PDF
    The Similar Program structures which recur in variant forms in software systems are code clones. Many techniques are proposed in order to detect similar code fragments in software. The software maintenance is generally helped by maintenance is generally helped by the identification and subsequent unification. When the patterns of simple clones reoccur, it is an indication for the presence of interesting higher-level similarities. They are called as Structural Clones. The structural clones when compared to simple clones show a bigger picture of similarities. The problem of huge number of clones is alleviated by the structural clones, which are part of logical groups of simple clones. In order to understand the design of the system for better maintenance and reengineering for reuse, detection of structural clones is essential. In this paper, a technique which is useful to detect some useful types of structural clones is proposed. The novelty of the present approach comprises the formulation of the structural clone concept and the application of data mining techniques. A novel approach is useful for implementation of the proposed technique is described

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Full text link
    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending a well-known stable-marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy.Comment: 20 pages, 10 figures, 6 table

    Topic identification using filtering and rule generation algorithm for textual document

    Get PDF
    Information stored digitally in text documents are seldom arranged according to specific topics. The necessity to read whole documents is time-consuming and decreases the interest for searching information. Most existing topic identification methods depend on occurrence of terms in the text. However, not all frequent occurrence terms are relevant. The term extraction phase in topic identification method has resulted in extracted terms that might have similar meaning which is known as synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topic in textual documents. The proposed filtering algorithm (PFA) will extract the most relevant terms from text and solve synonym roblem amongst the extracted terms. The rule generation algorithm (TopId) is proposed to identify topic for each verse based on the extracted terms. The PFA will process and filter each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using the rule-based classifier. An experimental design was performed on 224 English translated Quran verses which are related to female issues. Topics identified by both TopId and Rough Set technique were compared and later verified by experts. PFA has successfully extracted more relevant terms compared to other filtering techniques. TopId has identified topics that are closer to the topics from experts with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and identify topic in the verse

    The last line effect explained

    Get PDF
    Micro-clones are tiny duplicated pieces of code; they typically comprise only few statements or lines. In this paper, we study the “Last Line Effect,” the phenomenon that the last line or statement in a micro-clone is much more likely to contain an error than the previous lines or statements. We do this by analyzing 219 open source projects and reporting on 263 faulty micro-clones and interviewing six authors of real-world faulty micro-clones. In an interdisciplinary collaboration, we examine the underlying psychological mechanisms for the presence of these relatively trivial errors. Based on the interviews and further technical analyses, we suggest that so-called “action slips” play a pivotal role for the existence of the last line effect: Developers’ attention shifts away at the end of a micro-clone creation task due to noise and the routine nature of the task. Moreover, all micro-clones whose origin we could determine were introduced in unusually large commits. Practitioners benefit from this knowledge twofold: 1) They can spot situations in which they are likely to introduce a faulty micro-clone and 2) they can use PVS-Studio, our automated micro-clone detector, to help find erroneous micro-clones

    The last line effect explained

    Get PDF

    Detecting Test Clones with Static Analysis

    Get PDF
    Large-scale software systems often have correspondingly complicated test suites, which are diffi cult for developers to construct and maintain. As systems evolve, engineers must update their test suite along with changes in the source code. Tests created by duplicating and modifying previously existing tests (clones) can complicate this task. Several testing technologies have been proposed to mitigate cloning in tests, including parametrized unit tests and test theories. However, detecting opportunities to improve existing test suites is labour intensive. This thesis presents a novel technique for etecting similar tests based on type hierarchies and method calls in test code. Using this technique, we can track variable history and detect test clones based on test assertion similarity. The thesis further includes results from our empirical study of 10 benchmark systems using this technique which suggest that test clone detection by our technique will aid test de-duplication eff orts in industrial systems

    Detecting Test Clones with Static Analysis

    Get PDF
    Large-scale software systems often have correspondingly complicated test suites, which are diffi cult for developers to construct and maintain. As systems evolve, engineers must update their test suite along with changes in the source code. Tests created by duplicating and modifying previously existing tests (clones) can complicate this task. Several testing technologies have been proposed to mitigate cloning in tests, including parametrized unit tests and test theories. However, detecting opportunities to improve existing test suites is labour intensive. This thesis presents a novel technique for etecting similar tests based on type hierarchies and method calls in test code. Using this technique, we can track variable history and detect test clones based on test assertion similarity. The thesis further includes results from our empirical study of 10 benchmark systems using this technique which suggest that test clone detection by our technique will aid test de-duplication eff orts in industrial systems

    Erstellung und Evaluation eines Verfahrens zur Messung von Redundanz anhand von Tokenzerlegung

    Get PDF
    Die vorliegende Arbeit beschäftigt sich mit der Frage, wie verschiedene Arten von Redundanz gemessen werden können und wie sich die Entfernung dieser Redundanzen auf die Größe des Quellcodes auswirkt. Des Weiteren wird untersucht, inwieweit verschiedene Programmierkonzepte und Features einen Einfluss auf Redundanzminderung besitzen. Zu diesem Zweck wird zunächst eine Literaturstudie durchgeführt. Weiterhin wird eine Definition von intrinsischer Redundanz eingeführt. Zudem werden verschiedene Messverfahren analysiert und darauf basierend wird ein Verfahren zur Messung von Redundanz anhand der Tokenzerlegung erstellt. Für die Untersuchung der Redundanz des Quellcodes sowie die Auswirkung der Verwendung von Programmierkonzepten und Features auf die Redundanz, wird ein Experiment anhand eines mehrteiligen Code Korpus mit funktional identischen Code Paaren durchgeführt. Zunächst wird dazu die bereinigte Tokenanzahl der Code Fragmente gemessen, ohne dass ein bestimmtes Konstrukt verwendet wurde. Diese Messwerte werden dann mit den Messwerten der Code Fragmente verglichen, in denen das betreffende Konstrukt angewendet wird. Es hat sich dabei gezeigt, dass es bestimmte Sprachkonstrukte gibt, die eine redundanzmindernde Wirkung besitzen. Für verschiedene Konstrukte zeigte sich außerdem, dass ihre Verwendung erst ab einer bestimmten Anzahl oder Größe der zu ersetzenden Code Fragmente, eine redundanzmindernde Wirkung aufweist

    Common Attack Surface Detection

    Get PDF
    In the current software development market, many software is being developed using a copy-paste mechanism with little to no change made to the reused code. Such a practice has the potential of causing severe security issues since one fragment of code containing a vulnerability may cause the same vulnerability to appear in many other software with the same cloned fragment. The concept of relying on software diversity for security may also be compromised by such a trend, since seemingly different software may in fact share vulnerable code fragments. Although there exist efforts on detecting cloned code fragments, there lack solutions for formally characterizing the specific impact on security. In this thesis, we revisit the concept of software diversity from a security viewpoint. Specifically, we define the novel concept of common attack surface to model the relative degree to which a pair of software may be sharing potentially vulnerable code fragments. To implement the concept, we develop an automated tool, Dupsec, in order to efficiently identify common attack surface between any given pair of software applications with minimum human intervention. Finally, we conduct experiments by applying our tool to a large number of open source software. Our results demonstrate many seemingly unrelated real-world software indeed share significant common attack surface
    corecore