
    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting, with or without minor modifications, is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools and with large-scale, in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing a focused search for clones of any code fragment of the developer's interest. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects.
The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Once code clones are identified, their management often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to decide optimally what to refactor and what not, especially when there are dependencies among clones and the objective is to minimize refactoring effort and risk while maximizing benefit. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for estimating the effort needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and an in-depth analysis of the present state of the art, we expose avenues for further research and development towards the versatile clone management system that we envision.

    Leveraging Software Clones for Software Comprehension: Techniques and Practice

    SUMMARY The body of this thesis centers on two aspects of software clone detection: detection and application. In detection, the main contribution of this thesis is a new clone detector built with the mtreelib library, itself developed expressly for this work. This library implements a general metric tree, a data structure specialized in partitioning metric spaces in order to speed up certain common queries, such as range queries or nearest-neighbor queries. This structure is used to build a clone detector that approximates the Levenshtein distance with high precision. A brief evaluation is presented to support this precision. Other relevant results on metrics and incremental clone detection are also presented. Several applications of the new clone detector are presented. First, an original algorithm for reconstructing information lost in version control systems is proposed and tested on several large systems. Then, a qualitative and quantitative evaluation of Firefox is carried out on the basis of a nearest-neighbor analysis; the resulting curves are used to highlight the difficulties of transitioning from a slow to a rapid development cycle. Next, two industrial experiences of using and deploying a clone detection technology are presented. These two experiences concern the C/C++, Java and TTCN-3 languages. The large difference in clone population between C/C++ and Java on the one hand and TTCN-3 on the other is presented. Finally, a result obtained by combining a clone analysis with a security flow analysis highlights the usefulness of clones in identifying security flaws. The work ends with a conclusion and some future perspectives.
    ABSTRACT This thesis explores two topics in clone analysis: detection and application. The main contribution in clone detection is a new clone detector based on a library called mtreelib. This library is a package developed for clone detection that implements the metric tree data structure. This structure is used to build a clone detector that approximates the Levenshtein distance with high accuracy. A small benchmark is produced to assess the accuracy. Other results regarding metrics and incremental clone detection are also presented. Many applications of the clone detector are introduced. An original algorithm to reconstruct missing information in the structure of software repositories is described and tested with data sourced from large existing software. An insight into Firefox is presented, showing the quantity of change between versions and the link between different release cycle types and the number of bugs. Also, an analysis crossing the results from pattern traversal, flow analysis and clone detection is presented. Two industrial experiments using a different clone detector, CLAN, are also presented with some developers’ perspectives. One of the experiments is done on a language never before explored in clone detection, TTCN-3, and the results show that the clone population in that language differs greatly from that of other well-known languages, like C/C++ and Java. The thesis concludes with a summary of the findings and some perspectives for future research.
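The abstract above relies on two ideas that a small example can make concrete: the Levenshtein distance between code fragments, and the way a metric (a distance obeying the triangle inequality) lets a search skip candidates without computing their distance. The following is a minimal sketch under stated assumptions, not the thesis's detector and not the mtreelib API: it computes the exact textbook distance rather than an approximation, and `range_query` with its single `pivot` is a hypothetical stand-in for the pruning that a metric tree performs at each node.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Exact edit distance (insert, delete, substitute), row-by-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def range_query(query: str, radius: int, items: List[str], pivot: str) -> List[str]:
    """Return items within `radius` of `query` (hypothetical helper).

    The triangle inequality gives |d(q, p) - d(p, x)| <= d(q, x), so any x
    whose pivot distance differs from d(q, p) by more than `radius` cannot
    match and is skipped without computing d(q, x).
    """
    # In a real index these pivot distances would be precomputed once.
    pivot_dist = {item: levenshtein(pivot, item) for item in items}
    d_qp = levenshtein(query, pivot)
    hits = []
    for item in items:
        if abs(d_qp - pivot_dist[item]) > radius:
            continue  # pruned by the triangle inequality
        if levenshtein(query, item) <= radius:
            hits.append(item)
    return hits


if __name__ == "__main__":
    fragments = ["int x = 0;", "int y = 0;", "return foo(bar);"]
    print(range_query("int z = 0;", 1, fragments, pivot=fragments[0]))
```

A metric tree applies the same pruning test recursively over nested regions of the space, which is what makes range and nearest-neighbor queries faster than a linear scan.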

    Analysis of Human Affect and Bug Patterns to Improve Software Quality and Security

    The impact of software is ever increasing as more and more systems become software operated. Despite the usefulness of software, many instances of software failure have caused tremendous losses in lives and dollars. Software failures take place because of bugs (i.e., faults) in software systems. These bugs cause the program to malfunction or crash and expose security vulnerabilities exploitable by malicious hackers. Studies confirm that software defects and vulnerabilities appear in source code largely due to the mistakes and errors of developers. Human performance is affected by the underlying development process and by human affects, such as sentiment and emotion. This thesis examines these human affects of software developers, which have drawn recent interest in the community. For capturing developers’ sentimental and emotional states, we have developed several software tools (i.e., SentiStrength-SE, DEVA, and MarValous). These are novel tools that facilitate automatic detection of sentiments and emotions in software engineering textual artifacts. Using such an automated tool, developers’ sentimental variations are studied with respect to the underlying development tasks (e.g., bug-fixing, bug-introducing), development periods (i.e., days and times), team sizes and project sizes. We expose opportunities for exploiting developers’ sentiments for higher productivity and improved software quality. While developers’ sentiments and emotions can be leveraged as a proactive and active safeguard in identifying and minimizing software bugs, this dissertation also includes in-depth studies of the relationships among various bug patterns, such as software defects, security vulnerabilities, and code smells, to find actionable insights for minimizing software bugs and improving software quality and security. Bug patterns are exposed through mining software repositories and bug databases.
These bug patterns are crucial for localizing and fixing bugs and security vulnerabilities in a software codebase, predicting portions of software susceptible to failure or exploitation by hackers, devising techniques for automated program repair, and avoiding code constructs and coding idioms that are bug-prone. The software tools produced in this thesis are empirically evaluated using standard metrics (e.g., precision, recall). The findings of all the studies are validated with appropriate tests for statistical significance. Finally, based on our experience and an in-depth analysis of the present state of the art, we expose avenues for further research and development towards a holistic approach for developing improved and secure software systems.

    Analysis of VANET Standard IEEE 1609.4 Mac Layer Multi-Channel Operations Using OMNeT++ and Veins

    A VANET is an ad hoc network of vehicles with wireless communication capability. The network utilizes a system to relay data from one vehicle to another vehicle or to a Road Side Unit (RSU). This communication is also known as Vehicle to Vehicle (V2V) [31] and Vehicle to Infrastructure (V2I) [31]. The communication protocol for Wireless Access in Vehicular Environments (WAVE) [10] is the industry standard IEEE 802.11p for communication between vehicles. This thesis examines the Medium Access Control (MAC) layer of the IEEE 1609.4 multi-channel communication protocol. In Dedicated Short Range Communications (DSRC), the core of the WAVE protocol, there is an allocated spectrum in the 5.9 GHz band [20]. In the U.S., the allocated spectrum of 75 MHz was split into seven channels. A channel is defined as a frequency range of 10 MHz for a radio to tune into [28]. There is one control channel to relay safety messages and six service channels to relay non-safety messages, giving two types of channels to choose from when transmitting a message. The type and the priority of the message are the factors considered. Many existing studies illustrate the impact of multi-channel and single-channel switching on non-safety and safety message transmissions. Most studies focus on optimizing the usability of the service channels. This thesis aims to determine the best use of the single radio in a vehicle, i.e., how to best utilize the Control Channel (CCH) and Service Channels (SCHs) in a Single Radio Multi-Channel (SR-MC) system [20]. We analyze the channel utilization, beacon transmission, and packet transmission of IEEE 1609.4 multi-channel operations in CCH and SCH. Some of the parameters used for comparison are the number of collisions, channel utilization, packet transmissions, and beacon transmissions. We investigate scenarios with a density of n vehicles on a real-world map, using safety (beacon) and non-safety (data) messages.
The technologies used are Instant Veins 4.6, OMNeT++ 5.2.1, SUMO 0.30, Debian GNU/Linux 9 (stretch) 64-bit, VMware Fusion (Professional Version 10.1.4) and an open street map from Northampton. The advantage of using the OMNeT++ and Simulation of Urban MObility (SUMO) framework is its thorough implementation of IEEE 1609.4 DSRC/WAVE and IEEE 802.11p [29]. Additionally, realistic traffic on a factual map, an important feature, can be generated with SUMO [21]. The contributions of this thesis include the integration of the testing framework Catch, randomizing the SCH, adding beacon transmission to the MAC layer, tracking of vehicle neighbors, tracking of collisions and channel utilization, and an analysis of multi-channel switching. In our results we found that the CCH is highly overloaded with both beacons and channel switching management, which has a strong impact on the switching operation, with a high number of collisions. Furthermore, we found that as the number of beacons generated increased, there was an increase in lost frames independent of the channel. Lastly, the number of collisions fluctuated little as the number of vehicles n increased.
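To make the single-radio channel alternation in the abstract above concrete, here is a minimal sketch of how a radio might decide which slot it is in at a given time. Every constant is an assumption not stated in the abstract: a 100 ms sync interval split into a 50 ms CCH slot and a 50 ms SCH slot, a 4 ms guard interval at the start of each slot, and the U.S. channel numbers 172 to 184 with 178 as the control channel. The function names are hypothetical and are not part of Veins or OMNeT++.

```python
# Assumed timing (not given in the abstract): alternating 50 ms slots with a
# 4 ms guard interval at the start of each slot.
CCH_INTERVAL_MS = 50
SCH_INTERVAL_MS = 50
GUARD_INTERVAL_MS = 4
SYNC_INTERVAL_MS = CCH_INTERVAL_MS + SCH_INTERVAL_MS

# Assumed U.S. numbering: seven 10 MHz channels, one control and six service.
CONTROL_CHANNEL = 178
SERVICE_CHANNELS = [172, 174, 176, 180, 182, 184]
CHANNEL_WIDTH_MHZ = 10


def center_frequency_mhz(channel: int) -> int:
    """Center frequency under the 5 GHz numbering rule: 5000 + 5 * channel."""
    return 5000 + 5 * channel


def active_slot(t_ms: float) -> str:
    """Return "CCH", "SCH" or "GUARD" for a time measured from a sync boundary."""
    offset = t_ms % SYNC_INTERVAL_MS
    if offset < CCH_INTERVAL_MS:
        slot, slot_start = "CCH", 0
    else:
        slot, slot_start = "SCH", CCH_INTERVAL_MS
    # The radio is retuning during the guard interval and cannot transmit.
    if offset - slot_start < GUARD_INTERVAL_MS:
        return "GUARD"
    return slot


if __name__ == "__main__":
    for t in (0, 10, 52, 75, 110):
        print(t, active_slot(t))
    used_mhz = (1 + len(SERVICE_CHANNELS)) * CHANNEL_WIDTH_MHZ
    print("spectrum covered by the seven channels:", used_mhz, "MHz")
```

Under these assumptions the seven channels cover 70 MHz of the 75 MHz allocation, and beacons and switching management compete for the usable part of each CCH slot, which is consistent with the CCH overload the abstract reports.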

    Software Clone Management Towards Industrial Application (Dagstuhl Seminar 12071)

    This report documents the program and the outcomes of Dagstuhl Seminar 12071 "Software Clone Management Towards Industrial Application". Software clones are identical or similar pieces of code or design. A lot of research has been devoted to software clones. Unlike previous research, this seminar put a particular emphasis on the industrial application of software clone management methods and tools, and aimed at gathering concrete usage scenarios of clone management in industry, which will help to identify new industrially relevant aspects in order to shape future research. Talks were presented by industrial participants, and working groups were formed to discuss issues in clone detection, presentation, and refactoring. In addition, we developed a unified conceptual model to capture the clone information required to support a common notion of clone data and to enable interoperability, fostering the exchange of data among researchers and tools in practice. The main focus of current research is clones in source code; therefore, we also looked into ways of extending our research to other types of software artifacts. Last but not least, we discussed how clone management activities may be integrated into the software development process.