3,192 research outputs found

    The Survey of the Code Clone Detection Techniques and Process with Types (I, II, III and IV)

    Code clones are used regularly when software is upgraded, so it is worth considering clone detection strategies that go beyond the initial code. In the state of the art of software clone research, we perceived the absence of a systematic survey. We describe the earlier research on the basis of a systematic and extensive database search and identify the research gaps for further study. Software maintenance cost is higher than design cost. Code cloning is useful in several areas, such as detecting library contents, understanding programs and detecting malicious programs, but apart from these benefits it has several serious impacts on the quality, reusability and continuity of a software framework. In this paper, we discuss the code clone, its evolution and its classification. Code clones are classified into four types, namely Type I, II, III and IV. The exact code as well as the copied code is depicted in detail for each type of code clone. Several clone detection techniques, such as text-, token-, metric- and hybrid-based techniques, are studied comparatively. A comparison of detection tools such as CloneDR, Covet, Duploc, CLAN, etc., based on the different techniques they use, is highlighted, and the cloning process is also explained. Code clones are identical segments of source code which might be inserted intentionally or unintentionally. Reusing code snippets via copying and pasting, with or without minor alterations, is a common task in software development. But the existence of code clones may degrade the design structure and quality attributes of software, such as changeability, readability and maintainability, and hence increase maintenance costs.
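
The abstract above names the clone types without defining them. Under the commonly used definitions (Type I: identical apart from layout and comments; Type II: identical apart from identifiers and literals), a token-based check can be sketched as follows. This is a minimal illustration, not the method of any surveyed tool; the function names `token_stream` and `classify` and the sample snippets are assumptions made for this example, and it only handles Python input because it relies on the standard `tokenize` module.

```python
import io
import keyword
import tokenize

# Layout tokens are mapped to fixed markers so that indentation width is ignored.
_LAYOUT = {tokenize.NEWLINE: "<NL>", tokenize.INDENT: "<INDENT>", tokenize.DEDENT: "<DEDENT>"}
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def token_stream(source, normalize_names=False):
    """Tokenize a Python snippet, dropping comments and blank lines.

    With normalize_names=True, identifiers become "ID" and number/string
    literals become "LIT", which is the usual Type II normalization.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _LAYOUT:
            out.append(_LAYOUT[tok.type])
        elif normalize_names and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif normalize_names and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b):
    """Classify a pair of snippets as Type I, Type II, or neither."""
    if token_stream(a) == token_stream(b):
        return "Type I"
    if token_stream(a, True) == token_stream(b, True):
        return "Type II"
    return "no Type I/II match"


original = "def total(items):\n    s = 0\n    for x in items:\n        s += x\n    return s\n"
# Same code with extra comments, a blank line and deeper indentation.
reformatted = "def total(items):  # sum up\n    s = 0\n\n    for x in items:\n            s += x   # add\n    return s\n"
# Same structure with every identifier renamed.
renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
```

Type III (statements added, removed or modified) and Type IV (same behaviour, different syntax) need similarity measures rather than exact token equality, so this sketch deliberately reports them as no match.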

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focuses on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance. Comment: 49 pages, 10 figures, 6 tables

    Code Smell Detection Techniques and Process: A Review

    A code smell is a hint that something has gone wrong somewhere in your code. The idea of code smells was introduced to characterize various types of design shortcomings in code. Code and design smells are poor solutions to recurring implementation and design problems. They may hinder the evolution of a system by making it hard for software engineers to carry out changes. In this paper, we review code smell detection tools such as Décor, InFusion, JDeodorant, PMD, Stench Blossom, etc. Furthermore, we discuss various code smell detection techniques. Code clones are identical fragments of source code which may be inserted deliberately or inadvertently. Reusing code pieces through copying and pasting, with or without minor adjustments, is a common task in software development. We've examined several papers to explore the various tools and techniques used for code smell detection. In addition, we review the process of code smell detection.

    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer being aware of them. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools and with large-scale, in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in the future versions of software projects.
The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision.

    PDF-Malware Detection: A Survey and Taxonomy of Current Techniques

    Portable Document Format, more commonly known as PDF, has become, in the last 20 years, a standard for document exchange and dissemination due to its portable nature and widespread adoption. The flexibility and power of this format are leveraged not only by benign users but also by hackers, who have been working to exploit various types of vulnerabilities, overcome security restrictions, and turn the PDF format into one of the leading vectors for spreading malicious code. Analyzing the content of malicious PDF files to extract the main features that characterize the malware identity and behavior is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys the existing state of the art in systems for the detection of malicious PDF files and organizes them in a taxonomy that separately considers the approaches used and the data analyzed to detect the presence of malicious code. © Springer International Publishing AG, part of Springer Nature 2018

    Seamless Variability Management With the Virtual Platform

    Customization is a general trend in software engineering, demanding systems that support variable stakeholder requirements. Two opposing strategies are commonly used to create variants: software clone & own and software configuration with an integrated platform. Organizations often start with the former, which is cheap, agile, and supports quick innovation, but does not scale. The latter scales by establishing an integrated platform that shares software assets between variants, but requires high up-front investments or risky migration processes. So, could we have a method that allows an easy transition or even combine the benefits of both strategies? We propose a method and tool that supports a truly incremental development of variant-rich systems, exploiting a spectrum between both opposing strategies. We design, formalize, and prototype the variability-management framework virtual platform. It bridges clone & own and platform-oriented development. Relying on programming-language-independent conceptual structures representing software assets, it offers operators for engineering and evolving a system, comprising: traditional, asset-oriented operators and novel, feature-oriented operators for incrementally adopting concepts of an integrated platform. The operators record meta-data that is exploited by other operators to support the transition. Among others, they eliminate expensive feature-location effort or the need to trace clones. Our evaluation simulates the evolution of a real-world, clone-based system, measuring its costs and benefits. Comment: 13 pages, 10 figures; accepted for publication at the 43rd International Conference on Software Engineering (ICSE 2021), main technical track

    Change Impact Analysis of Code Clones

    Copying a code fragment and reusing it with or without modifications is known to be a frequent activity in software development. This causes exact or closely similar copies of code fragments, known as code clones, to exist in software systems. Developers leverage the code reuse opportunity offered by code cloning for increased productivity. However, different studies on code clones report important concerns regarding the impacts of clones on software maintenance. One of the key concerns is to maintain consistent evolution of the clone fragments, as inconsistent changes to clones may introduce bugs. Challenges to the consistent evolution of clones involve the identification of all related clone fragments for change propagation when a cloned fragment is changed. The task of identifying the ripple effects (i.e., all the related components to change) is known as Change Impact Analysis (CIA). In this thesis, we evaluate the impacts of clones on software systems from new perspectives and then we propose an evolutionary coupling based technique for change impact analysis of clones. First, we empirically evaluate the comparative stability of cloned and non-cloned code using fine-grained syntactic change types. Second, we assess the impacts of clones from the perspective of coupling at the domain level. Third, we carry out a comprehensive analysis of the comparative stability of cloned and non-cloned code within a uniform framework. We compare stability metrics with the results from the original experimental settings with respect to the clone detection tools and the subject systems. Fourth, we investigate the relationships between stability and bug-proneness of clones to assess whether and how stability contributes to the bug-proneness of different types of clones. Next, in the fifth study, we analyze the impacts of co-change coupling on the bug-proneness of different types of clones.
After a comprehensive evaluation of the impacts of clones on software systems, we propose an evolutionary coupling based CIA approach to support the consistent evolution of clones. In the sixth study, we propose a solution to minimize the effects of atypical commits (extra large commits) on the accuracy of the detection of evolutionary coupling. We propose a clustering-based technique to split atypical commits into pseudo-commits of related entities. This considerably reduces the number of incorrect couplings introduced by the atypical commits. Finally, in the seventh study, we propose an evolutionary coupling based change impact analysis approach for clones. In addition to handling the atypical commits, we use the history of fine-grained syntactic changes extracted from the software repositories to detect typed evolutionary coupling of clones. Conventional approaches consider only the frequency of co-change of the entities to detect evolutionary coupling. We consider both change frequencies and the fine-grained change types in the detection of evolutionary coupling. Findings from our studies give important insights regarding the impacts of clones and our proposed typed evolutionary coupling based CIA approach has the potential to support the consistent evolution of clones for better clone management
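
The frequency-based notion of evolutionary coupling that the abstract above calls conventional can be sketched by counting how often two entities change in the same commit. This is a minimal sketch under stated assumptions, not the thesis's typed approach: the names `co_change_coupling`, `confidence` and `max_commit_size` are hypothetical, and very large commits are simply skipped here as a stand-in for the clustering-based splitting of atypical commits that the thesis proposes.

```python
from collections import Counter
from itertools import combinations


def co_change_coupling(commits, max_commit_size=50):
    """Count per-entity changes and pairwise co-changes over a commit history.

    `commits` is an iterable of collections of changed entity names.
    Commits touching more than `max_commit_size` entities are skipped
    (an assumption of this sketch, not the thesis's technique).
    """
    change_count = Counter()
    pair_count = Counter()
    for changed in commits:
        entities = sorted(set(changed))
        if len(entities) > max_commit_size:
            continue
        change_count.update(entities)
        # Pairs are stored in sorted order so (a, b) and (b, a) share one key.
        pair_count.update(combinations(entities, 2))
    return change_count, pair_count


def confidence(change_count, pair_count, a, b):
    """Fraction of the commits that changed `a` which also changed `b`."""
    if change_count[a] == 0:
        return 0.0
    return pair_count[tuple(sorted((a, b)))] / change_count[a]


# Hypothetical history: each set lists the entities changed in one commit.
history = [
    {"A.f", "B.f"},
    {"A.f", "B.f", "C.g"},
    {"A.f"},
    {"C.g"},
]
changes, pairs = co_change_coupling(history)
```

With this history, `confidence(changes, pairs, "B.f", "A.f")` is 1.0 because every commit that changed `B.f` also changed `A.f`, while the reverse direction is 2/3. A typed variant would additionally record which kind of syntactic change each entity underwent.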

    Dealing with clones in software : a practical approach from detection towards management

    Despite the fact that duplicated fragments of code, also called code clones, are considered one of the prominent code smells that may exist in software, cloning is widely practiced in industrial development. The larger the system, the more people are involved in its development and the more parts are developed by different teams, which increases the possibility of having cloned code in the system. While there are particular benefits of code cloning in software development, research shows that it might be a source of various troubles in evolving software. Therefore, investigating and understanding clones in a software system is important for managing the clones efficiently. However, when the system is fairly large, it is challenging to identify and manage those clones properly. Among the various types of clones that may exist in software, research shows that detection of near-miss clones, where there might be minor to significant differences (e.g., renaming of identifiers and additions/deletions/modifications of statements) among the cloned fragments, is costly in terms of time and memory. Thus, there is a great demand for state-of-the-art technologies for dealing with clones in software. Over the years, several tools have been developed to detect and visualize exact and similar clones. However, the tools are usually standalone and do not integrate well with a software developer's workflow. In this thesis, first, a study is presented on the effectiveness of a fingerprint-based data similarity measurement technique named 'simhash' in detecting clones in large-scale code bases. Based on the positive outcome of the study, a time-efficient detection approach is proposed to find exact and near-miss clones in software, especially in large-scale software systems.
The novel detection approach has been made available as a highly configurable and fully fledged standalone clone detection tool named 'SimCad', which can be configured for detection of clones in both source code and non-source code based data. Second, we show a robust use of the clone detection approach studied earlier by assembling its detection service as a portable library named 'SimLib'. This library can provide tightly coupled (integrated) clone detection functionality to other applications, as opposed to the loosely coupled service provided by a typical standalone tool. Because it is highly configurable and easily extensible, this library allows the user to customize its clone detection process for detecting clones in data having diverse characteristics. We performed a user study to get some feedback on installation and use of the 'SimLib' API (Application Programming Interface) and to uncover its potential use as a third-party clone detection library. Third, we investigated what tools and techniques are currently in use to detect and manage clones and to understand their evolution. The goal was to find how those tools and techniques can be made available in a developer's own software development platform for convenient identification, tracking and management of clones in the software. Based on that, we developed a clone-aware software development platform named 'SimEclipse' to promote the practical use of code clone research and to provide better support for clone management in software. Finally, we evaluated 'SimEclipse' by conducting a user study on its effectiveness, usability and information management. We believe that both researchers and developers would enjoy and benefit from using these tools in different aspects of code clone research and in managing cloned code in software systems.
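
The fingerprinting idea behind 'simhash' mentioned in the abstract above can be sketched as follows: each token votes on every bit of the fingerprint, so fragments that share most tokens tend to get fingerprints that differ in few bits. This is a generic illustration of the technique, not the SimCad or SimLib implementation; the tokenizer, the use of MD5 as the per-token hash and the function names are assumptions of this sketch.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a simhash fingerprint of `text` (bits must be a multiple of 8, at most 128)."""
    # Crude tokenizer: words/numbers, or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    votes = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # A set bit votes +1 for that position, a clear bit votes -1.
            votes[i] += 1 if (h >> i) & 1 else -1
    # The fingerprint keeps the bits with a positive total vote.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two fragments would then be reported as clone candidates when `hamming(simhash(x), simhash(y))` falls below a chosen threshold; the threshold value is a tuning parameter and is not taken from the thesis. Because the tokenizer ignores whitespace, fragments that differ only in spacing get identical fingerprints.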