595 research outputs found

    SourcererCC: Scaling Code Clone Detection to Big Code

    Full text link
    Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.Comment: Accepted for publication at ICSE'16 (preprint, unrevised

    Towards Automating Precision Studies of Clone Detectors

    Full text link
    Current research in clone detection suffers from poor ecosystems for evaluating precision of clone detection tools. Corpora of labeled clones are scarce and incomplete, making evaluation labor intensive and idiosyncratic, and limiting inter tool comparison. Precision-assessment tools are simply lacking. We present a semi-automated approach to facilitate precision studies of clone detection tools. The approach merges automatic mechanisms of clone classification with manual validation of clone pairs. We demonstrate that the proposed automatic approach has a very high precision and it significantly reduces the number of clone pairs that need human validation during precision experiments. Moreover, we aggregate the individual effort of multiple teams into a single evolving dataset of labeled clone pairs, creating an important asset for software clone research.Comment: Accepted to be published in the 41st ACM/IEEE International Conference on Software Engineerin

    Raters’ reliability in clone benchmarks construction

    Get PDF
    International audienceCloned code often complicates code maintenance and evolution and must therefore be effectively detected. One of the biggest challenges for clone detectors is to reduce the amount of irrelevant clones they found, called false positives. Several benchmarks of true and false positive clones have been introduced, enabling tool developers to compare, assess and fine-tune their tools. Manual inspection of clone candidates is performed by raters that do not have expertise on the underlying code. This way of building benchmarks might be unreliable when considering context-dependent clones i.e., clones valid for a specific purpose. Our goal is to investigate the reliability of rater judgments about context-dependent clones. We randomly select about 600 clones from two projects and ask several raters, including experts of the projects, to manually classify these clones. We observe that judgments of non expert raters are not always repeatable. We also observe that they seldomly agree with each others and with the expert. Finally, we find that the project and the fact that a clone is a true or false positive might have an influence on the agreement between the expert and non experts. Therefore, using non experts to produce clone benchmarks could be unreliable

    An Empirical Assessment of Bellon's Clone Benchmark

    Get PDF
    Context: Clone benchmarks are essential to the assessment and improvement of clone detection tools and algorithms. Among existing benchmarks, Bellon’s benchmark is widely used by the research community. However, a serious threat to the validity of this benchmark is that reference clones it contains have been manually validated by Bellon alone. Other persons may disagree with Bellon’s judgment. Ob-jective: In this paper, we perform an empirical assessment of Bellon’s benchmark. Method: We seek the opinion of eighteen participants on a subset of Bellon’s benchmark to determine if researchers should trust the reference clones it contains. Results: Our experiment shows that a significant amount of the reference clones are debatable, and this phe-nomenon can introduce noise in results obtained using this benchmark

    Investigating the Generalizability of Deep Learning-based Clone Detectors

    Full text link
    The generalizability of Deep Learning (DL) models is a significant challenge, as poor generalizability indicates that the model has overfitted to the training data and is not able to generalize to new data. Despite numerous DL-based clone detectors emerging in recent years, their generalizability has not been thoroughly assessed. This study investigates the generalizability of three DL-based clone detectors (CCLearner, ASTNN, and CodeBERT) by comparing their detection accuracy on different training and testing clone benchmarks. The results show that all three clone detectors do not generalize well to new data and there is a strong relationship between clone types and generalizability for CCLearner and ASTNN.Choi E., Fuke N., Fujiwara Y., et al. Investigating the Generalizability of Deep Learning-based Clone Detectors. IEEE International Conference on Program Comprehension 2023-May, 181 (2023); https://doi.org/10.1109/ICPC58990.2023.00032

    Using a Nearest-Neighbour, BERT-Based Approach for Scalable Clone Detection

    Full text link
    Code clones can detrimentally impact software maintenance and manually detecting them in very large codebases is impractical. Additionally, automated approaches find detection of Type 3 and Type 4 (inexact) clones very challenging. While the most recent artificial deep neural networks (for example BERT-based artificial neural networks) seem to be highly effective in detecting such clones, their pairwise comparison of every code pair in the target system(s) is inefficient and scales poorly on large codebases. We therefore introduce SSCD, a BERT-based clone detection approach that targets high recall of Type 3 and Type 4 clones at scale (in line with our industrial partner's requirements). It does so by computing a representative embedding for each code fragment and finding similar fragments using a nearest neighbour search. SSCD thus avoids the pairwise-comparison bottleneck of other Neural Network approaches while also using parallel, GPU-accelerated search to tackle scalability. This paper details the approach and an empirical assessment towards configuring and evaluating that approach in industrial setting. The configuration analysis suggests that shorter input lengths and text-only based neural network models demonstrate better efficiency in SSCD, while only slightly decreasing effectiveness. The evaluation results suggest that SSCD is more effective than state-of-the-art approaches like SAGA and SourcererCC. It is also highly efficient: in its optimal setting, SSCD effectively locates clones in the entire 320 million LOC BigCloneBench (a standard clone detection benchmark) in just under three hours.Comment: 10 pages, 2 figures, 38th IEEE International Conference on Software Maintenance and Evolutio

    Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey

    Full text link
    The automated code evaluation system (AES) is mainly designed to reliably assess user-submitted code. Due to their extensive range of applications and the accumulation of valuable resources, AESs are becoming increasingly popular. Research on the application of AES and their real-world resource exploration for diverse coding tasks is still lacking. In this study, we conducted a comprehensive survey on AESs and their resources. This survey explores the application areas of AESs, available resources, and resource utilization for coding tasks. AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules, depending on their application. We explore the available datasets and other resources of these systems for research, analysis, and coding tasks. Moreover, we provide an overview of machine learning-driven coding tasks, such as bug detection, code review, comprehension, refactoring, search, representation, and repair. These tasks are performed using real-life datasets. In addition, we briefly discuss the Aizu Online Judge platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. This is due to the scalability of the AOJ platform (programming education, competitions, and practice), open internal features (hardware and software), attention from the research community, open source data (e.g., solution codes and submission documents), and transparency. We also analyze the overall performance of this system and the perceived challenges over the years

    Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?

    Full text link
    Several advances in deep learning have been successfully applied to the software development process. Of recent interest is the use of neural language models to build tools, such as Copilot, that assist in writing code. In this paper we perform a comparative empirical analysis of Copilot-generated code from a security perspective. The aim of this study is to determine if Copilot is as bad as human developers - we investigate whether Copilot is just as likely to introduce the same software vulnerabilities that human developers did. Using a dataset of C/C++ vulnerabilities, we prompt Copilot to generate suggestions in scenarios that previously led to the introduction of vulnerabilities by human developers. The suggestions are inspected and categorized in a 2-stage process based on whether the original vulnerability or the fix is reintroduced. We find that Copilot replicates the original vulnerable code ~33% of the time while replicating the fixed code at a ~25% rate. However this behavior is not consistent: Copilot is more susceptible to introducing some types of vulnerability than others and is more likely to generate vulnerable code in response to prompts that correspond to older vulnerabilities than newer ones. Overall, given that in a substantial proportion of instances Copilot did not generate code with the same vulnerabilities that human developers had introduced previously, we conclude that Copilot is not as bad as human developers at introducing vulnerabilities in code
    • …
    corecore