4,012 research outputs found

    SourcererCC: Scaling Code Clone Detection to Big Code

    Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, and the number of token comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available, state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find that SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250 MLOC) using a standard workstation. Comment: Accepted for publication at ICSE'16 (preprint, unrevised)
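
    The idea of querying an inverted index for candidate clones can be illustrated with a minimal sketch. This is not SourcererCC's implementation: the whitespace tokenizer, the `find_clones` name, the 0.8 threshold, and the overlap measure are assumptions chosen for illustration, and the token-ordering filters described in the abstract are omitted.

```python
from collections import Counter, defaultdict


def tokenize(code):
    """Rough tokenizer (assumption): split the code block on whitespace."""
    return code.split()


def overlap(a, b):
    """Size of the multiset intersection of two token lists."""
    return sum((Counter(a) & Counter(b)).values())


def find_clones(blocks, threshold=0.8):
    """Return pairs (i, j) of block ids whose token overlap is at least
    threshold * max(len(tokens_i), len(tokens_j)).

    An inverted index (token -> ids of blocks containing it) limits the
    comparisons to blocks that share at least one token.
    """
    bags = {bid: tokenize(src) for bid, src in blocks.items()}
    index = defaultdict(set)
    clones = []
    for bid in sorted(bags):
        tokens = bags[bid]
        # Candidate lookup: previously indexed blocks sharing any token.
        candidates = set()
        for tok in set(tokens):
            candidates |= index[tok]
        # Verification: full overlap check on the candidates only.
        for cid in sorted(candidates):
            needed = threshold * max(len(tokens), len(bags[cid]))
            if overlap(tokens, bags[cid]) >= needed:
                clones.append((cid, bid))
        # Add the current block to the index for later queries.
        for tok in set(tokens):
            index[tok].add(bid)
    return clones


blocks = {
    1: "int a = b + c ;",
    2: "int a = b + d ;",
    3: "return x ;",
}
print(find_clones(blocks))
```

    In this toy input, blocks 1 and 2 differ in a single token and are reported as a pair, while block 3 shares only the `;` token with the others and fails verification.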

    Faster Mutation Analysis via Equivalence Modulo States

    Mutation analysis has many applications, such as assessing the quality of test suites and localizing faults. One important bottleneck of mutation analysis is scalability. The latest work explores the possibility of reducing redundant execution via split-stream execution. However, split-stream execution is only able to remove redundant execution before the first mutated statement. In this paper we try to also reduce some of the redundant execution after the first mutated statement has executed. We observe that, although many mutated statements are not equivalent, the execution results of those mutated statements may still be equivalent to the result of the original statement. In other words, the statements are equivalent modulo the current state. We propose a fast mutation analysis approach, AccMut. AccMut automatically detects the equivalence modulo states among a statement and its mutations, then groups the statements into equivalence classes modulo states, and uses only one process to represent each class. In this way, we can significantly reduce the number of split processes. Our experiments show that our approach can further accelerate mutation analysis on top of split-stream execution, with a speedup of 2.56x on average. Comment: Submitted to conference
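
    The notion of equivalence modulo the current state can be shown with a small sketch. It is an illustration only, not AccMut's implementation: the `VARIANTS` table, the `group_modulo_state` function, and the two-variable state are hypothetical. Variants of a statement are grouped by the value they produce in a given state, and each group would need only one process.

```python
from collections import defaultdict
import operator

# Original statement "a + b" and three hypothetical mutants of its operator.
VARIANTS = {
    "a + b": operator.add,
    "a - b": operator.sub,
    "a * b": operator.mul,
    "a ** b": operator.pow,
}


def group_modulo_state(variants, state):
    """Group variants that yield the same result in the given state.

    Variants in one group are equivalent modulo this state, even if
    they are not equivalent in general.
    """
    classes = defaultdict(list)
    for name, fn in variants.items():
        result = fn(state["a"], state["b"])
        classes[result].append(name)
    return list(classes.values())


# With a = 2, b = 2, three of the four variants evaluate to 4.
print(group_modulo_state(VARIANTS, {"a": 2, "b": 2}))
# With a = 3, b = 1, the grouping is different.
print(group_modulo_state(VARIANTS, {"a": 3, "b": 1}))
```

    The grouping depends on the state, which is why it has to be recomputed at run time rather than decided once per mutant.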

    Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

    Background. A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

    A controlled migration genetic algorithm operator for hardware-in-the-loop experimentation

    In this paper, we describe the development of an extended migration operator, which combats the negative effects of noise on the effective search capabilities of genetic algorithms. The research is motivated by the need to minimize the number of evaluations during hardware-in-the-loop experimentation, which can carry a significant cost penalty in terms of time or financial expense. The authors build on previous research, where convergence for search methods such as Simulated Annealing and Variable Neighbourhood search was accelerated by the implementation of an adaptive decision support operator. This methodology was found to be effective in searching noisy data surfaces. Provided that noise is not too significant, Genetic Algorithms can prove even more effective in guiding experimentation. It will be shown that with the introduction of a Controlled Migration operator into the GA heuristic, data, which represents a significant signal-to-noise ratio, can be searched with significant beneficial effects on the efficiency of hardware-in-the-loop experimentation, without a priori parameter tuning. The method is tested on an engine-in-the-loop experimental example, and shown to bring significant performance benefits.

    Parallel mutation testing for large scale systems

    Mutation testing is a valuable technique for measuring the quality of test suites in terms of detecting faults. However, one of its main drawbacks is its high computational cost. To address this, several approaches have recently been proposed to speed up the mutation testing process by exploiting computational resources in distributed systems. However, bottlenecks have been detected when those techniques are applied in large-scale systems. This work improves the performance of mutation testing on large-scale systems by proposing a new load distribution algorithm and parallelising different steps of the process. To demonstrate the benefits of our approach, we report on a thorough empirical evaluation, which analyses and compares our proposal with existing solutions executed in large-scale systems. The results show that our proposal outperforms state-of-the-art distribution algorithms by up to 35% in three different scenarios, reducing execution time by up to 99.66% at best. This work was supported by the Spanish MINECO/FEDER project under Grants PID2021-122270OB-I00, TED2021-129381B-C21 and PID2019-108528RBC22, the Comunidad de Madrid project FORTE-CM under Grant S2018/TCS-4314, Project S2018/TCS-4339 (BLOQUES-CM) cofunded by EIE Funds of the European Union and Comunidad de Madrid, and the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme.
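
    The abstract does not describe the proposed load distribution algorithm, so the following is only a generic baseline for spreading mutant executions over workers, not the authors' method. It uses greedy longest-processing-time assignment; the `distribute` function, the task names, and the per-mutant cost estimates are hypothetical.

```python
import heapq


def distribute(costs, workers):
    """Assign tasks to workers with a greedy longest-processing-time rule.

    costs maps a task name to its estimated execution cost. Tasks are
    taken in order of decreasing cost and each goes to the worker with
    the smallest accumulated load. Returns one task list per worker.
    """
    # Min-heap of (accumulated load, worker id).
    heap = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    plan = [[] for _ in range(workers)]
    ordered = sorted(costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, cost in ordered:
        load, w = heapq.heappop(heap)
        plan[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    return plan


# Four mutants with estimated costs, spread over two workers.
print(distribute({"m1": 5, "m2": 3, "m3": 3, "m4": 1}, workers=2))
```

    Both workers end up with a total estimated load of 6 in this example. A static plan like this one relies on cost estimates being available up front; it says nothing about how the paper handles the bottlenecks it reports.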

    The potential for liquid biopsies in the precision medical treatment of breast cancer.

    Currently, the clinical management of breast cancer relies on relatively few prognostic/predictive clinical markers (estrogen receptor, progesterone receptor, HER2), based on primary tumor biology. Circulating biomarkers, such as circulating tumor DNA (ctDNA) or circulating tumor cells (CTCs), may enhance our treatment options by focusing on the very cells that are the direct precursors of distant metastatic disease and that probably differ inherently from the primary tumor's biology. To shift the current clinical paradigm, assessing tumor biology in real time by molecularly profiling CTCs or ctDNA may serve to discover therapeutic targets, detect minimal residual disease and predict response to treatment. This review serves to elucidate the detection, characterization, and clinical application of CTCs and ctDNA with the goal of precision treatment of breast cancer.