SourcererCC: Scaling Code Clone Detection to Big Code

Author: Lopes Cristina V.
Roy Chanchal K.
Saini Vaibhav
Sajnani Hitesh
Svajlenko Jeffrey
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/12/2015

Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, and the number of token comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available, state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation. Comment: Accepted for publication at ICSE'16 (preprint, unrevised

arXiv.org e-Print Archive

Crossref
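
The SourcererCC abstract above describes querying an inverted index for candidate clones and using a token-ordering heuristic to shrink that index. A minimal Python sketch of the general idea follows. It is not the authors' implementation: the function name `find_clone_pairs`, the set-based overlap measure, the rarest-first token ordering, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def find_clone_pairs(blocks, threshold=0.8):
    """Return pairs of block ids whose token overlap meets the threshold.

    blocks: dict mapping a block id (str) to its list of tokens.
    Simplification (assumption): each block is reduced to its set of
    distinct tokens, and similarity is |A & B| / max(|A|, |B|).
    """
    token_sets = {bid: set(toks) for bid, toks in blocks.items()}

    # Global token order: rarest tokens first, ties broken alphabetically.
    freq = Counter(tok for s in token_sets.values() for tok in s)
    ordered = {
        bid: sorted(s, key=lambda t: (freq[t], t))
        for bid, s in token_sets.items()
    }

    index = defaultdict(list)  # token -> ids of blocks indexed under it
    pairs = set()
    for bid in sorted(ordered):
        toks = ordered[bid]
        if not toks:
            continue
        # Only a prefix of the ordered tokens is indexed and queried.
        # Two blocks that meet the threshold must share a prefix token.
        prefix_len = len(toks) - math.ceil(threshold * len(toks)) + 1
        prefix = toks[:prefix_len]

        candidates = {other for tok in prefix for other in index[tok]}
        for other in candidates:
            a, b = token_sets[bid], token_sets[other]
            needed = math.ceil(threshold * max(len(a), len(b)))
            if len(a & b) >= needed:
                pairs.add((other, bid))

        for tok in prefix:
            index[tok].append(bid)
    return pairs


blocks = {
    "a": ["int", "i", "=", "0", ";"],
    "b": ["int", "j", "=", "0", ";"],
    "c": ["return", "x", "+", "y", ";"],
}
print(find_clone_pairs(blocks))  # {('a', 'b')}
```

Indexing only a prefix of each block's ordered tokens keeps the index small, and blocks that share no prefix token are never compared at all, which is the kind of saving the abstract attributes to its filtering heuristics.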

Faster Mutation Analysis via Equivalence Modulo States

Author: Hao Dan
Shi Yangqingwei
Wang Bo
Xiong Yingfei
Zhang Lu
Publication date: 22/02/2017

Mutation analysis has many applications, such as assessing the quality of test suites and localizing faults. One important bottleneck of mutation analysis is scalability. The latest work explores the possibility of reducing redundant execution via split-stream execution. However, split-stream execution can only remove redundant execution before the first mutated statement. In this paper we also try to reduce some of the redundant execution after the first mutated statement has executed. We observe that, although many mutated statements are not equivalent, the execution result of those mutated statements may still be equivalent to the result of the original statement. In other words, the statements are equivalent modulo the current state. In this paper we propose a fast mutation analysis approach, AccMut. AccMut automatically detects the equivalence modulo states among a statement and its mutations, then groups the statements into equivalence classes modulo states, and uses only one process to represent each class. In this way, we can significantly reduce the number of split processes. Our experiments show that our approach can further accelerate mutation analysis on top of split-stream execution with a speedup of 2.56x on average. Comment: Submitted to conferenc

arXiv.org e-Print Archive

Crossref
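
The grouping step described in the AccMut abstract above, where a statement and its mutants are placed in one class whenever they produce the same result from the current state, can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not AccMut's implementation: program state is a plain dictionary, each statement variant is a hypothetical function from state to new state, and the name `group_by_resulting_state` is invented for this example.

```python
from collections import defaultdict


def group_by_resulting_state(state, variants):
    """Group statement variants that yield the same state from `state`.

    state: dict mapping variable names to (hashable) values.
    variants: dict mapping a variant name to a function that takes a
        state dict and returns the new state dict.
    Returns a list of classes, each a list of variant names.
    """
    classes = defaultdict(list)
    for name, apply_variant in variants.items():
        # Run each variant on a copy so variants cannot affect each other.
        new_state = apply_variant(dict(state))
        key = tuple(sorted(new_state.items()))
        classes[key].append(name)
    return list(classes.values())


# Hypothetical statement `c = a + b` with two arithmetic mutants.
variants = {
    "original": lambda s: {**s, "c": s["a"] + s["b"]},
    "mutant_mul": lambda s: {**s, "c": s["a"] * s["b"]},
    "mutant_sub": lambda s: {**s, "c": s["a"] - s["b"]},
}
print(group_by_resulting_state({"a": 2, "b": 2}, variants))
# [['original', 'mutant_mul'], ['mutant_sub']]
```

With `a = 2` and `b = 2`, addition and multiplication both give 4, so those two variants fall into one class and would need only one continued execution in this model. From a different state, such as `a = 2, b = 3`, the same variants would separate, which is why the equivalence holds only modulo the current state.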

Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

Author: Alessandro Tanca
Antonio Palomba
Cristina Fraumene
Edoardo Fiorillo
Francesco Cucca
Marcello Abbondio
Sergio Uzzau
Valeria Manghina
Publication date: 22/03/2019

Background. A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

arXiv.org e-Print Archive

FigShare

A controlled migration genetic algorithm operator for hardware-in-the-loop experimentation

Author: Alander
Back
D. Gladwin
DeJong
Fogarty
Fogel
Garcia-Martinez
Goldberg
Gouvenc
Isermann
J. Stewart
Keane
Mann
Michalewicz
Nakama
Nsakandaa
Oh
P. Stewart
Petter
Po
Power
Schaffer
Smith
Spears
Stewart
Thierens
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

In this paper, we describe the development of an extended migration operator, which combats the negative effects of noise on the effective search capabilities of genetic algorithms. The research is motivated by the need to minimize the number of evaluations during hardware-in-the-loop experimentation, which can carry a significant cost penalty in terms of time or financial expense. The authors build on previous research, where convergence for search methods such as Simulated Annealing and Variable Neighbourhood search was accelerated by the implementation of an adaptive decision support operator. This methodology was found to be effective in searching noisy data surfaces. Providing that noise is not too significant, Genetic Algorithms can prove even more effective guiding experimentation. It will be shown that with the introduction of a Controlled Migration operator into the GA heuristic, data, which represents a significant signal-to-noise ratio, can be searched with significant beneficial effects on the efficiency of hardware-in-the-loop experimentation, without a priori parameter tuning. The method is tested on an engine-in-the-loop experimental example, and shown to bring significant performance benefits.

University of Lincoln Institutional Repository

Crossref

White Rose Research Online

UDORA - University of Derby Online Research Archive

Parallel mutation testing for large scale systems

Author: Cañizares Pablo C.
de Lara Juan
Filgueira Rosa
Núñez Alberto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/06/2023

Mutation testing is a valuable technique for measuring the quality of test suites in terms of detecting faults. However, one of its main drawbacks is its high computational cost. To address this, several approaches have recently been proposed to speed up the mutation testing process by exploiting computational resources in distributed systems. However, bottlenecks have been detected when those techniques are applied in large-scale systems. This work improves the performance of mutation testing using large-scale systems by proposing a new load distribution algorithm and parallelising different steps of the process. To demonstrate the benefits of our approach, we report on a thorough empirical evaluation, which analyses and compares our proposal with existing solutions executed in large-scale systems. The results show that our proposal outperforms the state-of-the-art distribution algorithms by up to 35% in three different scenarios, reducing the execution time by up to 99.66% at best. This work was supported by the Spanish MINECO/FEDER project under Grants PID2021-122270OB-I00, TED2021-129381B-C21 and PID2019-108528RBC22, the Comunidad de Madrid project FORTE-CM under Grant S2018/TCS-4314, Project S2018/TCS-4339 (BLOQUES-CM) cofunded by EIE Funds of the European Union and Comunidad de Madrid and the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programm

University of St. Andrews - Pure

Biblos-e Archivo

St Andrews Research Repository

The potential for liquid biopsies in the precision medical treatment of breast cancer.

Author: Barrak Dany K
Elhodaky Mostafa
Forte Victoria A
Lang Julie E
Snow Anson
Tung Lily
Publication venue: eScholarship, University of California
Publication date: 01/03/2016

Currently, the clinical management of breast cancer relies on relatively few prognostic/predictive clinical markers (estrogen receptor, progesterone receptor, HER2), based on primary tumor biology. Circulating biomarkers, such as circulating tumor DNA (ctDNA) or circulating tumor cells (CTCs), may enhance our treatment options by focusing on the very cells that are the direct precursors of distant metastatic disease, and probably inherently different from the primary tumor's biology. To shift the current clinical paradigm, assessing tumor biology in real time by molecularly profiling CTCs or ctDNA may serve to discover therapeutic targets, detect minimal residual disease and predict response to treatment. This review serves to elucidate the detection, characterization, and clinical application of CTCs and ctDNA with the goal of precision treatment of breast cancer.

PubMed Central

eScholarship - University of California