Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference
Mutation analysis can effectively capture the dependency between source code
and test results. This has been exploited by Mutation Based Fault Localisation
(MBFL) techniques. However, MBFL techniques incur the high cost of mutation
analysis only after failures are observed, which may hinder their practical
adoption. We introduce SIMFL (Statistical
Inference for Mutation-based Fault Localisation), an MBFL technique that allows
users to perform the mutation analysis in advance against an earlier version of
the system. SIMFL uses mutants as artificial faults and aims to learn the
failure patterns among test cases against different locations of mutations.
Once a failure is observed, SIMFL requires little or almost no additional
analysis cost, depending on the inference model used. An
empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL
can successfully localise up to 103 faults at the top, and 152 faults within
the top five, on par with state-of-the-art alternatives. The cost of mutation
analysis can be further reduced by mutation sampling: SIMFL retains over 80% of
its localisation accuracy at the top rank when using only 10% of generated
mutants, compared to results obtained without sampling.
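The idea of learning failure patterns from mutants ahead of time can be illustrated with a minimal sketch. This is not the SIMFL implementation: it assumes a simple exact-match scoring rule (the share of a location's mutants whose failing-test set equals the observed one), and the function name `rank_locations`, the record layout, and the example locations and test names are all hypothetical.

```python
from collections import defaultdict


def rank_locations(kill_records, observed_failures):
    """Rank code locations by how often their mutants reproduce the failure.

    kill_records: iterable of (location, failing_tests) pairs, one per mutant,
    collected in advance on an earlier version of the system (assumed layout).
    observed_failures: the tests that fail for the real fault.
    """
    observed = frozenset(observed_failures)
    total = defaultdict(int)
    matches = defaultdict(int)
    for location, failing_tests in kill_records:
        total[location] += 1
        # Exact-match rule (an assumption): the mutant counts only if it
        # fails exactly the same tests as the observed failure.
        if frozenset(failing_tests) == observed:
            matches[location] += 1
    scores = {loc: matches[loc] / total[loc] for loc in total}
    # Highest score first; ties broken by location name for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pre-computed mutation results.
records = [
    ("Parser.parse", {"t1", "t2"}),
    ("Parser.parse", {"t1", "t2"}),
    ("Parser.parse", {"t3"}),
    ("Lexer.next", {"t1"}),
    ("Lexer.next", {"t1", "t2"}),
    ("Util.trim", {"t4"}),
]
ranking = rank_locations(records, {"t1", "t2"})
```

Because the mutation results are computed beforehand, ranking after a failure needs only a pass over the stored records, which is the cost amortisation the abstract describes.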
Learning short multivariate time series models through evolutionary and sparse matrix computation
Multivariate time series (MTS) data are widely available in different fields including medicine, finance, bioinformatics, science and engineering. Modelling MTS data accurately is important for many decision-making activities. One area that has been largely overlooked so far is the particular type of time series where the data set consists of a large number of variables but with a small number of observations. In this paper we describe the development of a novel computational method based on natural computation and sparse matrices that bypasses the size restrictions of traditional statistical MTS methods, makes no distribution assumptions, and also locates the associated parameters. Extensive results are presented, where the proposed method is compared with both traditional statistical and heuristic search techniques and evaluated on a number of criteria. The results have implications for a wide range of applications involving the learning of short MTS models.
The Optimisation of Stochastic Grammars to Enable Cost-Effective Probabilistic Structural Testing
The effectiveness of probabilistic structural testing depends on the characteristics of the probability distribution from which test inputs are sampled at random. Metaheuristic search has been shown to be a practical method of optimising the characteristics of such distributions. However, the applicability of the existing search-based algorithm is limited by the requirement that the software’s inputs must be a fixed number of numeric values. In this paper we relax this limitation by means of a new representation for the probability distribution. The representation is based on stochastic context-free grammars but incorporates two novel extensions: conditional production weights and the aggregation of terminal symbols representing numeric values. We demonstrate that an algorithm which combines the new representation with hill-climbing search is able to efficiently derive probability distributions suitable for testing software with structurally-complex input domains.
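As a rough illustration of sampling test inputs from a stochastic context-free grammar, the sketch below draws token sequences from a small weighted grammar. It shows only the plain representation: the paper's two extensions (conditional production weights and aggregated numeric terminals) and the hill-climbing search over weights are not modelled, and the grammar, its weights, and the name `sample` are assumptions made for the example.

```python
import random

# Hypothetical grammar: each nonterminal maps to (weight, production) pairs.
# Symbols that are not keys of the dictionary are treated as terminals.
GRAMMAR = {
    "expr": [(0.6, ["term"]), (0.4, ["term", "+", "expr"])],
    "term": [(0.7, ["num"]), (0.3, ["(", "expr", ")"])],
    "num": [(0.5, ["0"]), (0.5, ["1"])],
}


def sample(symbol, rng, grammar=GRAMMAR):
    """Expand `symbol` into a list of terminal tokens by weighted random choice."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol: emit as-is
    weights = [weight for weight, _ in grammar[symbol]]
    bodies = [body for _, body in grammar[symbol]]
    body = rng.choices(bodies, weights=weights, k=1)[0]
    tokens = []
    for child in body:
        tokens.extend(sample(child, rng, grammar))
    return tokens


rng = random.Random(0)  # fixed seed so the draw is repeatable
tokens = sample("expr", rng)
test_input = "".join(tokens)
```

Changing the production weights changes the distribution of generated inputs, which is the quantity a search algorithm would tune. The recursive weights here are kept low so that expansion terminates quickly in practice.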
Regulatory motif discovery using a population clustering evolutionary algorithm
This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
Predicting glaucomatous visual field deterioration through short multivariate time series modelling
In bio-medical domains there are many
applications involving the modelling of
multivariate time series (MTS) data. One area
that has been largely overlooked so far is the
particular type of time series where the data set
consists of a large number of variables but with
a small number of observations. In this paper we
describe the development of a novel computational
method based on genetic algorithms that bypasses
the size restrictions of traditional statistical
MTS methods, makes no distribution assumptions,
and also locates the order and associated
parameters in a single step. We apply this
method to the prediction and modelling of
glaucomatous visual field deterioration.
A comparison of transgenic rodent mutation and in vivo comet assay responses for 91 chemicals.
A database of 91 chemicals with published data from both transgenic rodent mutation (TGR) and rodent comet assays has been compiled. The objective was to compare the sensitivity of the two assays for detecting genotoxicity. Critical aspects of study design and results were tabulated for each dataset. There were fewer datasets from rats than mice, particularly for the TGR assay, and therefore, results from both species were combined for further analysis. TGR and comet responses were compared in liver and bone marrow (the most commonly studied tissues), and in stomach and colon evaluated either separately or in combination with other GI tract segments. Overall positive, negative, or equivocal test results were assessed for each chemical across the tissues examined in the TGR and comet assays using two approaches: 1) overall calls based on weight of evidence (WoE) and expert judgement, and 2) curation of the data based on a priori acceptability criteria prior to deriving final tissue-specific calls. Since the database contains a high prevalence of positive results, overall agreement between the assays was determined using statistics adjusted for prevalence (using AC1 and PABAK). These coefficients showed fair or moderate to good agreement for liver and the GI tract (predominantly stomach and colon data) using WoE, reduced agreement for stomach and colon evaluated separately using data curation, and poor or no agreement for bone marrow using both the WoE and data curation approaches. Confidence in these results is higher for liver than for the other tissues, for which there were fewer data. Our analysis finds that comet and TGR generally identify the same compounds (mainly potent mutagens) as genotoxic in liver, stomach and colon, but not in bone marrow. However, the current database content precluded drawing assay concordance conclusions for weak mutagens and non-DNA reactive chemicals.
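For readers unfamiliar with the prevalence-adjusted statistics named above, the following sketch computes observed agreement, PABAK, and Gwet's AC1 for two sets of categorical calls using their standard textbook definitions. It is not the authors' analysis code; the function name `agreement_stats` is hypothetical, and the example calls are invented for illustration and are not taken from the 91-chemical database.

```python
def agreement_stats(calls_a, calls_b, categories):
    """Return (observed agreement, PABAK, Gwet's AC1) for two paired call lists."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    n = len(calls_a)
    # Observed agreement: share of items on which both assays give the same call.
    p_o = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # PABAK assumes chance agreement of 1/q, so it depends only on p_o.
    pabak = (q * p_o - 1) / (q - 1)
    # AC1 chance agreement uses the average marginal proportion per category.
    pi = [(calls_a.count(c) + calls_b.count(c)) / (2 * n) for c in categories]
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    ac1 = (p_o - p_e) / (1 - p_e)
    return p_o, pabak, ac1


# Invented calls for ten chemicals, with a high prevalence of positives.
tgr = ["pos"] * 8 + ["neg"] * 2
comet = ["pos"] * 7 + ["neg", "pos", "neg"]
p_o, pabak, ac1 = agreement_stats(tgr, comet, ["pos", "neg"])
```

With these invented calls the observed agreement is 0.8, PABAK is 0.6, and AC1 is about 0.71. Both coefficients avoid the deflation that Cohen's kappa shows when one category dominates, which is why they suit a database with mostly positive results.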