
    Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference

    Mutation analysis can effectively capture the dependency between source code and test results. This has been exploited by Mutation Based Fault Localisation (MBFL) techniques. However, MBFL techniques suffer from the need to expend the high cost of mutation analysis after the observation of failures, which may present a challenge for their practical adoption. We introduce SIMFL (Statistical Inference for Mutation-based Fault Localisation), an MBFL technique that allows users to perform the mutation analysis in advance against an earlier version of the system. SIMFL uses mutants as artificial faults and aims to learn the failure patterns among test cases against different locations of mutations. Once a failure is observed, SIMFL requires little to no additional analysis cost, depending on the inference model used. An empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL can successfully localise up to 103 faults at the top, and 152 faults within the top five, on par with state-of-the-art alternatives. The cost of mutation analysis can be further reduced by mutation sampling: SIMFL retains over 80% of its localisation accuracy at the top rank when using only 10% of generated mutants, compared to results obtained without sampling.
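
    The precomputation idea admits a compact illustration. Below is a minimal Python sketch, not SIMFL's actual statistical estimator: past mutation analysis records which tests each mutant causes to fail, and a newly observed failure is matched against those recorded patterns to rank locations. All names (kill_matrix, rank_locations) and the Jaccard-style matching are invented for illustration.

    def rank_locations(kill_matrix, failing_tests):
        # Score each location by how closely the failure patterns of mutants
        # recorded there (from ahead-of-time mutation analysis) match the
        # tests failing now. Hedged sketch, not SIMFL's actual estimator.
        failing = frozenset(failing_tests)
        scores = {}
        for location, mutant_kill_sets in kill_matrix.items():
            best = 0.0
            for killed in mutant_kill_sets:
                union = killed | failing
                if union:
                    # Jaccard similarity between the mutant's kill set and
                    # the observed failing tests; keep the best match.
                    best = max(best, len(killed & failing) / len(union))
            scores[location] = best
        return sorted(scores, key=scores.get, reverse=True)

    # Toy usage: analysis done ahead of time, failure observed now.
    kill_matrix = {
        "Foo.java:42": [frozenset({"t1", "t3"})],
        "Bar.java:10": [frozenset({"t2"}), frozenset({"t2", "t4"})],
    }
    print(rank_locations(kill_matrix, {"t1", "t3"}))  # Foo.java:42 ranked first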

    Guiding Deep Learning System Testing using Surprise Adequacy

    Deep Learning (DL) systems are rapidly being adopted in safety and security critical domains, urgently calling for ways to test their correctness and robustness. Testing of DL systems has traditionally relied on manual collection and labelling of data. Recently, a number of coverage criteria based on neuron activation values have been proposed. These criteria essentially count the number of neurons whose activation during the execution of a DL system satisfied certain properties, such as being above predefined thresholds. However, existing coverage criteria are not sufficiently fine grained to capture subtle behaviours exhibited by DL systems. Moreover, evaluations have focused on showing correlation between adversarial examples and proposed criteria rather than evaluating and guiding their use for actual testing of DL systems. We propose a novel test adequacy criterion for testing of DL systems, called Surprise Adequacy for Deep Learning Systems (SADL), which is based on the behaviour of DL systems with respect to their training data. We measure the surprise of an input as the difference in the DL system's behaviour between the input and the training data (i.e., what was learnt during training), and subsequently develop this as an adequacy criterion: a good test input should be sufficiently but not overtly surprising compared to training data. Empirical evaluation using a range of DL systems, from simple image classifiers to autonomous driving car platforms, shows that systematic sampling of inputs based on their surprise can improve classification accuracy of DL systems against adversarial examples by up to 77.5% via retraining.
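
    As a rough sketch of the underlying measurement, the following Python snippet computes a plain nearest-neighbour distance between a new input's activation trace and the traces collected on training data. The paper's actual criteria (likelihood-based and distance-based Surprise Adequacy) are more refined, e.g. conditioning on predicted classes; this simplification and all names in it are assumptions.

    import numpy as np

    def surprise(train_activations, new_activation):
        # Surprise of an input = distance from its activation trace to the
        # nearest trace seen during training. A simplified stand-in for the
        # paper's distance-based Surprise Adequacy.
        return np.linalg.norm(train_activations - new_activation, axis=1).min()

    # Toy usage: 100 training traces over a 4-neuron layer.
    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 4))
    print(surprise(train, rng.normal(size=4)))         # looks familiar: low
    print(surprise(train, rng.normal(size=4) + 10.0))  # far from training: high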

    Mining Fix Patterns for FindBugs Violations

    In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread, and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in Defects4J, a major benchmark for software testing and automated repair. Comment: Accepted for IEEE Transactions on Software Engineering.
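
    The pipeline's shape (learn a numeric representation of each violation snippet, then cluster) can be gestured at in a few lines of Python. The sketch below substitutes TF-IDF over tokens for the paper's CNN-learned features, so it is strictly a weaker stand-in; the snippets and the cluster count are made up.

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Stand-in features: TF-IDF over code tokens instead of the paper's
    # CNN-learned embeddings; clustering then groups similar violation code.
    violations = [
        "if (s == null) return;",
        "if (name == null) return;",
        "stream.close();",
        "reader.close();",
    ]
    features = TfidfVectorizer(token_pattern=r"\w+").fit_transform(violations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    for label, snippet in zip(labels, violations):
        print(label, snippet)  # similar snippets should share a cluster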

    Learning Test-Mutant Relationship for Accurate Fault Localisation

    Context: Automated fault localisation aims to assist developers in identifying the root cause of a fault by narrowing down the space of likely fault locations. Several Mutation Based Fault Localisation (MBFL) techniques have been proposed that locate faults automatically by simulating variants of the faulty program, called mutants. Despite their success, existing MBFL techniques suffer from the cost of performing mutation analysis after the fault is observed. Method: To overcome this shortcoming, we propose a new MBFL technique named SIMFL (Statistical Inference for Mutation-based Fault Localisation). SIMFL localises faults based on the past results of mutation analysis done on an earlier version in the project history, allowing developers to make predictions on the location of incoming faults in a just-in-time manner. Using several statistical inference methods, SIMFL models the relationship between test results of the mutants and their locations, and subsequently infers the location of the current faults. Results: The empirical study on the Defects4J dataset shows that SIMFL can localise 113 out of 224 faults at the first rank, outperforming other MBFL techniques. Even when SIMFL is trained on the predicted kill matrix, it can still localise 95 out of 194 faults at the first rank. Moreover, removing redundant mutants significantly improves the localisation accuracy of SIMFL, increasing the number of faults localised at the first rank by up to 51. Conclusion: This paper proposes a new MBFL technique called SIMFL, which exploits ahead-of-time mutation analysis to localise current faults. SIMFL is not only cost-effective, as it does not need a mutation analysis after the fault is observed, but also capable of localising faults accurately. Comment: Paper accepted for publication at IST. arXiv admin note: substantial text overlap with arXiv:1902.0972
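
    One detail worth illustrating is the redundant-mutant removal that the results highlight. The sketch below treats two mutants as redundant when they sit at the same location and are killed by exactly the same tests, which is one plausible reading; the paper's precise criterion may differ, and all names are illustrative.

    def deduplicate_mutants(mutants):
        # Keep one mutant per (location, kill set) pair so that repeated
        # failure patterns do not dominate the downstream inference.
        # Illustrative only; the paper's redundancy criterion may differ.
        seen, kept = set(), []
        for location, kill_set in mutants:
            if (location, kill_set) not in seen:
                seen.add((location, kill_set))
                kept.append((location, kill_set))
        return kept

    mutants = [
        ("Foo.java:42", frozenset({"t1", "t3"})),
        ("Foo.java:42", frozenset({"t1", "t3"})),  # redundant duplicate
        ("Bar.java:10", frozenset({"t2"})),
    ]
    print(deduplicate_mutants(mutants))  # the duplicate is dropped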

    Fonte: Finding Bug Inducing Commits from Failures

    A Bug Inducing Commit (BIC) is a commit that introduces a software bug into the codebase. Knowing the relevant BIC for a given bug can provide valuable information for debugging as well as bug triaging. However, existing BIC identification techniques are either too expensive (because they require the failing tests to be executed against previous versions for bisection) or inapplicable at debugging time (because they require post hoc artefacts such as bug reports or bug fixes). We propose Fonte, an efficient and accurate BIC identification technique that only requires test coverage. Fonte combines Fault Localisation (FL) with BIC identification and ranks commits based on the suspiciousness of the code elements that they modified. Fonte reduces the search space of BICs using failure coverage as well as a filter that detects commits that are merely style changes. Our empirical evaluation using 130 real-world BICs shows that Fonte significantly outperforms state-of-the-art BIC identification techniques based on Information Retrieval as well as neural code embedding models, achieving at least 39% higher MRR. We also report that the ranking scores produced by Fonte can be used to perform weighted bisection, further reducing the cost of BIC identification. Finally, we apply Fonte to a large-scale industry project with over 10M lines of code, and show that it can rank the actual BIC within the top five commits for 87% of the studied real batch-testing failures, saving 32% of the BIC inspection cost on average. Comment: Accepted to ICSE'23 (not the final version).
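
    The ranking step lends itself to a short sketch: given fault-localisation suspiciousness scores for code elements and the set of elements each commit modified, score commits by aggregating suspiciousness while skipping commits flagged as pure style changes. This mirrors only the overall shape of Fonte; the actual aggregation and the weighted bisection are more involved, and every name below is invented.

    def rank_commits(suspiciousness, commits, style_only=frozenset()):
        # Aggregate the suspiciousness of the elements each commit touched;
        # style-only commits are filtered out, echoing Fonte's search-space
        # reduction. Sketch of the shape only, not the paper's exact scheme.
        scores = {
            commit: sum(suspiciousness.get(elem, 0.0) for elem in elems)
            for commit, elems in commits.items()
            if commit not in style_only
        }
        return sorted(scores, key=scores.get, reverse=True)

    suspiciousness = {"Foo.java:42": 0.9, "Bar.java:10": 0.2}
    commits = {
        "c1": ["Foo.java:42"],  # modified the most suspicious element
        "c2": ["Bar.java:10"],
        "c3": ["Foo.java:42"],  # reformatting only
    }
    print(rank_commits(suspiciousness, commits, style_only={"c3"}))  # c1 first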

    Reducing DNN Labelling Cost using Surprise Adequacy: An Industrial Case Study for Autonomous Driving

    Deep Neural Networks (DNNs) are rapidly being adopted by the automotive industry, due to their impressive performance in tasks that are essential for autonomous driving. Object segmentation is one such task: its aim is to precisely locate boundaries of objects and classify the identified objects, helping autonomous cars to recognise the road environment and the traffic situation. Not only is this task safety critical, but developing a DNN based object segmentation module presents a set of challenges that are significantly different from traditional development of safety critical software. The development process in use consists of multiple iterations of data collection, labelling, training, and evaluation. Among these stages, training and evaluation are computation intensive while data collection and labelling are manual labour intensive. This paper shows how development of DNN based object segmentation can be improved by exploiting the correlation between Surprise Adequacy (SA) and model performance. The correlation allows us to predict model performance for inputs without manually labelling them. This, in turn, enables understanding of model performance, more guided data collection, and informed decisions about further training. In our industrial case study the technique allows cost savings of up to 50% with negligible evaluation inaccuracy. Furthermore, engineers can trade off cost savings versus the tolerable level of inaccuracy depending on different development phases and scenarios. Comment: To be published in Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering.
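
    The labelling-cost saving follows directly from the correlation: if surprise predicts where the model struggles, a fixed labelling budget is best spent on the most surprising inputs. Below is a minimal Python sketch of that selection rule, with the budget and scores made up; the study's actual procedure is richer.

    import numpy as np

    def select_for_labelling(surprise_scores, budget_fraction=0.5):
        # Spend the manual-labelling budget on the most surprising inputs,
        # assuming (per the reported correlation) the model is likely fine
        # on low-surprise ones. Sketch only, not the study's procedure.
        scores = np.asarray(surprise_scores)
        n_label = int(len(scores) * budget_fraction)
        return np.argsort(scores)[::-1][:n_label]

    scores = [0.1, 2.5, 0.3, 1.9, 0.2, 0.15]
    print(select_for_labelling(scores))  # indices of the 3 most surprising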

    Temperature change in pig rib bone during implant site preparation by low-speed drilling

    OBJECTIVES: The purpose of this study was to evaluate the temperature change during low-speed drilling using infrared thermography. MATERIAL AND METHODS: Pig ribs were used to provide cortical bone of a similar quality to human mandible. Heat production by three implant drill systems (two conventional drilling systems and one low-speed drilling system) was evaluated by measuring the bone temperature using infrared thermography. Each system had two different bur sizes. The drilling sequence consisted of a twist drill (2.0 mm/2.5 mm), which establishes the direction of the implant, followed by a 3.0 mm pilot drill. Thermal images were recorded using the IRI1001 system (Infrared Integrated Systems Ltd.). Baseline temperature was 31±1°C. Measurements were repeated 10 times, and a static load of 10 kg was applied while drilling. Data were analyzed using descriptive statistics. Statistical analysis was conducted with two-way ANOVA. RESULTS AND CONCLUSIONS: Mean values (n=10 drill sequences) were obtained for the maximum recorded temperature (Max T, °C) and the change in temperature from baseline (ΔT, °C). The changes in temperature (ΔT) ranged from 1.57°C to 2.46°C. Drilling at 50 rpm without irrigation did not produce overheating. There was no significant difference in heat production between the three implant drill systems (p>0.05). No implant drill system produced heat exceeding 47°C, the critical temperature for bone necrosis, during low-speed drilling. Low-speed drilling without irrigation could be used during implant site preparation.

    How Transitive Are Real-World Group Interactions? -- Measurement and Reproduction

    Many real-world interactions (e.g., researcher collaborations and email communication) occur among multiple entities. These group interactions are naturally modeled as hypergraphs. In graphs, transitivity is helpful to understand the connections between node pairs sharing a neighbor, and it has extensive applications in various domains. Hypergraphs, an extension of graphs, are designed to represent group relations. However, to the best of our knowledge, there has been no examination regarding the transitivity of real-world group interactions. In this work, we investigate the transitivity of group interactions in real-world hypergraphs. We first suggest intuitive axioms as necessary characteristics of hypergraph transitivity measures. Then, we propose a principled hypergraph transitivity measure HyperTrans, which satisfies all the proposed axioms, with a fast computation algorithm Fast-HyperTrans. After that, we analyze the transitivity patterns in real-world hypergraphs distinguished from those in random hypergraphs. Lastly, we propose a scalable hypergraph generator THera. It reproduces the observed transitivity patterns by leveraging community structures, which are pervasive in real-world hypergraphs. Our code and datasets are available at https://github.com/kswoo97/hypertrans. Comment: To be published in KDD 2023. 12 pages, 7 figures, and 11 tables.
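
    To convey what transitivity could mean for group interactions, here is a deliberately naive Python sketch: among pairs of overlapping hyperedges, count how often some third hyperedge connects their non-shared parts. This is only an intuition pump; HyperTrans is axiomatically designed and differs from this count (see the repository above).

    from itertools import combinations

    def naive_hyperedge_transitivity(hyperedges):
        # Among "wedges" (pairs of hyperedges that overlap without one
        # subsuming the other), count how often a third hyperedge touches
        # both non-shared sides. Purely illustrative; not HyperTrans.
        edges = [frozenset(e) for e in hyperedges]
        open_wedges = closed_wedges = 0
        for e1, e2 in combinations(edges, 2):
            shared = e1 & e2
            left, right = e1 - e2, e2 - e1
            if not (shared and left and right):
                continue  # no overlap, or one hyperedge contains the other
            open_wedges += 1
            if any(e3 & left and e3 & right
                   for e3 in edges if e3 not in (e1, e2)):
                closed_wedges += 1
        return closed_wedges / open_wedges if open_wedges else 0.0

    # Toy usage: three group interactions among researchers 1..5.
    print(naive_hyperedge_transitivity([{1, 2, 3}, {3, 4, 5}, {2, 4}]))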