21 research outputs found

    Identifying Bugs in Make and JVM-Oriented Builds

    Incremental and parallel builds are crucial features of modern build systems. Parallelism enables fast builds by running independent tasks simultaneously, while incrementality saves time and computing resources by re-running only the build operations affected by a particular code change. Writing build definitions that lead to error-free incremental and parallel builds is a challenging task. This is mainly because developers are often unable to predict the effects of build operations on the file system and how different build operations interact with each other. Faulty build scripts may seriously degrade the reliability of automated builds, as they cause build failures and non-deterministic or incorrect build results. To reason about arbitrary build executions, we present buildfs, a generally applicable model that takes into account both the specification (as declared in build scripts) and the actual behavior (low-level file system operations) of build operations. We then formally define different types of faults related to incremental and parallel builds in terms of the conditions under which a file system operation violates the specification of a build operation. Our testing approach, which relies on the proposed model, analyzes the execution of a single full build, translates it into buildfs, and uncovers faults by checking for corresponding violations. We evaluate the effectiveness, efficiency, and applicability of our approach by examining hundreds of Make and Gradle projects. Notably, our method is the first to handle Java-oriented build systems. The results indicate that our approach is (1) able to uncover several important issues (245 issues found in 45 open-source projects have been confirmed and fixed by the upstream developers), and (2) orders of magnitude faster than a state-of-the-art tool for Make builds.
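
    The paper's core check, comparing a task's declared inputs and outputs against the file system operations observed during one full build, can be sketched roughly as follows. The task records, the two violation rules, and the exclusion of declared producer/consumer pairs are illustrative assumptions, not buildfs's exact formalism, which instruments real Make and Gradle executions.

```python
# Minimal sketch of a buildfs-style check: each build task declares its
# inputs/outputs, and we compare that specification against the file system
# operations actually observed during one full build. Task names and the
# violation rules here are illustrative, not the paper's exact definitions.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    declared_inputs: set = field(default_factory=set)
    declared_outputs: set = field(default_factory=set)
    reads: set = field(default_factory=set)    # files actually read
    writes: set = field(default_factory=set)   # files actually written

def find_violations(tasks):
    issues = []
    for t in tasks:
        # Incrementality fault: a file was consumed but never declared as an
        # input, so changing it will not trigger a rebuild of this task.
        for f in t.reads - t.declared_inputs:
            issues.append(f"{t.name}: undeclared input {f}")
        # A produced file missing from the declared outputs can break
        # downstream dependency tracking.
        for f in t.writes - t.declared_outputs:
            issues.append(f"{t.name}: undeclared output {f}")
    # Parallelism fault: two tasks with no declared dependency touch the same file.
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            shared = (a.writes & (b.reads | b.writes)) | (b.writes & a.reads)
            for f in shared - (a.declared_outputs & b.declared_inputs):
                issues.append(f"{a.name} <-> {b.name}: unordered access to {f}")
    return issues

compile_t = Task("compile", {"main.c"}, {"main.o"}, {"main.c", "config.h"}, {"main.o"})
link_t = Task("link", {"main.o"}, {"app"}, {"main.o"}, {"app"})
print(find_violations([compile_t, link_t]))  # flags config.h as an undeclared input
```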

    Regression test selection model: a comparison between ReTSE and pythia

    As software systems change and evolve over time, regression tests have to be run to validate these changes. Regression testing is an expensive but essential activity in software maintenance. The purpose of this paper is to compare a new regression test selection model called ReTSE with Pythia. The ReTSE model uses decomposition slicing in order to identify the relevant regression tests. Decomposition slicing provides a technique that is capable of identifying the unchanged parts of a system. Pythia is a regression test selection technique based on textual differencing. Both techniques are compared using a Power program taken from Vokolos and Frankl's paper. The analysis of this comparison has shown promising results in reducing the number of tests to be run after changes are introduced.
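
    For readers unfamiliar with the textual-differencing side of the comparison, a minimal sketch is shown below: a test is re-selected only if the diff between program versions touches a line it covers. The coverage map, the toy power function, and the line-level granularity are assumptions for illustration; Pythia operates on C programs, and ReTSE additionally applies decomposition slicing.

```python
# Sketch of regression test selection by textual differencing: a test is
# re-run only if the diff between the old and new version touches a line that
# the test covers. Coverage data and file contents are made up for illustration.
import difflib

def changed_lines(old_src: str, new_src: str) -> set[int]:
    """Return 1-based line numbers of the old version affected by the change."""
    changed = set()
    matcher = difflib.SequenceMatcher(None, old_src.splitlines(), new_src.splitlines())
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag != "equal":
            changed.update(range(i1 + 1, max(i2, i1 + 1) + 1))
    return changed

def select_tests(coverage: dict[str, set[int]], old_src: str, new_src: str) -> list[str]:
    """coverage maps a test name to the old-version lines it executes."""
    delta = changed_lines(old_src, new_src)
    return [test for test, lines in coverage.items() if lines & delta]

old = "int power(int b, int e) {\n  int r = 1;\n  while (e-- > 0) r *= b;\n  return r;\n}\n"
new = old.replace("e-- > 0", "e > 0")  # an edit to the loop condition (line 3)
coverage = {"test_power_positive": {1, 2, 3, 4}, "test_power_zero": {1, 2, 4}}
print(select_tests(coverage, old, new))  # only the test covering line 3 is selected
```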

    Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

    Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they can also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal incorrect behaviors in a DNN and repair them, DNN developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling a large number of unlabeled inputs is a highly expensive and time-consuming task. To address this problem, we propose NSS, Neuron Sensitivity guided test case Selection, which reduces labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the internal neuron information induced by test cases to select valuable test cases, which are highly likely to cause the model to behave incorrectly. We evaluate NSS on four widely used datasets and four well-designed DNN models against SOTA baseline methods. The results show that NSS performs well in assessing the test cases' probability of fault triggering and their model-improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5% of the test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS obtains an 81.8% fault detection rate, 20% higher than the baselines).
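
    A rough sketch of sensitivity-guided selection is shown below, assuming a perturbation-based score as a stand-in for NSS's neuron sensitivity metric; the model, data, and scoring rule are placeholders, not the paper's implementation.

```python
# Sketch of sensitivity-guided test selection: unlabeled inputs are ranked by
# how strongly a small perturbation shifts the model's logits, on the
# assumption that unstable inputs are more likely to expose faults. The score
# below is a simplification of NSS's neuron sensitivity metric; the model and
# dataset are random placeholders.
import torch
import torch.nn as nn

def sensitivity_score(model: nn.Module, x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """Per-sample score: L2 distance between logits of x and a perturbed copy."""
    model.eval()
    with torch.no_grad():
        base = model(x)
        noisy = model(x + eps * torch.randn_like(x))
    return (base - noisy).norm(dim=1)

def select_top_k(model: nn.Module, pool: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k unlabeled inputs with the highest sensitivity for labeling."""
    scores = sensitivity_score(model, pool)
    return pool[scores.topk(k).indices]

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
pool = torch.randn(1000, 1, 28, 28)          # stand-in for an unlabeled dataset
to_label = select_top_k(model, pool, k=50)   # 5% of the pool, as in the MNIST example
print(to_label.shape)                        # torch.Size([50, 1, 28, 28])
```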

    Feature Map Testing for Deep Neural Networks

    Due to the widespread application of deep neural networks (DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During the testing process, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps whose activation will almost certainly result in a model error) and report them to the DNN developer, who subsequently repairs them (e.g., by retraining the model with the test cases). Current test metrics, however, are primarily concerned with neurons, which means that test cases discovered either by guided fuzzing or by selection with these metrics focus on detecting fault-inducing neurons while failing to detect fault-inducing feature maps. In this work, we propose DeepFeature, which tests DNNs at the feature map level. When testing is conducted, DeepFeature scrutinizes every internal feature map in the model and identifies vulnerabilities that can be repaired to increase the model's overall performance. Exhaustive experiments demonstrate that (1) DeepFeature is a strong tool for detecting the model's vulnerable feature maps; (2) DeepFeature's test case selection has a high fault detection rate and can detect more types of faults (compared to coverage-guided selection techniques, the fault detection rate is increased by 49.32%); and (3) DeepFeature's fuzzer also outperforms current fuzzing techniques and generates valuable test cases more efficiently. Comment: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2307.1101
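
    To illustrate what testing at the feature map level (rather than the neuron level) might look like, here is a minimal sketch that aggregates activations per convolutional channel and flags the most unstable channels. The variance-based criterion, the model, and the probe data are assumptions for illustration, not DeepFeature's actual metric or architecture.

```python
# Sketch of feature-map-level (rather than neuron-level) inspection:
# activations are aggregated per convolutional channel, and channels whose
# responses vary most across a probe set are flagged as candidate "vulnerable"
# feature maps. The criterion is illustrative, not DeepFeature's exact metric.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(8 * 28 * 28, 10))

    def forward(self, x):
        self.fmap = torch.relu(self.conv(x))   # keep feature maps for inspection
        return self.head(self.fmap)

def vulnerable_feature_maps(model: SmallCNN, probe: torch.Tensor, top: int = 2):
    """Rank channels by the variance of their mean activation across inputs."""
    model.eval()
    with torch.no_grad():
        model(probe)
    per_channel = model.fmap.mean(dim=(2, 3))      # (batch, channels)
    instability = per_channel.var(dim=0)           # spread across the probe set
    return instability.topk(top).indices.tolist()

model = SmallCNN()
probe = torch.randn(128, 1, 28, 28)                # stand-in test inputs
print(vulnerable_feature_maps(model, probe))       # e.g. [3, 5]
```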

    MicroFuzz: An Efficient Fuzzing Framework for Microservices

    This paper presents a novel fuzzing framework, called MicroFuzz, specifically designed for Microservices. Mocking-Assisted Seed Execution, Distributed Tracing, Seed Refresh, and Pipeline Parallelism approaches are adopted to address the environmental complexities and dynamics of Microservices and improve the efficiency of fuzzing. MicroFuzz has been successfully implemented and deployed in Ant Group, a prominent FinTech company. Its performance has been evaluated in three distinct industrial scenarios: normalized fuzzing, iteration testing, and taint verification. Throughout five months of operation, MicroFuzz has diligently analyzed a substantial codebase, consisting of 261 apps with over 74.6 million lines of code (LOC). The framework's effectiveness is evident in its detection of 5,718 potential quality or security risks, with 1,764 of them confirmed and fixed as actual security threats by software specialists. Moreover, MicroFuzz significantly increased program coverage by 12.24% and detected program behaviors by 38.42% in the iteration testing. Comment: Accepted by ICSE-SEIP 202
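
    As a rough illustration of mocking-assisted seed execution and seed refresh, the sketch below fuzzes a toy handler with its downstream RPC mocked out and keeps inputs that yield a novel feedback signal. The handler, mock, and coverage stand-in are hypothetical and do not reflect MicroFuzz's actual instrumentation or its distributed, pipeline-parallel architecture.

```python
# Sketch of a mocking-assisted microservice fuzz loop: downstream RPC calls are
# replaced by mocks so the handler under test runs in isolation, seeds are
# mutated, and inputs that produce a novel feedback signal are kept as fresh
# seeds. Everything here is a hypothetical stand-in for MicroFuzz's tooling.
import random
from unittest.mock import MagicMock

def transfer_handler(request: dict, payment_rpc) -> str:
    """Toy service endpoint; a real target would be an instrumented app."""
    amount = request.get("amount", 0)
    if amount < 0:
        raise ValueError("negative amount reached business logic")  # the "bug"
    payment_rpc.charge(amount)
    return "ok"

def mutate(seed: dict) -> dict:
    child = dict(seed)
    child["amount"] = seed.get("amount", 0) + random.randint(-100, 100)
    return child

def fuzz(iterations: int = 1000):
    seeds = [{"amount": 1}]
    crashes, seen = [], set()
    for _ in range(iterations):
        case = mutate(random.choice(seeds))
        try:
            transfer_handler(case, payment_rpc=MagicMock())  # downstream mocked out
        except Exception as exc:
            crashes.append((case, exc))
            continue
        signal = case["amount"] // 50          # crude stand-in for coverage feedback
        if signal not in seen:                 # "seed refresh": keep novel inputs
            seen.add(signal)
            seeds.append(case)
    return crashes

print(len(fuzz()), "potential issues found")
```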

    Reinforcement-Learning-Guided Source Code Summarization via Hierarchical Attention

    Regression testing framework for test cases generation and prioritization

    A regression test is a significant part of software testing; it is used to find the maximum number of faults in software applications. Test Case Prioritization (TCP) is an approach to prioritize and schedule test cases so that faults are detected at an earlier stage of the testing process. Code coverage is one of the features of a Regression Test (RT) that helps detect more faults in a software application. However, code coverage and fault detection reduce the performance of existing test case prioritization techniques because scanning an entire codebase consumes a lot of time. The process of generating test cases plays an important role in the prioritization of test cases. Existing automated generation and prioritization techniques produce insufficient test cases, which lowers the fault detection rate, or consume more computation time to detect more faults. Unified Modelling Language (UML) based test case generation techniques can extract test cases from UML diagrams while covering the maximum part of a module of an application. Therefore, UML-based test case generation can support a test case prioritization technique in finding a greater number of faults with a shorter execution time. A multi-objective optimization technique can handle multiple objectives, allowing RT to generate more test cases, increase the fault detection rate, and produce better results. The aim of this research is to develop a framework that detects the maximum number of faults with less execution time, thereby improving RT. The performance of RT can be improved by an efficient test case generation and prioritization method based on a multi-objective optimization technique that handles both the test cases and the rate of fault detection. This framework consists of two important models: Test Case Generation (TCG) and TCP. The TCG model requires a UML use case diagram to extract test cases, and a metaheuristic approach that uses tokens is employed to generate them. TCP then receives the extracted test cases, together with faults, as input to produce the prioritized set of test cases. The proposed research modifies the existing Hill Climbing based TCP by altering its test case swapping feature so that it detects faults in a reasonable execution time. The proposed framework intends to improve the performance of regression testing by generating and prioritizing test cases in order to find a greater number of faults in an application. Two case studies are conducted in the research in order to gather Test Cases (TC) and faults for multiple modules. The proposed framework yielded an Average Percentage of Fault Detection of 92.2% with less testing time compared to other artificial intelligence based TCP techniques. The findings prove that the proposed framework produces a sufficient number of test cases and finds the maximum number of faults in less time.
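
    A minimal sketch of the prioritization step is given below, assuming the standard APFD measure as the objective and plain adjacent-swap hill climbing; the fault matrix is fabricated, and the thesis's modified swapping strategy and UML-driven test generation are not reproduced.

```python
# Sketch of test case prioritization: orderings are scored with APFD (Average
# Percentage of Faults Detected) and improved by hill climbing over adjacent
# swaps with random restarts. The fault matrix below is fabricated.
import random

def apfd(order, faults_of):
    """APFD = 1 - (sum of first-detection positions)/(n*m) + 1/(2n)."""
    n, all_faults = len(order), set().union(*faults_of.values())
    m = len(all_faults)
    first_pos = {}
    for pos, test in enumerate(order, start=1):
        for f in faults_of[test]:
            first_pos.setdefault(f, pos)
    return 1 - sum(first_pos[f] for f in all_faults) / (n * m) + 1 / (2 * n)

def hill_climb(tests, faults_of, restarts: int = 20):
    best_order, best_score = None, -1.0
    for _ in range(restarts):
        order = random.sample(tests, len(tests))
        improved = True
        while improved:
            improved = False
            for i in range(len(order) - 1):            # neighbourhood: adjacent swaps
                cand = order[:]
                cand[i], cand[i + 1] = cand[i + 1], cand[i]
                if apfd(cand, faults_of) > apfd(order, faults_of):
                    order, improved = cand, True
        score = apfd(order, faults_of)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score

faults_of = {"t1": {1}, "t2": {1, 2, 3}, "t3": set(), "t4": {4}}
order, score = hill_climb(list(faults_of), faults_of)
print(order, round(score, 3))
```

    Higher APFD means faults are exposed earlier in the prioritized order, so a value such as the reported 92.2% corresponds to most faults being detected within the first few prioritized tests.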