An Entry Point for Formal Methods: Specification and Analysis of Event Logs
Formal specification languages have long languished, due to the grave
scalability problems faced by complete verification methods. Runtime
verification promises to use formal specifications to automate part of the more
scalable art of testing, but has not been widely applied to real systems, and
often falters due to the cost and complexity of instrumentation for online
monitoring. In this paper we discuss work in progress to apply an event-based
specification system to the logging mechanism of the Mars Science Laboratory
mission at JPL. By focusing on log analysis, we exploit the "instrumentation"
already implemented and required for communicating with the spacecraft. We
argue that this work both shows a practical method for using formal
specifications in testing and opens interesting research avenues, including a
challenging specification learning problem.
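As a rough illustration of checking an event-based specification against a log after the fact, the sketch below tests one rule: every command is eventually followed by a success event with the same name. The event types, field names, and command names are hypothetical and are not taken from the paper or from any flight software.

```python
from typing import Dict, List

Event = Dict[str, str]


def unanswered_commands(log: List[Event]) -> List[str]:
    """Return names of commands never followed by a SUCCESS event of the same name."""
    pending: List[str] = []
    for event in log:
        if event["type"] == "COMMAND":
            pending.append(event["name"])
        elif event["type"] == "SUCCESS" and event["name"] in pending:
            pending.remove(event["name"])
    return pending


# Hypothetical log: the second command never reports success.
log = [
    {"type": "COMMAND", "name": "TURN_ANTENNA"},
    {"type": "SUCCESS", "name": "TURN_ANTENNA"},
    {"type": "COMMAND", "name": "START_DRILL"},
]
print(unanswered_commands(log))  # ['START_DRILL']
```

Because the check reads only an existing log, it needs no extra instrumentation of the running system.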
Swarm testing
Swarm testing is a novel and inexpensive way to improve the diversity of test cases generated during random testing. Increased diversity leads to improved coverage and fault detection. In swarm testing, the usual practice of potentially including all features in every test case is abandoned. Rather, a large "swarm" of randomly generated configurations, each of which omits some features, is used, with configurations receiving equal resources. We have identified two mechanisms by which feature omission leads to better exploration of a system's state space. First, some features actively prevent the system from executing interesting behaviors; e.g., "pop" calls may prevent a stack data structure from executing a bug in its overflow detection logic. Second, even when there is no active suppression of behaviors, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically; for example, in a week of testing it found 42% more distinct ways to crash a collection of C compilers than did the heavily hand-tuned default configuration of a random tester.
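The idea of a swarm of configurations can be sketched in a few lines. The example below assumes each feature is kept with probability 0.5 and uses a hypothetical stack API as the feature set; the function names are illustrative, not the authors' tooling.

```python
import random
from typing import FrozenSet, List, Sequence


def swarm_configurations(
    features: Sequence[str], count: int, rng: random.Random
) -> List[FrozenSet[str]]:
    """Build `count` non-empty configurations, each omitting a random subset of features."""
    configs: List[FrozenSet[str]] = []
    while len(configs) < count:
        chosen = frozenset(f for f in features if rng.random() < 0.5)
        if chosen:  # an empty configuration cannot generate a test
            configs.append(chosen)
    return configs


def random_test(config: FrozenSet[str], length: int, rng: random.Random) -> List[str]:
    """Generate a test that uses only the features enabled in `config`."""
    enabled = sorted(config)
    return [rng.choice(enabled) for _ in range(length)]


FEATURES = ["push", "pop", "top", "clear"]  # hypothetical stack API
rng = random.Random(0)
configs = swarm_configurations(FEATURES, 5, rng)
tests = [random_test(config, 10, rng) for config in configs]
```

A configuration that happens to omit "pop" can only grow the stack, which is the kind of test that could reach overflow logic.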
Help, help, I'm being suppressed! The significance of suppressors in software testing
Test features are basic compositional units used to describe what a test does (and does not) involve. For example, in API-based testing, the most obvious features are function calls; in grammar-based testing, the obvious features are the elements of the grammar. The relationship between features as abstractions of tests and produced behaviors of the tested program is surprisingly poorly understood. This paper shows how large-scale random testing modified to use diverse feature sets can uncover causal relationships between what a test contains and what the program being tested does. We introduce a general notion of observable behaviors as targets, where a target can be a detected fault, an executed branch or statement, or a complex coverage entity such as a state, predicate-valuation, or program path. While it is obvious that targets have triggers - features without which they cannot be hit by a test - the notion of suppressors - features which make a test less likely to hit a target - has received little attention despite having important implications for automated test generation and program understanding. For a set of subjects including C compilers, a flash file system, and JavaScript engines, we show that suppression is both common and important.
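One simple way to look for suppressors in test data is to compare how often a target is hit with and without a given feature. The sketch below uses a fixed margin on the difference in hit rates, which is a simplification chosen for illustration (a real analysis would want a proper statistical test), and its "trigger" label is looser than the strict definition above. The data and names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Run = Tuple[FrozenSet[str], bool]  # (features enabled in the test, target hit?)


def hit_rates(runs: List[Run], feature: str) -> Tuple[float, float]:
    """Return the target hit rate with the feature present and with it absent."""
    present = [hit for config, hit in runs if feature in config]
    absent = [hit for config, hit in runs if feature not in config]

    def rate(hits: List[bool]) -> float:
        return sum(hits) / len(hits) if hits else 0.0

    return rate(present), rate(absent)


def classify(runs: List[Run], feature: str, margin: float = 0.1) -> str:
    """Label a feature by comparing hit rates; `margin` is an arbitrary threshold."""
    present, absent = hit_rates(runs, feature)
    if present - absent > margin:
        return "trigger"
    if absent - present > margin:
        return "suppressor"
    return "irrelevant"


# Hypothetical results for an overflow bug in a stack.
runs: List[Run] = [
    (frozenset({"push"}), True),
    (frozenset({"push", "pop"}), False),
    (frozenset({"push", "pop"}), False),
    (frozenset({"push"}), True),
    (frozenset({"pop"}), False),
]
print(classify(runs, "pop"))   # suppressor
print(classify(runs, "push"))  # trigger
```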
Cause reduction for quick testing
In random testing, it is often desirable to produce a "quick test" - an extremely inexpensive test suite that can serve as a frequently applied regression and allow the benefits of random testing to be obtained even in very slow or oversubscribed test environments. Delta debugging is an algorithm that, given a failing test case, produces a smaller test case that also fails, and typically executes much more quickly. Delta debugging of random tests can produce effective regression suites for previously detected faults, but such suites often have little power for detecting new faults, and in some cases provide poor code coverage. This paper proposes extending delta debugging by simplifying tests with respect to code coverage, an instance of a generalization of delta debugging we call cause reduction. We show that test suites reduced in this fashion can provide very effective quick tests for real-world programs. For Mozilla's SpiderMonkey JavaScript engine, the reduced suite is more effective for finding software faults, even if its reduced runtime is not considered. The effectiveness of a reduction-based quick test persists through major changes to the software under test.
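Reducing a test with respect to coverage rather than failure amounts to swapping the predicate handed to the reducer. The sketch below uses a simplified greedy variant of delta debugging, not the full ddmin algorithm, and a toy coverage table; the step names and branch numbers are hypothetical.

```python
from typing import Callable, Dict, List, Set


def reduce_test(test: List[str], keeps_effect: Callable[[List[str]], bool]) -> List[str]:
    """Drop chunks of decreasing size for as long as `keeps_effect` still holds."""
    chunk = len(test) // 2
    while chunk >= 1:
        i = 0
        while i < len(test):
            candidate = test[:i] + test[i + chunk:]
            if candidate and keeps_effect(candidate):
                test = candidate  # chunk was unnecessary; retry at the same index
            else:
                i += chunk
        chunk //= 2
    return test


# Toy coverage model: each step covers a fixed set of branch ids.
COVERAGE: Dict[str, Set[int]] = {"open": {1}, "write": {2, 3}, "read": {3}, "close": {4}}


def coverage(test: List[str]) -> Set[int]:
    covered: Set[int] = set()
    for step in test:
        covered |= COVERAGE[step]
    return covered


original = ["open", "write", "read", "write", "close", "read"]
target = coverage(original)
# The "effect" to preserve is the original coverage, not a failure.
reduced = reduce_test(original, lambda t: coverage(t) >= target)
print(reduced)  # ['open', 'write', 'close']
```

Passing a predicate such as "the test still fails" instead would give ordinary failure-preserving reduction with the same reducer.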
Using test case reduction and prioritization to improve symbolic execution
Scaling symbolic execution to large programs or programs with complex inputs remains difficult due to path explosion and complex constraints, as well as external method calls. Additionally, creating an effective test structure with symbolic inputs can be difficult. A popular symbolic execution strategy in practice is to perform symbolic execution not "from scratch" but based on existing test cases. This paper proposes that the effectiveness of this approach to symbolic execution can be enhanced by (1) reducing the size of seed test cases and (2) prioritizing seed test cases to maximize exploration efficiency. The proposed test case reduction strategy is based on a recently introduced generalization of delta debugging, and our prioritization techniques include novel methods that, for this purpose, can outperform some traditional regression testing algorithms. We show that applying these methods can significantly improve the effectiveness of symbolic execution based on existing test cases.
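For context, one traditional regression-testing baseline of the kind mentioned above is greedy "additional coverage" prioritization, sketched below. This is not the paper's novel method, which the abstract does not describe; the seed names and branch ids are hypothetical.

```python
from typing import Dict, List, Set


def prioritize(coverage_by_test: Dict[str, Set[int]]) -> List[str]:
    """Order tests so that each next test adds the most not-yet-covered branches."""
    remaining = dict(coverage_by_test)
    covered: Set[int] = set()
    order: List[str] = []
    while remaining:
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order


seeds = {"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {1}}
print(prioritize(seeds))  # ['t2', 't1', 't3']
```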
From scripts to specifications: the evolution of a flight software testing effort
The research described in this publication was carried out at the Jet Propulsion Laboratory.
What are the Actual Flaws in Important Smart Contracts (and How Can We Find Them)?
An important problem in smart contract security is understanding the
likelihood and criticality of discovered, or potential, weaknesses in
contracts. In this paper we provide a summary of Ethereum smart contract audits
performed for 23 professional stakeholders, avoiding the common problem of
reporting issues mostly prevalent in low-quality contracts. These audits were
performed at a leading company in blockchain security, using both open-source
and proprietary tools, as well as human code analysis performed by professional
security engineers. We categorize 246 individual defects, making it possible to
compare the severity and frequency of different vulnerability types, compare
smart contract and non-smart contract flaws, and to estimate the efficacy of
automated vulnerability detection approaches.
Contextual Predictive Mutation Testing
Mutation testing is a powerful technique for assessing and improving test
suite quality that artificially introduces bugs and checks whether the test
suites catch them. However, it is also computationally expensive and thus does
not scale to large systems and projects. One promising recent approach to
tackling this scalability problem uses machine learning to predict whether the
tests will detect the synthetic bugs, without actually running those tests.
However, existing predictive mutation testing approaches still misclassify 33%
of detection outcomes on a randomly sampled set of mutant-test suite pairs. We
introduce MutationBERT, an approach for predictive mutation testing that
simultaneously encodes the source method mutation and test method, capturing
key context in the input representation. Thanks to its higher precision,
MutationBERT saves 33% of the time spent by a prior approach on
checking/verifying live mutants. MutationBERT also outperforms the
state-of-the-art in both same-project and cross-project settings, with
meaningful improvements in precision, recall, and F1 score. We validate our
input representation and our aggregation approaches for lifting predictions from
the test matrix level to the test suite level, finding similar improvements in
performance. MutationBERT not only enhances the state-of-the-art in predictive
mutation testing, but also presents practical benefits for real-world
applications, both in saving developer time and finding hard-to-detect mutants.
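The underlying (non-predictive) mutation testing loop that such models try to avoid running can be shown in miniature. The sketch below is a toy illustration with a hypothetical function, two hand-written mutants, and a two-assertion test suite; it has no connection to MutationBERT's implementation.

```python
import operator
from typing import Callable, Dict


def make_max(compare: Callable[[int, int], bool]) -> Callable[[int, int], int]:
    """Build a two-argument maximum whose comparison operator can be swapped."""
    def maximum(a: int, b: int) -> int:
        return a if compare(a, b) else b
    return maximum


original = make_max(operator.gt)
# Each mutant replaces the comparison operator with a different one.
mutants: Dict[str, Callable[[int, int], int]] = {
    "gt->lt": make_max(operator.lt),
    "gt->ge": make_max(operator.ge),
}


def suite_passes(fn: Callable[[int, int], int]) -> bool:
    """A tiny test suite for the function under test."""
    return fn(1, 2) == 2 and fn(5, 3) == 5


# A mutant is "killed" when the suite fails on it; otherwise it stays live.
killed = {name: not suite_passes(fn) for name, fn in mutants.items()}
score = sum(killed.values()) / len(killed)
print(killed, score)  # {'gt->lt': True, 'gt->ge': False} 0.5
```

Every mutant requires a full run of the suite, which is the cost that predicting outcomes instead of executing tests aims to reduce.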