Assessing Test Adequacy for Black Box Systems without Specifications
Testing a black-box system without recourse to a specification is difficult, because there is no basis for estimating how many tests will be required, or for assessing how complete a given test set is. Several researchers have noted a duality between these testing problems and the problem of inductive inference (learning a model of a hidden system from a given set of examples): it is impossible to tell how many examples will be required to infer an accurate model, and there is no basis for telling how complete a given set of examples is. These issues have been addressed in the domain of inductive inference by developing statistical techniques, where the accuracy of an inferred model is subject to a tolerable degree of error. This paper explores the application of these techniques to assess test sets of black-box systems. It shows how they can be used to reason in a statistically justified manner about the number of tests required to fully exercise a system without a specification, and how to provide a valid adequacy measure for black-box test sets in an applied context.
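The statistical framing described here is in the spirit of probably-approximately-correct (PAC) learning, where accuracy is bounded by a tolerable error and a confidence level. As a minimal sketch of that flavour of reasoning (an illustrative bound for a finite hypothesis class, not the paper's actual derivation), the following shows how an error tolerance and confidence parameter translate into a number of required examples:

```python
import math

def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Classic PAC bound: the number of examples needed so that, with
    probability at least 1 - delta, a consistent learner drawn from a finite
    hypothesis class of the given size has error below epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# e.g. one million candidate models, tolerating 5% error with 99% confidence:
print(pac_sample_bound(10**6, epsilon=0.05, delta=0.01))  # -> 369 examples
```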
Finding Clustering Configurations to Accurately Infer Packet Structures from Network Data
Clustering is often used for reverse engineering network protocols from captured network traces. The performance of clustering techniques is often contingent upon the selection of various parameters, which can have a severe impact on clustering quality. In this paper we experimentally investigate the effect of four different parameters with respect to network traces. We also determine the optimal parameter configuration with respect to traces from four different network protocols. Our results indicate that the choice of distance measure and the length of the message have the most substantial impact on cluster accuracy. Depending on the type of protocol, the n-gram length can also have a substantial impact.
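As a rough illustration of the kind of parameter sweep described above, the sketch below clusters a handful of hypothetical messages while varying the n-gram length, distance measure, and message-truncation length. The messages, parameter values, and use of hierarchical clustering are illustrative assumptions, not the paper's actual experimental setup:

```python
from itertools import product

from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical captured messages (two made-up "protocols" for illustration).
messages = [
    "GET /index HTTP/1.0", "GET /home HTTP/1.0", "POST /login HTTP/1.0",
    "\x01\x00LOGIN:alice", "\x01\x00LOGIN:bob", "\x01\x02DATA:xyz",
]

def cluster(msgs, ngram, metric, max_len):
    """Cluster messages after truncating them, using character n-gram counts
    and the chosen distance measure (all parameter values illustrative)."""
    truncated = [m[:max_len] for m in msgs]
    vec = CountVectorizer(analyzer="char", ngram_range=(ngram, ngram))
    features = vec.fit_transform(truncated).toarray().astype(float)
    tree = linkage(pdist(features, metric=metric), method="average")
    return fcluster(tree, t=2, criterion="maxclust")

# Sweep a few parameter combinations and compare the resulting groupings.
for ngram, metric, max_len in product([1, 2], ["euclidean", "cosine"], [8, 16]):
    print(ngram, metric, max_len, cluster(messages, ngram, metric, max_len))
```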
Visualising Software as a Particle System
Current metrics-based approaches to visualise unfamiliar software systems face two key limitations: (1) they are limited in terms of the number of dimensions that can be projected, and (2) they use fixed layout algorithms, where the resulting positions of entities can be vulnerable to misinterpretation. In this paper we show how computer games technology can be used to address these problems. We present the PhysVis software exploration system, where software metrics can be variably mapped to parameters of a physical model and displayed via a particle system. Entities can be imbued with attributes such as mass, gravity, and (for relationships) strength or springiness, alongside traditional attributes such as position, colour and size. The resulting visualisation is a dynamic scene; the relative positions of entities are not determined by a fixed layout algorithm, but by intuitive physical notions such as gravity, mass, and drag. The implementation is openly available, and we evaluate it on a selection of visualisation tasks for two openly available software systems.
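A minimal sketch of the core mapping, assuming hypothetical metric-to-physics choices (e.g. lines of code to mass, churn to drag): each entity becomes a particle, relationships act as springs, and positions emerge from force integration rather than a fixed layout. This is not the PhysVis implementation, only an illustration of the idea:

```python
import random
from dataclasses import dataclass, field

@dataclass
class Particle:
    """A software entity as a particle; metrics map to physical attributes."""
    name: str
    mass: float          # e.g. derived from lines of code (illustrative mapping)
    drag: float          # e.g. derived from churn or age (illustrative mapping)
    pos: list = field(default_factory=lambda: [random.uniform(-1, 1), random.uniform(-1, 1)])
    vel: list = field(default_factory=lambda: [0.0, 0.0])

def step(particles, springs, dt=0.02, stiffness=0.5):
    """One integration step: spring forces for relationships, then drag."""
    forces = {p.name: [0.0, 0.0] for p in particles}
    index = {p.name: p for p in particles}
    for a, b in springs:                       # a relationship acts as a spring
        pa, pb = index[a], index[b]
        for i in range(2):
            f = stiffness * (pb.pos[i] - pa.pos[i])
            forces[a][i] += f
            forces[b][i] -= f
    for p in particles:
        for i in range(2):
            accel = forces[p.name][i] / p.mass - p.drag * p.vel[i]
            p.vel[i] += accel * dt
            p.pos[i] += p.vel[i] * dt

# Hypothetical entities and a 'depends-on' relationship between them.
parts = [Particle("Parser", mass=5.0, drag=0.3), Particle("Lexer", mass=2.0, drag=0.3)]
for _ in range(200):
    step(parts, springs=[("Parser", "Lexer")])
print([(p.name, [round(x, 2) for x in p.pos]) for p in parts])
```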
Behaviourally Adequate Software Testing
Identifying a finite test set that adequately captures the essential behaviour of a program, such that all faults are identified, is a well-established problem. Traditional adequacy metrics can be impractical, and may be misleading even if they are satisfied. One intuitive notion of adequacy, which has been discussed in theoretical terms over the past three decades, is the idea of behavioural coverage: if it is possible to infer an accurate model of a system from its test executions, then the test set must be adequate. Despite its intuitive basis, it has remained almost entirely in the theoretical domain because inferred models have been expected to be exact (generally an infeasible task), and have not allowed for any pragmatic interim measures of adequacy to guide test set generation. In this work we present a new test generation technique that is founded on behavioural adequacy, which combines a model evaluation framework from the domain of statistical learning theory with search-based white-box test generation strategies. Experiments with our BESTEST prototype indicate that such test sets not only come with a statistically valid measurement of adequacy, but also detect significantly more defects.
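One way to read the behavioural-adequacy intuition in code: infer a model from part of the recorded test executions and measure how well it predicts the held-out executions. The decision tree standing in for the inferred model, and the toy system under test, are assumptions for illustration; they are not the BESTEST machinery:

```python
from statistics import mean

from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def behavioural_adequacy(inputs, observed_outputs, folds=5):
    """Estimate adequacy as cross-validated agreement between a model inferred
    from test executions and the executions held out from inference."""
    scores = []
    for train, test in KFold(n_splits=folds, shuffle=True, random_state=0).split(inputs):
        model = DecisionTreeClassifier().fit(
            [inputs[i] for i in train], [observed_outputs[i] for i in train])
        scores.append(model.score(
            [inputs[i] for i in test], [observed_outputs[i] for i in test]))
    return mean(scores)

# Hypothetical system under test: report the sign of each coordinate.
sut = lambda x, y: (x > 0, y > 0)
tests = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
outputs = [str(sut(x, y)) for x, y in tests]
print(behavioural_adequacy(tests, outputs))
```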
Using Segment-Based Alignment to Extract Packet Structures from Network Traces
Many applications in security, from understanding unfamiliar protocols to fuzz-testing and guarding against potential attacks, rely on analysing network protocols. In many situations we cannot rely on access to a specification or even an implementation of the protocol, and must instead rely on raw network data “sniffed” from the network. When this is the case, one of the key challenges is to discern from the raw data the underlying packet structures, a task that is commonly carried out by using alignment algorithms to identify commonalities (e.g. field delimiters) between packets. For this, most approaches have used variants of the Needleman-Wunsch algorithm to perform byte-wise alignment. However, such approaches can suffer when messages are heterogeneous, or in cases where protocol fields are separated by long variable fields. In this paper, we present an alternative alignment algorithm known as segment-based alignment. We show how this technique can produce accurate results on traces from several common protocols, and how the results tend to be more intuitive than those produced by state-of-the-art techniques.
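For reference, the byte-wise Needleman-Wunsch alignment that the abstract identifies as the prevailing baseline can be sketched as follows. This is a textbook implementation applied to two hypothetical packets, not the segment-based algorithm proposed in the paper:

```python
def needleman_wunsch(a: bytes, b: bytes, match=1, mismatch=-1, gap=-1):
    """Byte-wise global alignment of two packets (the baseline approach
    discussed above); returns the two aligned sequences with b'-' gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    out_a, out_b = bytearray(), bytearray()
    i, j = n, m
    while i > 0 or j > 0:                      # trace back through the matrix
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(ord('-')); i -= 1
        else:
            out_a.append(ord('-')); out_b.append(b[j - 1]); j -= 1
    return bytes(reversed(out_a)), bytes(reversed(out_b))

# Two hypothetical packets sharing a delimiter structure.
print(needleman_wunsch(b"USER alice\r\n", b"USER bob\r\n"))
```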
A Search Based Approach for Stress-Testing Integrated Circuits
In order to reduce software complexity and improve power efficiency, hardware platforms are increasingly incorporating functionality that was traditionally administered at the software level (such as cache management). This functionality is often complex, incorporating multiple processors along with a multitude of design parameters. Such devices can only be reliably tested at a ‘system’ level, which presents various testing challenges; behaviour is often non-deterministic (from a software perspective), and finding suitable test sets to ‘stress’ the system adequately is often an inefficient, manual activity that yields fixed test sets that can rarely be reused. In this paper we investigate this problem with respect to ARM’s Cache Coherent Interconnect (CCI) Unit. We present an automated search-based testing approach that combines a parameterised test-generation framework with the hill-climbing heuristic to find test sets that maximally ‘stress’ the CCI by producing much larger numbers of data stall cycles than the corresponding manual test sets.
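A minimal sketch of a hill-climbing search over a parameterised test generator. The parameter names and the synthetic stand-in for the stall-cycle measurement are assumptions for illustration; in the setting described above the fitness would come from executing the generated test on the interconnect and reading a hardware counter:

```python
import random

# Hypothetical test-generation parameters (names and ranges are illustrative).
PARAM_RANGES = {"threads": range(1, 9), "shared_lines": range(1, 65), "writes_pct": range(0, 101)}

def stall_cycles(params):
    """Stand-in fitness function; the real version would run the test and
    measure data stall cycles on the device."""
    return (params["threads"] * params["shared_lines"] * params["writes_pct"]
            - (params["writes_pct"] - 60) ** 2)

def neighbours(params):
    """All single-step changes to one parameter that stay within range."""
    for key, rng in PARAM_RANGES.items():
        for delta in (-1, 1):
            candidate = dict(params, **{key: params[key] + delta})
            if candidate[key] in rng:
                yield candidate

def hill_climb(steps=1000):
    current = {k: random.choice(list(rng)) for k, rng in PARAM_RANGES.items()}
    for _ in range(steps):
        best = max(neighbours(current), key=stall_cycles, default=current)
        if stall_cycles(best) <= stall_cycles(current):
            return current                     # local optimum reached
        current = best
    return current

print(hill_climb())
```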
Computing the Structural Difference between State-Based Models
Software behaviour models play an important role in software development. They can be manually generated to specify the intended behaviour of a system, or they can be reverse-engineered to capture the actual behaviour of the system. Models may differ when they correspond to different versions of the system, or they may contain faults or inaccuracies. In these circumstances, it is important to be able to concisely capture the differences between models, a task that becomes increasingly challenging with complex models. This paper presents the PLTSDiff algorithm that addresses this problem. Given two state machines, the algorithm can identify which states and transitions are different. This can be used to generate a 'patch' with differences or to evaluate the extent of the differences between the machines. The paper also shows how the Precision and Recall measure can be adapted to quantify the similarity of two state machines.
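As a simplified illustration of adapting precision and recall to state machines, the sketch below treats each machine as a set of transition triples and assumes the two machines already share state names; PLTSDiff itself first computes a state correspondence, which this sketch skips:

```python
def transition_set(machine):
    """Flatten a machine given as {state: {label: next_state}} into triples."""
    return {(s, label, t) for s, edges in machine.items() for label, t in edges.items()}

def precision_recall(reference, inferred):
    """Precision and recall over transition triples, assuming shared state names."""
    ref, inf = transition_set(reference), transition_set(inferred)
    overlap = len(ref & inf)
    return overlap / len(inf), overlap / len(ref)

# Two hypothetical versions of a small protocol model.
v1 = {"idle": {"connect": "open"}, "open": {"send": "open", "close": "idle"}}
v2 = {"idle": {"connect": "open"}, "open": {"close": "idle"}}
print(precision_recall(v1, v2))   # -> (1.0, 0.666...)
```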
Black-Box Test Generation from Inferred Models
Automatically generating test inputs for components without source code (i.e. ‘black-box’ components) and without a specification is challenging. One particularly interesting solution to this problem is to use Machine Learning algorithms to infer testable models from program executions in an iterative cycle. Although the idea has been around for over 30 years, there is little empirical information to inform the choice of suitable learning algorithms, or to show how good the resulting test sets are. This paper presents an openly available framework to facilitate experimentation in this area, and provides a proof-of-concept inference-driven testing framework, along with evidence of the efficacy of its test sets on three programs.
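A rough sketch of the iterative infer-and-test cycle mentioned above, using a decision tree as a stand-in for the inferred model and a trivial numeric system under test. The query strategy (keeping inputs that the current model mispredicts) is an illustrative choice, not the framework's actual component:

```python
import random

from sklearn.tree import DecisionTreeClassifier

def sut(x):
    """Hypothetical black-box system under test."""
    return "high" if x > 37 else "low"

def inference_driven_testing(iterations=10, batch=20):
    """Iterative cycle: infer a model from executions so far, then add new
    inputs on which the system's real behaviour contradicts the model."""
    inputs = [random.randint(0, 100) for _ in range(batch)]
    outputs = [sut(x) for x in inputs]
    for _ in range(iterations):
        model = DecisionTreeClassifier().fit([[x] for x in inputs], outputs)
        candidates = [random.randint(0, 100) for _ in range(200)]
        disagreements = [x for x in candidates if model.predict([[x]])[0] != sut(x)]
        new_inputs = disagreements[:batch] or candidates[:batch]
        inputs += new_inputs
        outputs += [sut(x) for x in new_inputs]
    return model, inputs

model, tests = inference_driven_testing()
print(len(tests), model.predict([[36], [40]]))
```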
Supervised Software Modularisation
This paper is concerned with the challenge of reorganising a software system into modules that both obey sound design principles and are sensible to domain experts. The problem has given rise to several unsupervised automated approaches that use techniques such as clustering and Formal Concept Analysis. Although results are often partially correct, they usually require refinement to enable the developer to integrate domain knowledge. This paper presents the SUMO algorithm, an approach that is complementary to existing techniques and enables the maintainer to refine their results. The algorithm is guaranteed to eventually yield a result that is satisfactory to the maintainer, and the evaluation on a diverse range of systems shows that this occurs with a reasonably low amount of effort.
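A minimal sketch of the refinement loop described above, under the assumption that the maintainer's corrections take the form of "these two entities belong together" pairs; the merging step is a naive stand-in rather than the SUMO algorithm itself:

```python
def propose(entities, together):
    """Naive stand-in for a modularisation step: start from singleton modules
    and merge whenever the maintainer has said two entities belong together."""
    modules = {e: {e} for e in entities}
    for a, b in together:
        merged = modules[a] | modules[b]
        for e in merged:
            modules[e] = merged
    return {frozenset(m) for m in modules.values()}

def refine(entities, maintainer_feedback, rounds=10):
    """Accumulate pairwise corrections until the maintainer accepts a proposal."""
    together = set()
    proposal = propose(entities, together)
    for _ in range(rounds):
        corrections = maintainer_feedback(proposal)
        if not corrections:
            break                              # maintainer is satisfied
        together |= set(corrections)
        proposal = propose(entities, together)
    return proposal

# Hypothetical maintainer who wants the parser-related entities grouped.
def feedback(proposal):
    for module in proposal:
        if "Parser" in module and "Lexer" not in module:
            return [("Parser", "Lexer")]
    return []

print(refine(["Parser", "Lexer", "Renderer"], feedback))
```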
Using Compression Algorithms to Support the Comprehension of Program Traces
Several software maintenance tasks, such as debugging, phase identification, or simply the high-level exploration of system functionality, rely on the extensive analysis of program traces. These usually require the developer to manually discern any repeated patterns that may be of interest from some visual representation of the trace. This can be both time-consuming and inaccurate; there is always the danger that visually similar trace patterns actually represent distinct program behaviours. This paper presents an automated phase-identification technique, founded on the observation that the challenge of identifying repeated patterns in a trace is analogous to the challenge faced by data-compression algorithms. The technique applies an established data-compression algorithm, SEQUITUR, to identify repeated phases in traces. SEQUITUR not only compresses data, but organises the repeated patterns into a hierarchy, which is especially useful from a comprehension standpoint because it enables the analysis of a trace at varying levels of abstraction.
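To make the compression analogy concrete, the sketch below applies greedy digram replacement to a hypothetical method-call trace; this is in the spirit of Re-Pair and serves as a simplified stand-in for SEQUITUR, yielding a hierarchy of rules that correspond to repeated phases:

```python
from collections import Counter

def compress_trace(trace, min_count=2):
    """Greedy digram replacement: repeatedly replace the most frequent pair of
    adjacent events with a new rule, building a hierarchy of repeated phases."""
    rules, sequence = {}, list(trace)
    while True:
        pairs = Counter(zip(sequence, sequence[1:]))
        (pair, count), = pairs.most_common(1) or [((None, None), 0)]
        if count < min_count:
            return sequence, rules
        rule_name = f"R{len(rules)}"
        rules[rule_name] = pair
        rewritten, i = [], 0
        while i < len(sequence):
            if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == pair:
                rewritten.append(rule_name)
                i += 2
            else:
                rewritten.append(sequence[i])
                i += 1
        sequence = rewritten

# Hypothetical method-call trace with a repeated open/read/close phase.
trace = ["open", "read", "close", "open", "read", "close", "init"]
print(compress_trace(trace))
# -> (['R1', 'R1', 'init'], {'R0': ('open', 'read'), 'R1': ('R0', 'close')})
```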