You are the only possible oracle: Effective test selection for end users of interactive machine learning systems
How do you test a program when only a single user, with no expertise in software testing, is able to determine if the program is performing correctly? Such programs are common today in the form of machine-learned classifiers. We consider the problem of testing this common kind of machine-generated program when the only oracle is an end user: e.g., only you can determine if your email is properly filed. We present test selection methods that provide very good failure detection rates even for small test suites, and show that these methods work in both large-scale random experiments using a "gold standard" and in studies with real users. Our methods are inexpensive and largely algorithm-independent. Key to our methods is an exploitation of properties of classifiers that is not possible in traditional software testing. Our results suggest that it is plausible for time-pressured end users to interactively detect failures, even very hard-to-find failures, without wading through a large number of successful (and thus less useful) tests. We additionally show that some methods are able to find the arguably most difficult-to-detect faults of classifiers: cases where machine learning algorithms have high confidence in an incorrect result.
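One family of confidence-based selection methods in this spirit can be sketched as follows: rank candidate test cases by how uncertain the classifier is about them, and show the user-as-oracle the least-confident items first. This is a minimal illustrative sketch, not the paper's actual method; the probability values and function name are invented for the example.

```python
# Hypothetical sketch: pick the test cases the classifier is least sure
# about, so a time-pressured user labels the most failure-prone items first.
# The probabilities below are illustrative, not data from the paper.

def least_confident(probabilities, k):
    """Return indices of the k items with the lowest top-class probability.

    probabilities: one list of per-class probabilities per test case.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]

probs = [
    [0.95, 0.05],  # confident prediction
    [0.55, 0.45],  # borderline -- likely worth the user's attention
    [0.80, 0.20],
    [0.51, 0.49],  # most borderline
]
print(least_confident(probs, 2))  # -> [3, 1]
```

Note that the abstract's hardest faults are the opposite case (high confidence, wrong answer), so in practice such a ranking would be one of several complementary selection strategies rather than the whole story.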
A subsumption hierarchy of test case prioritization for composite services
Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World
This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394, "Software Performance Engineering in the DevOps World".
The seminar addressed the problem of performance-aware DevOps. Both DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rising importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to development (Dev). However, so far the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify opportunities for cross-community collaboration and to set the path for long-lasting collaborations towards performance-aware DevOps.
The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations.
Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided Tracing
Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in throughput by restricting the expense of coverage tracing to only when new coverage is guaranteed. Unfortunately, CGT suits only a basic block coverage granularity, yet most fuzzers require finer-grain coverage metrics: edge coverage and hit counts. It is this limitation which prohibits nearly all of today's state-of-the-art fuzzers from attaining the performance benefits of CGT.
This paper tackles the challenges of adapting CGT to fuzzing's most ubiquitous coverage metrics. We introduce and implement a suite of enhancements that expand CGT's introspection to fuzzing's most common code coverage metrics, while maintaining its orders-of-magnitude speedup over conventional always-on coverage tracing. We evaluate their trade-offs with respect to fuzzing performance and effectiveness across 12 diverse real-world binaries (8 open- and 4 closed-source). On average, our coverage-preserving CGT attains near-identical speed to the present block-coverage-only CGT, UnTracer, and outperforms leading binary- and source-level coverage tracers QEMU, Dyninst, RetroWrite, and AFL-Clang by 2-24x, finding more bugs in less time.
Comment: CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
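The two finer-grain metrics the abstract names, edge coverage and hit counts, can be illustrated with a small sketch in the style popularized by AFL: count hits per control-flow edge, then collapse raw counts into coarse power-of-two buckets so small fluctuations do not register as new coverage. This is an illustrative reimplementation, not the paper's CGT machinery; the block trace below is invented.

```python
# Illustrative sketch of edge coverage with AFL-style hit-count bucketing.
# Not the paper's implementation; the trace is made-up example data.

def bucket(count):
    """Collapse a raw hit count into a coarse power-of-two-style bucket."""
    for b in (1, 2, 3, 4, 8, 16, 32, 128):
        if count <= b:
            return b
    return 255

def edge_coverage(block_trace):
    """Count hits per (prev_block, cur_block) edge, then bucket them."""
    hits = {}
    for prev, cur in zip(block_trace, block_trace[1:]):
        hits[(prev, cur)] = hits.get((prev, cur), 0) + 1
    return {edge: bucket(n) for edge, n in hits.items()}

trace = ["A", "B", "C", "B", "C", "D"]
print(edge_coverage(trace))
# -> {('A', 'B'): 1, ('B', 'C'): 2, ('C', 'B'): 1, ('C', 'D'): 1}
```

The design point the bucketing captures is that a loop running 5 vs. 6 times is usually uninteresting, while 1 vs. 2 vs. many times often signals a genuinely different program behavior.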
A Comprehensive Framework for Testing Database-Centric Software Applications
The database is a critical component of many modern software applications. Recent reports indicate that the vast majority of database use occurs from within an application program. Indeed, database-centric applications have been implemented to create digital libraries, scientific data repositories, and electronic commerce applications. However, a database-centric application is very different from a traditional software system because it interacts with a database that has a complex state and structure. This dissertation formulates a comprehensive framework to address the challenges that are associated with the efficient and effective testing of database-centric applications. The database-aware approach to testing includes: (i) a fault model, (ii) several unified representations of a program's database interactions, (iii) a family of test adequacy criteria, (iv) a test coverage monitoring component, and (v) tools for reducing and re-ordering a test suite during regression testing. This dissertation analyzes the worst-case time complexity of every important testing algorithm. This analysis is complemented by experiments that measure the efficiency and effectiveness of the database-aware testing techniques. Each tool is evaluated by using it to test six database-centric applications. The experiments show that the database-aware representations can be constructed with moderate time and space overhead. The adequacy criteria call for test suites to cover 20% more requirements than traditional criteria and this ensures the accurate assessment of test suite quality. It is possible to enumerate data flow-based test requirements in less than one minute, and coverage tree path requirements are normally identified in no more than ten seconds. The experimental results also indicate that the coverage monitor can insert instrumentation probes into all six of the applications in fewer than ten seconds.
Although instrumentation may moderately increase the static space overhead of an application, the coverage monitoring techniques only increase testing time by 55% on average. A coverage tree often can be stored in less than five seconds even though the coverage report may consume up to twenty-five megabytes of storage. The regression tester usually reduces or prioritizes a test suite in under five seconds. The experiments also demonstrate that the modified test suite is frequently more streamlined than the initial tests.
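The test suite reduction the dissertation's regression-testing tools perform can be sketched with the classic greedy set-cover heuristic: repeatedly keep the test that covers the most still-uncovered requirements, until all requirements covered by the original suite are covered. This is a generic sketch of that standard technique, not the dissertation's database-aware implementation; the suite below is invented.

```python
# Minimal greedy coverage-preserving test suite reduction (generic sketch,
# not the dissertation's tool). suite maps test name -> covered requirements.

def greedy_reduce(suite):
    """Return a reduced list of test names with the same total coverage."""
    remaining = set().union(*suite.values())
    kept = []
    while remaining:
        # pick the test covering the most still-uncovered requirements
        best = max(suite, key=lambda t: len(suite[t] & remaining))
        if not suite[best] & remaining:
            break  # nothing left that any test can cover
        kept.append(best)
        remaining -= suite[best]
    return kept

suite = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3"},
    "t3": {"r1", "r2", "r3"},
    "t4": {"r4"},
}
print(greedy_reduce(suite))  # -> ['t3', 't4'] covers all four requirements
```

Greedy reduction is not guaranteed to be minimal (minimal set cover is NP-hard), but it is fast and typically yields suites close to the optimum, which matches the abstract's "under five seconds" regression-testing goal.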
Automated Test Generation Based on an Applicational Model
Context: As testing is an extremely costly and time-consuming process, tools to automatically generate test cases have been proposed throughout the literature. OutSystems provides a software development environment where, with the aid of the visual OutSystems language, developers can create their applications in an agile form, thus improving their productivity.
Problem: As OutSystems aims at accelerating software development, automating the test case generation activity would bring great value to its clients.
Objectives: The main objectives of this work are to develop an algorithm that automatically generates test cases for OutSystems applications and to evaluate the coverage they provide to the code, according to a set of criteria.
Methods: The OutSystems language is represented as a graph to which developers can add pieces of code by dragging nodes onto the screen and connecting them to the graph. The methodology applied in this work consists of traversing these graphs with depth-first and breadth-first search algorithms, employing boundary-value analysis to identify the test inputs and cause-effect graphing to reduce the number of redundant inputs generated. To evaluate these test inputs, coverage criteria regarding the control flow of data are analysed according to node, branch, condition, modified condition-decision, and multiple condition coverage.
Results: The tool is able to generate test inputs that cover 100% of reachable code, and the methodologies employed help greatly in reducing the number of inputs generated, as well as displaying a minimum set of test inputs with which the developer is already able to cover all traversable code. Usability tests also yielded very positive feedback from users.
Conclusions: This work's objectives were fully met, as we now have a running tool able to act upon a subset of the OutSystems applicational model. This work provides crucial information for assessing the quality of OutSystems applications, with value for OutSystems developers in the form of efficiency and visibility.
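The boundary-value analysis step named in the Methods section can be sketched generically: for an integer input constrained to a range, generate inputs at and just around each boundary (the out-of-range values probe error handling). This is a textbook illustration, not the thesis's OutSystems-specific generator; the function name and values are invented.

```python
# Generic boundary-value analysis sketch (not the thesis's implementation):
# for an input constrained to [lo, hi], test at, just inside, and just
# outside each boundary, plus a nominal mid-range value.

def boundary_values(lo, hi):
    """Classic boundary points for an integer range [lo, hi]."""
    candidates = [lo - 1, lo, lo + 1, (lo + hi) // 2, hi - 1, hi, hi + 1]
    # preserve order, drop duplicates that arise for narrow ranges
    seen, out = set(), []
    for v in candidates:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

print(boundary_values(1, 10))  # -> [0, 1, 2, 5, 9, 10, 11]
```

Cause-effect graphing, the second technique the abstract mentions, would then prune combinations of such inputs whose cause conditions are logically equivalent, which is how the thesis reports reducing redundant generated inputs.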
Preemptive regression testing of workflow-based web services