36 research outputs found
Automated Game Design Learning
While general game playing is an active field of research, the learning of
game design has tended to be either a secondary goal of such research or it has
been solely the domain of humans. We propose a field of research, Automated
Game Design Learning (AGDL), with the direct purpose of learning game designs
directly through interaction with games in the mode that most people experience
games: via play. We detail existing work that touches the edges of this field,
describe current successful projects in AGDL and the theoretical foundations
that enable them, point to promising applications enabled by AGDL, and discuss
next steps for this exciting area of study. The key moves of AGDL are to use
game programs as the ultimate source of truth about their own design, and to
make these design properties available to other systems and avenues of inquiry.Comment: 8 pages, 2 figures. Accepted for CIG 201
Stateful Testing: Finding More Errors in Code and Contracts
Automated random testing has shown to be an effective approach to finding
faults but still faces a major unsolved issue: how to generate test inputs
diverse enough to find many faults and find them quickly. Stateful testing, the
automated testing technique introduced in this article, generates new test
cases that improve an existing test suite. The generated test cases are
designed to violate the dynamically inferred contracts (invariants)
characterizing the existing test suite. As a consequence, they are in a good
position to detect new errors, and also to improve the accuracy of the inferred
contracts by discovering those that are unsound. Experiments on 13 data
structure classes totalling over 28,000 lines of code demonstrate the
effectiveness of stateful testing in improving over the results of long
sessions of random testing: stateful testing found 68.4% new errors and
improved the accuracy of automatically inferred contracts to over 99%, with
just a 7% time overhead.Comment: 11 pages, 3 figure
DSpot: Test Amplification for Automatic Assessment of Computational Diversity
Context: Computational diversity, i.e., the presence of a set of programs
that all perform compatible services but that exhibit behavioral differences
under certain conditions, is essential for fault tolerance and security.
Objective: We aim at proposing an approach for automatically assessing the
presence of computational diversity. In this work, computationally diverse
variants are defined as (i) sharing the same API, (ii) behaving the same
according to an input-output based specification (a test-suite) and (iii)
exhibiting observable differences when they run outside the specified input
space. Method: Our technique relies on test amplification. We propose source
code transformations on test cases to explore the input domain and
systematically sense the observation domain. We quantify computational
diversity as the dissimilarity between observations on inputs that are outside
the specified domain. Results: We run our experiments on 472 variants of 7
classes from open-source, large and thoroughly tested Java classes. Our test
amplification multiplies by ten the number of input points in the test suite
and is effective at detecting software diversity. Conclusion: The key insights
of this study are: the systematic exploration of the observable output space of
a class provides new insights about its degree of encapsulation; the behavioral
diversity that we observe originates from areas of the code that are
characterized by their flexibility (caching, checking, formatting, etc.).Comment: 12 page
Fairness Testing: Testing Software for Discrimination
This paper defines software fairness and discrimination and develops a
testing-based method for measuring if and how much software discriminates,
focusing on causality in discriminatory behavior. Evidence of software
discrimination has been found in modern software systems that recommend
criminal sentences, grant access to financial products, and determine who is
allowed to participate in promotions. Our approach, Themis, generates efficient
test suites to measure discrimination. Given a schema describing valid system
inputs, Themis generates discrimination tests automatically and does not
require an oracle. We evaluate Themis on 20 software systems, 12 of which come
from prior work with explicit focus on avoiding discrimination. We find that
(1) Themis is effective at discovering software discrimination, (2)
state-of-the-art techniques for removing discrimination from algorithms fail in
many situations, at times discriminating against as much as 98% of an input
subdomain, (3) Themis optimizations are effective at producing efficient test
suites for measuring discrimination, and (4) Themis is more efficient on
systems that exhibit more discrimination. We thus demonstrate that fairness
testing is a critical aspect of the software development cycle in domains with
possible discrimination and provide initial tools for measuring software
discrimination.Comment: Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness
Testing: Testing Software for Discrimination. In Proceedings of 2017 11th
Joint Meeting of the European Software Engineering Conference and the ACM
SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE),
Paderborn, Germany, September 4-8, 2017 (ESEC/FSE'17).
https://doi.org/10.1145/3106237.3106277, ESEC/FSE, 201
A Mapping Study of scientific merit of papers, which subject are web applications test techniques, considering their validity threats
Progress in software engineering requires (1) more empirical studies of quality, (2) increased focus on synthesizing evidence, (3) more theories to be built and tested, and (4) the validity of the experiment is directly related with the level of confidence in the process of experimental investigation. This paper presents the results of a qualitative and quantitative classification of the threats to the validity of software engineering experiments comprising a total of 92 articles published in the period 2001-2015, dealing with software testing of Web applications. Our results show that 29.4% of the analyzed articles do not mention any threats to validity, 44.2% do it briefly, and 14% do it judiciously; that leaves a question: these studies have scientific value
Statistical log differencing
National Research Foundation (NRF) Singapor