268 research outputs found

    Fairness Testing: Testing Software for Discrimination

    Full text link
    This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.Comment: Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness Testing: Testing Software for Discrimination. In Proceedings of 2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Paderborn, Germany, September 4-8, 2017 (ESEC/FSE'17). https://doi.org/10.1145/3106237.3106277, ESEC/FSE, 201

    Angels and monsters: An empirical investigation of potential test effectiveness and efficiency improvement from strongly subsuming higher order mutation

    Get PDF
    We study the simultaneous test effectiveness and efficiency improvement achievable by Strongly Subsuming Higher Order Mutants (SSHOMs), constructed from 15,792 first order mutants in four Java programs. Using SSHOMs in place of the first order mutants they subsume yielded a 35%-45% reduction in the number of mutants required, while simultaneously improving test efficiency by 15% and effectiveness by between 5.6% and 12%. Trivial first order faults often combine to form exceptionally non-trivial higher order faults; apparently innocuous angels can combine to breed monsters. Nevertheless, these same monsters can be recruited to improve automated test effectiveness and efficiency

    Threats to the validity of mutation-based test assessment

    Get PDF
    Much research on software testing and test techniques relies on experimental studies based on mutation testing. In this paper we reveal that such studies are vulnerable to a potential threat to validity, leading to possible Type I errors; incorrectly rejecting the Null Hypothesis. Our findings indicate that Type I errors occur, for arbitrary experiments that fail to take countermeasures, approximately 62% of the time. Clearly, a Type I error would potentially compromise any scientific conclusion. We show that the problem derives from such studies’ combined use of both subsuming and subsumed mutants. We collected articles published in the last two years at three leading software engineering conferences. Of those that use mutation-based test assessment, we found that 68% are vulnerable to this threat to validity

    Automated Repair of Feature Interaction Failures in Automated Driving Systems

    Get PDF
    In the past years, several automated repair strategies have been proposed to fix bugs in individual software programs without any human intervention. There has been, however, little work on how automated repair techniques can resolve failures that arise at the system-level and are caused by undesired interactions among different system components or functions. Feature interaction failures are common in complex systems such as autonomous cars that are typically built as a composition of independent features (i.e., units of functionality). In this paper, we propose a repair technique to automatically resolve undesired feature interaction failures in automated driving systems (ADS) that lead to the violation of system safety requirements. Our repair strategy achieves its goal by (1) localizing faults spanning several lines of code, (2) simultaneously resolving multiple interaction failures caused by independent faults, (3) scaling repair strategies from the unit-level to the system-level, and (4) resolving failures based on their order of severity. We have evaluated our approach using two industrial ADS containing four features. Our results show that our repair strategy resolves the undesired interaction failures in these two systems in less than 16h and outperforms existing automated repair techniques

    Mutation Testing Advances: An Analysis and Survey

    Get PDF

    Combining Model Checking and Testing

    Get PDF
    Abstract Model checking and testing have a lot in common. Over the last two decades, significant progress has been made on how to broaden the scope of model checking from finite-state abstractions to actual software implementations. One way to do this consists of adapting model checking into a form of systematic testing that is applicable to industrial-size software. This chapter presents an overview of this strand of software model checking
    • …
    corecore