Search CORE

100 research outputs found

Fairness Testing: Testing Software for Discrimination

Author: Angwin Julia
Brun Yuriy
Ferral Katelyn
Gonzalez Jesus A.
Guglielmo Luigi Di
Ingold David
Letzter Rafi
Mattioli Dana
Meliou Alexandra
Meliou Alexandra
Nadella Satya
Olson Parmy
Shahani Aarti
Soper Spencer
Soper Spencer
von Rhein Alexander
Zafar Muhammad Bilal
Zemel Richard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/09/2017
Field of study

This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.Comment: Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness Testing: Testing Software for Discrimination. In Proceedings of 2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Paderborn, Germany, September 4-8, 2017 (ESEC/FSE'17). https://doi.org/10.1145/3106237.3106277, ESEC/FSE, 201

arXiv.org e-Print Archive

Crossref

Threats to the validity of mutation-based test assessment

Author: Harman M
Henard C
Jia Y
Papadakis M
Traon YL
Publication venue: The 25th International Symposium on Software Testing and Analysis
Publication date: 01/01/2016
Field of study

Much research on software testing and test techniques relies on experimental studies based on mutation testing. In this paper we reveal that such studies are vulnerable to a potential threat to validity, leading to possible Type I errors; incorrectly rejecting the Null Hypothesis. Our findings indicate that Type I errors occur, for arbitrary experiments that fail to take countermeasures, approximately 62% of the time. Clearly, a Type I error would potentially compromise any scientific conclusion. We show that the problem derives from such studies’ combined use of both subsuming and subsumed mutants. We collected articles published in the last two years at three leading software engineering conferences. Of those that use mutation-based test assessment, we found that 68% are vulnerable to this threat to validity

UCL Discovery

Open Repository and Bibliography - Luxembourg

MuDelta: Delta-Oriented Mutation Testing at Commit Time

Author: Chekam TT
Harman M
Ma W
Papadakis M
Publication venue: 43rd IEEE/ACM International Conference on Software Engineering - Software Engineering in Practice (ICSE-SEIP) / 43rd ACM/IEEE International Conference on Software Engineering - New Ideas and Emerging Results (ICSE-NIER)
Publication date: 07/05/2021
Field of study

To effectively test program changes using mutation testing, one needs to use mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants; mutants that affect and are affected by the changed program behaviours. Our approach uses machine learning applied on a combined scheme of graph and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate a strong prediction ability of our approach; yielding 0.80 (ROC) and 0.50 (PR Curve) AUC values with 0.63 and 0.32 precision and recall values. These predictions are significantly higher than random guesses, 0.20 (PR-Curve) AUC, 0.21 and 0.21 precision and recall, and subsequently lead to strong relevant tests that kill 45%more relevant mutants than randomly sampled mutants (either sampled from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault revealing ability in fault introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software

UCL Discovery

CARISMA: a context-sensitive approach to race-condition sample-instance selection for multithreaded applications

Author: Chan WK
Tse TH
Xu B
Zhai K
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Dynamic race detectors can explore multiple thread schedules of a multithreaded program over the same input to detect data races. Although existing sampling-based precise race detectors reduce overheads effectively so that lightweight precise race detection can be performed in testing or post-deployment environments, they are ineffective in detecting races if the sampling rates are low. This paper presents CARISMA to address this problem. CARISMA exploits the insight that along an execution trace, a program may potentially handle many accesses to the memory locations created at the same site for similar purposes. Iterating over multiple execution trials of the same input, CARISMA estimates and distributes the sampling budgets among such location creation sites, and probabilistically collects a fraction of all accesses to the memory locations associated with such sites for subsequent race detection. Our experiment shows that, compared with PACER on the same platform and at the same sampling rate (such as 1%), CARISMA is significantly more effective. © 2012 ACM.postprin

CiteSeerX

HKU Scholars Hub

A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence

Author: Aho A. V.
Budd T. A.
Harman M.
International Standards
Just R.
Offutt A. J.
Schuler D.
Voas J.
Publication venue: Association for Computer Machinery (ACM)
Publication date: 01/01/2014
Field of study

Though mutation testing has been widely studied for more than thirty years, the prevalence and properties of equivalent mutants remain largely unknown. We report on the causes and prevalence of equivalent mutants and their relationship to stubborn mutants (those that remain undetected by a high quality test suite, yet are non-equivalent). Our results, based on manual analysis of 1,230 mutants from 18 programs, reveal a highly uneven distribution of equivalence and stubbornness. For example, the ABS class and half UOI class generate many equivalent and almost no stubborn mutants, while the LCR class generates many stubborn and few equivalent mutants. We conclude that previous test effectiveness studies based on fault seeding could be skewed, while developers of mutation testing tools should prioritise those operators that we found generate disproportionately many stubborn (and few equivalent) mutants

CiteSeerX

Crossref

UCL Discovery

An empirical study on mutation testing of WS-BPEL programs

Author: Liu H
Pan L
Sun C
Wang Q
Zhang X
Publication venue: Oxford University Press (United Kingdom)
Publication date: 01/01/2016
Field of study

Nowadays, applications are increasingly deployed as Web services in the globally distributed cloud computing environment. Multiple services are normally composed to fulfill complex functionalities. Business Process Execution Language for Web Services (WS-BPEL) is an XML-based service composition language that is used to define a complex business process by orchestrating multiple services. Compared with traditional applications, WS-BPEL programs pose many new challenges to the quality assurance, especially testing, of service compositions. A number of techniques have been proposed for testing WS-BPEL programs, but only a few studies have been conducted to systematically evaluate the effectiveness of these techniques. Mutation testing has been widely acknowledged as not only a testing method in its own right but also a popular technique for measuring the fault-detection effectiveness of other testing methods. Several previous studies have proposed a family of mutation operators for generating mutants by seeding various faults into WS-BPEL programs. In this study, we conduct a series of empirical studies to evaluate the applicability and effectiveness of various mutation operators for WS-BPEL programs. The experimental results provide insightful and comprehensive guidance for mutation testing of WS-BPEL programs in practice. In particular, our work is the systematic study in the selection of effective mutation operators specifically for WS-BPEL programs

RMIT Research Repository

Victoria University Eprints Repository

An Empirical Study on Mutation, Statement and Branch Coverage Fault Revelation that Avoids the Unreliable Clean Program Assumption

Author: Harman Mark
Le Traon Yves
Papadakis Mike
Titcheu Chekam Thierry
Publication venue
Publication date: 01/01/2017
Field of study

Many studies suggest using coverage concepts, such as branch coverage, as the starting point of testing, while others as the most prominent test quality indicator. Yet the relationship between coverage and fault-revelation remains unknown, yielding uncertainty and controversy. Most previous studies rely on the Clean Program Assumption, that a test suite will obtain similar coverage for both faulty and fixed (‘clean’) program versions. This assumption may appear intuitive, especially for bugs that denote small semantic deviations. However, we present evidence that the Clean Program Assumption does not always hold, thereby raising a critical threat to the validity of previous results. We then conducted a study using a robust experimental methodology that avoids this threat to validity, from which our primary finding is that strong mutation testing has the highest fault revelation of four widely-used criteria. Our findings also revealed that fault revelation starts to increase significantly only once relatively high levels of coverage are attained

Crossref

UCL Discovery

Open Repository and Bibliography - Luxembourg

Commit-Aware Mutation Testing

Author: Laurent Thomas
Ma Wei
Ojdanic Milos
Papadakis Mike
Titcheu Chekam Thierry
Ventresque Anthony
Publication venue
Publication date: 01/01/2020
Field of study

Crossref

Open Repository and Bibliography - Luxembourg