Search CORE

2,751 research outputs found

Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers

Author: Kanewala Upulee
Saha Prashanta
Publication venue
Publication date: 15/04/2019
Field of study

In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions using labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the \emph{oracle problem} and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verify the correctness of the test results without the presence of a test oracle. Effectiveness of MT heavily depends on the MRs used for testing. In this paper we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and uses data sets with varying characteristics to test the SUT. Our results reveal that only 14.8\% of these mutants are detected using the MRs and that the fault detection effectiveness of these MRs do not scale with the increased number of mutants when compared to what was reported in previous studies.Comment: 8 pages, AITesting 201

arXiv.org e-Print Archive

Crossref

The Oracle Problem in Software Testing: A Survey

Author: Barr ET
Harman M
McMinn P
Shahbaz M
Yoo S
Publication venue
Publication date: 20/11/2014
Field of study

Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behavior is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice

CiteSeerX

UCL Discovery

The Earth is Flat? Unveiling Factual Errors in Large Language Models

Author: Huang Jen-tse
Jiao Wenxiang
Lyu Michael R.
Shi Juluan
Tu Zhaopeng
Wang Wenxuan
Yuan Youliang
Publication venue
Publication date: 01/01/2024
Field of study

Large Language Models (LLMs) like ChatGPT are foundational in various applications due to their extensive knowledge from pre-training and fine-tuning. Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education to mislead users. Current methods for evaluating LLMs' veracity are limited by test data leakage or the need for extensive human labor, hindering efficient and accurate error detection. To tackle this problem, we introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs. This framework involves three main steps: First, it constructs a factual knowledge graph by retrieving fact triplets from a large-scale knowledge database. Then, leveraging the knowledge graph, FactChecker employs a rule-based approach to generates three types of questions (Yes-No, Multiple-Choice, and WH questions) that involve single-hop and multi-hop relations, along with correct answers. Lastly, it assesses the LLMs' responses for accuracy using tailored matching strategies for each question type. Our extensive tests on six prominent LLMs, including text-davinci-002, text-davinci-003, ChatGPT~(gpt-3.5-turbo, gpt-4), Vicuna, and LLaMA-2, reveal that FactChecker can trigger factual errors in up to 45\% of questions in these models. Moreover, we demonstrate that FactChecker's test cases can improve LLMs' factual accuracy through in-context learning and fine-tuning (e.g., llama-2-13b-chat's accuracy increase from 35.3\% to 68.5\%). We are making all code, data, and results available for future research endeavors

arXiv.org e-Print Archive

Perfect is the enemy of test oracle

Author: Ibrahimzada Ali Reza
Jabbarvand Reyhaneh
Tekinoglu Dilara
Varli Yigit
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2023
Field of study

Automation of test oracles is one of the most challenging facets of software testing, but remains comparatively less addressed compared to automated test input generation. Test oracles rely on a ground-truth that can distinguish between the correct and buggy behavior to determine whether a test fails (detects a bug) or passes. What makes the oracle problem challenging and undecidable is the assumption that the ground-truth should know the exact expected, correct, or buggy behavior. However, we argue that one can still build an accurate oracle without knowing the exact correct or buggy behavior, but how these two might differ. This paper presents SEER, a learning-based approach that in the absence of test assertions or other types of oracle, can determine whether a unit test passes or fails on a given method under test (MUT). To build the ground-truth, SEER jointly embeds unit tests and the implementation of MUTs into a unified vector space, in such a way that the neural representation of tests are similar to that of MUTs they pass on them, but dissimilar to MUTs they fail on them. The classifier built on top of this vector representation serves as the oracle to generate "fail" labels, when test inputs detect a bug in MUT or "pass" labels, otherwise. Our extensive experiments on applying SEER to more than 5K unit tests from a diverse set of open-source Java projects show that the produced oracle is (1) effective in predicting the fail or pass labels, achieving an overall accuracy, precision, recall, and F1 measure of 93%, 86%, 94%, and 90%, (2) generalizable, predicting the labels for the unit test of projects that were not in training or validation set with negligible performance drop, and (3) efficient, detecting the existence of bugs in only 6.5 milliseconds on average.Comment: Published in ESEC/FSE 202

arXiv.org e-Print Archive

Automated metamorphic testing of variability analysis tools

Author: Anand
Argelich
Benavides
Benavides
Biere
Brummayer
Chan
Chen
Cohen
Cook
Dubois
Eén
Gebser
Gebser
Jang
Kuo
Le Berre
Le Berre
Liu
Müller
Nie
Pipatsrisawat
Pohl
Segura
Segura
Segura
Segura
Stoiber
Svahnberg
Utting
Van Gelder
Wang
Weyuker
Wohlin
Xie
Publication venue: 'Wiley'
Publication date: 01/01/2014
Field of study

Variability determines the capability of software applications to be configured and customized. A common need during the development of variability–intensive systems is the automated analysis of their underlying variability models, e.g. detecting contradictory configuration options. The analysis operations that are performed on variability models are often very complex, which hinders the testing of the corresponding analysis tools and makes difficult, often infeasible, to determine the correctness of their outputs, i.e. the well–known oracle problem in software testing. In this article, we present a generic approach for the automated detection of faults in variability analysis tools overcoming the oracle problem. Our work enables the generation of random variability models together with the exact set of valid configurations represented by these models. These test data are generated from scratch using step–wise transformations and assuring that certain constraints (a.k.a. metamorphic relations) hold at each step. To show the feasibility and generalizability of our approach, it has been used to automatically test several analysis tools in three variability domains: feature models, CUDF documents and Boolean formulas. Among other results, we detected 19 real bugs in 7 out of the 15 tools under test.CICYT TIN2012-32273CICYT IPT-2012- 0890-390000Junta de Andalucía TIC-5906Junta de Andalucía P12-TIC- 186

idUS. Depósito de Investigación Universidad de Sevilla

Automating test oracles generation

Author: Goffi Alberto
Pezzè Mauro
Publication venue
Publication date: 16/05/2018
Field of study

Software systems play a more and more important role in our everyday life. Many relevant human activities nowadays involve the execution of a piece of software. Software has to be reliable to deliver the expected behavior, and assessing the quality of software is of primary importance to reduce the risk of runtime errors. Software testing is the most common quality assessing technique for software. Testing consists in running the system under test on a finite set of inputs, and checking the correctness of the results. Thoroughly testing a software system is expensive and requires a lot of manual work to define test inputs (stimuli used to trigger different software behaviors) and test oracles (the decision procedures checking the correctness of the results). Researchers have addressed the cost of testing by proposing techniques to automatically generate test inputs. While the generation of test inputs is well supported, there is no way to generate cost-effective test oracles: Existing techniques to produce test oracles are either too expensive to be applied in practice, or produce oracles with limited effectiveness that can only identify blatant failures like system crashes. Our intuition is that cost-effective test oracles can be generated using information produced as a byproduct of the normal development activities. The goal of this thesis is to create test oracles that can detect faults leading to semantic and non-trivial errors, and that are characterized by a reasonable generation cost. We propose two ways to generate test oracles, one derives oracles from the software redundancy and the other from the natural language comments that document the source code of software systems. We present a technique that exploits redundant sequences of method calls encoding the software redundancy to automatically generate test oracles named CCOracles. We describe how CCOracles are automatically generated, deployed, and executed. We prove the effectiveness of CCOracles by measuring their fault-finding effectiveness when combined with both automatically generated and hand-written test inputs. We also present Toradocu, a technique that derives executable specifications from Javadoc comments of Java constructors and methods. From such specifications, Toradocu generates test oracles that are then deployed into existing test suites to assess the outputs of given test inputs. We empirically evaluate Toradocu, showing that Toradocu accurately translates Javadoc comments into procedure specifications. We also show that Toradocu oracles effectively identify semantic faults in the SUT. CCOracles and Toradocu oracles stem from independent information sources and are complementary in the sense that they check different aspects of the system undertest

RERO DOC Digital Library

A Survey on Metamorphic Testing

Author: Ana B. Sanchez
Antonio Ruiz-Cortes
Gordon Fraser
Sergio Segura
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Metamorphic Testing for Software Libraries and Graphics Compilers

Author: Lascu Andrei
Publication venue: Computing, Imperial College London
Publication date: 01/04/2022
Field of study

Metamorphic Testing is a testing technique which mutates existing test cases in semantically equivalent forms, by making use of metamorphic relations, while avoiding the oracle problem. However, these required relations are not readily available for a given system under test. Defining effective metamorphic relations is difficult, and arguably the main obstacle towards adoption of metamorphic testing in production-level software development. One example application is testing graphics compilers, where the approximate and under-specified nature of the domain makes it hard to apply more traditional techniques. We propose an approach with a lower barrier of entry to applying metamorphic testing for a software library. The user must still identify relations that hold over their particular library, but can do so within a development-like environment. We apply methods from the domains of metamorphic testing and fuzzing to produce complex test cases. We consider the user interaction a bonus, as they can control what parts of the target codebase is tested, potentially focusing on less-tested or critical sections of the codebase. We implement our proposed approach in a tool, MF++, which synthesises C++ test cases for a C++ library, defined by user-provided ingredients. We applied MF++ to 7 libraries in the domains of satisfiability modulo theories and Presburger arithmetic,. Our evaluation of MF++ was able to identify 21 bugs in these tools. We additionally provide an automatic reducer for tests generated by MF++, named MF++R. In addition to minimising tests exposing issues, MF++R can also be used to identify incorrect user-provided relations. Additionally, we investigate the combined use of MF++ and MF++R in order to augment code coverage of library test suites. We assess the utility of this application by contributing 21 tests aimed at improving coverage across 3 libraries.Open Acces

Spiral - Imperial College Digital Repository

Automated inference of likely metamorphic relations for model transformations

Author: Ruiz Cortés Antonio
Segura Rueda Sergio
Troya Castilla Javier
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

Model transformations play a cornerstone role in Model-Driven Engineering (MDE) as they provide the essential mechanisms for manipulating and transforming models. Checking whether the output of a model transformation is correct is a manual and errorprone task, referred to as the oracle problem. Metamorphic testing alleviates the oracle problem by exploiting the relations among di erent inputs and outputs of the program under test, so-called metamorphic relations (MRs). One of the main challenges in metamorphic testing is the automated inference of likely MRs. This paper proposes an approach to automatically infer likely MRs for ATL model transformations, where the tester does not need to have any knowledge of the transformation. The inferred MRs aim at detecting faults in model transformations in three application scenarios, namely regression testing, incremental transformations and migrations among transformation languages. In the experiments performed, the inferred likely MRs have proved to be quite accurate, with a precision of 96.4% from a total of 4101 true positives out of 4254 MRs inferred. Furthermore, they have been useful for identifying mutants in regression testing scenarios, with a mutation score of 93.3%. Finally, our approach can be used in conjunction with current approaches for the automatic generation of test cases.Comisión Interministerial de Ciencia y Tecnología TIN2015-70560-RJunta de Andalucía P12-TIC-186

idUS. Depósito de Investigación Universidad de Sevilla