DSpot: Test Amplification for Automatic Assessment of Computational Diversity
Context: Computational diversity, i.e., the presence of a set of programs
that all perform compatible services but that exhibit behavioral differences
under certain conditions, is essential for fault tolerance and security.
Objective: We aim to propose an approach for automatically assessing the
presence of computational diversity. In this work, computationally diverse
variants are defined as (i) sharing the same API, (ii) behaving the same
according to an input-output based specification (a test-suite) and (iii)
exhibiting observable differences when they run outside the specified input
space. Method: Our technique relies on test amplification. We propose source
code transformations on test cases to explore the input domain and
systematically sense the observation domain. We quantify computational
diversity as the dissimilarity between observations on inputs that are outside
the specified domain. Results: We run our experiments on 472 variants of 7
classes taken from large, thoroughly tested, open-source Java projects. Our test
amplification increases the number of input points in the test suite tenfold
and is effective at detecting software diversity. Conclusion: The key insights
of this study are: the systematic exploration of the observable output space of
a class provides new insights about its degree of encapsulation; the behavioral
diversity that we observe originates from areas of the code that are
characterized by their flexibility (caching, checking, formatting, etc.). Comment: 12 pages
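The idea of amplifying a test suite's inputs and then measuring behavioural dissimilarity between variants can be illustrated with a minimal Python sketch. This is not the DSpot implementation; the two variants, the input mutation strategy, and the dissimilarity measure are all hypothetical stand-ins for the paper's Java tooling.

```python
import random

# Two hypothetical "computationally diverse" variants: same API, same
# behaviour on the specified domain (non-negative inputs), but different
# observable behaviour outside it (negative inputs).
def variant_a(x):
    return abs(x) * 2          # handles negatives by taking the absolute value

def variant_b(x):
    return max(x, 0) * 2       # handles negatives by clamping to zero

# Input points exercised by the original test suite (the specified domain).
SPECIFIED_INPUTS = [0, 1, 5, 10]

def amplify_inputs(inputs, n_new=40, seed=0):
    """Input amplification: mutate each original input point to explore
    the input domain, including points outside the specified domain."""
    rng = random.Random(seed)
    amplified = []
    for x in inputs:
        for _ in range(n_new // len(inputs)):
            amplified.append(x + rng.randint(-20, 20))  # may go negative
    return amplified

def dissimilarity(inputs):
    """Fraction of inputs on which the two variants' observations differ."""
    diffs = sum(1 for x in inputs if variant_a(x) != variant_b(x))
    return diffs / len(inputs)

# Both variants agree on the specified domain ...
assert dissimilarity(SPECIFIED_INPUTS) == 0.0
# ... but the amplified inputs reveal behavioural diversity outside it.
print(dissimilarity(amplify_inputs(SPECIFIED_INPUTS)))
```

The key point the sketch captures is that the dissimilarity is zero on the specified inputs (the variants satisfy the same test suite) and becomes positive only once amplification pushes inputs outside that domain.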
Fault Revealing Test Oracles, Are We There Yet? Evaluating The Effectiveness Of Automatically Generated Test Oracles On Manually-Written And Automatically Generated Unit Tests
Master's thesis, Informatics Engineering, 2022, Universidade de Lisboa, Faculdade de Ciências.
Automated test suite generation tools have been used in real development scenarios
and proven to be able to detect real faults. These tools, however, do not know the expected
behavior of the system and generate tests that execute the faulty behavior, but fail to identify the fault due to poor test oracles. To solve this problem, researchers have developed
several approaches to automatically generate test oracles that resemble manually-written
ones. However, there remain some questions regarding the use of these tools in real development scenarios. In particular, how effective are automatically generated test oracles
at revealing real faults? How long do these tools require to generate an oracle?
To answer these questions, we applied a recent and promising test oracle generation
approach (T5) to all fault-revealing test cases in the DEFECTS4J collection and investigated how effective the generated test oracles are at detecting real faults, as well as the
time required by the tool to generate them.
Our results show that: (1) out-of-the-box, oracles generated by T5 do not compile; (2)
after a simple procedure, out of the 1696 test oracles, only 466 compile and 58 of them
manage to correctly identify the fault; (3) when considering the 835 bugs in DEFECTS4J,
T5 was able to detect 27, i.e., 3.23% of the bugs. Moreover, T5 required, on average,
401.3 seconds to generate a test oracle.
The approaches and datasets presented in this thesis bring automated test oracle generation one step closer to being used in real software: they provide insight into the current
shortcomings of several tools, and they introduce a way to evaluate automated test oracle generation tools under development in terms of their effectiveness at detecting real software
faults.
Towards Automatic Generation of Amplified Regression Test Oracles
Regression testing is crucial in ensuring that pure code refactoring does not
adversely affect existing software functionality, but it can be expensive,
accounting for half the cost of software maintenance. Automated test case
generation reduces effort but may generate weak test suites. Test amplification
is a promising solution that strengthens test suites by generating additional tests or
improving existing ones, increasing test coverage, but it faces the test oracle
problem. To address this, we propose a test oracle derivation approach that
uses object state data produced during System Under Test (SUT) test execution
to amplify regression test oracles. The approach monitors the object state
during test execution and compares it to the previous version to detect any
changes in relation to the SUT's intended behaviour. Our preliminary evaluation
shows that the proposed approach can enhance the detection of behaviour changes
substantially, providing initial evidence of its effectiveness.Comment: 8 pages, 1 figur
The Oracle Problem in Software Testing: A Survey
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain-specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.
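Among the oracle-automation techniques the survey lists, metamorphic testing is easy to demonstrate concretely: instead of knowing the exact expected output for an input, we check a relation that must hold between outputs of related inputs. A minimal sketch, using the standard identity sin(x) = sin(π − x) as the metamorphic relation (the choice of function and relation is illustrative, not from the survey):

```python
import math
import random

def metamorphic_check(f, x, tol=1e-9):
    """Check the metamorphic relation f(x) == f(pi - x) up to a tolerance;
    no exact expected output for f(x) is needed."""
    return abs(f(x) - f(math.pi - x)) <= tol

# Exercise the relation on many random inputs; for a correct sine
# implementation the relation should never be violated.
rng = random.Random(42)
failures = [x for x in (rng.uniform(-10, 10) for _ in range(1000))
            if not metamorphic_check(math.sin, x)]
print(len(failures))
```

A buggy implementation that breaks the symmetry (say, one that mishandles negative inputs) would produce a non-empty `failures` list, turning the relation itself into an automated oracle.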
Test Advising Framework
Test cases are represented in various formats depending on the process, the technique or the tool used to generate the tests. While different test case representations are necessary, this diversity makes it hard to compare test cases and to leverage the strengths of one kind of test in another; a common test representation would help.
In this thesis, we define a new Test Case Language (TCL) that can be used to represent test cases that vary in structure and are generated by multiple test generation frameworks. We also present a methodology for transforming test cases of varying representations into a common format where they can be matched and analyzed. With the common representation in our test case description language, we define five advice functions that leverage the testing strength of one type of tests to improve the effectiveness of other types. These advice functions analyze the test input values, method call sequences, or test oracles of a source test suite to derive advice, and use that advice to amplify the effectiveness of an original test suite. Our assessment shows that the amplified test suite derived from the advice functions improves on the original test suite in terms of code coverage and mutant kill score.
Automated Unit Testing of Evolving Software
As software programs evolve, developers need to ensure that new changes do
not affect the originally intended functionality of the program. To increase their
confidence, developers commonly write unit tests along with the program, and
execute them after a change is made. However, manually writing these unit tests
is difficult and time-consuming, and as their number increases, so does the cost
of executing and maintaining them.
Automated test generation techniques have been proposed in the literature
to assist developers in the endeavour of writing these tests. However, it remains
an open question how well these tools can help with fault finding in practice,
and maintaining these automatically generated tests may require extra effort
compared to human written ones.
This thesis evaluates the effectiveness of a number of existing automatic
unit test generation techniques at detecting real faults, and explores how these
techniques can be improved. In particular, we present a novel multi-objective
search-based approach for generating tests that reveal changes across two versions
of a program. We then investigate whether these tests can be used such that no
maintenance effort is necessary.
Our results show that overall, state-of-the-art test generation tools can indeed
be effective at detecting real faults: collectively, the tools revealed more than half
of the bugs we studied. We also show that our proposed alternative technique
that is better suited to the problem of revealing changes, can detect more faults,
and does so more frequently. However, we also find that for a majority of
object-oriented programs, even a random search can achieve good results. Finally, we
show that such change-revealing tests can be generated on demand in practice,
without requiring them to be maintained over time.
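The essence of generating tests that reveal changes across two versions can be sketched simply, even with the random search the abstract mentions. This is an illustrative stand-in, not the thesis's multi-objective search-based approach; the two `price_*` functions and the search budget are hypothetical.

```python
import random

# Two versions of a unit under test; version 2 introduces a behavioural
# change for large quantities (a hypothetical bulk discount).
def price_v1(quantity):
    return quantity * 10

def price_v2(quantity):
    discount = 0.9 if quantity > 50 else 1.0   # new in version 2
    return quantity * 10 * discount

def generate_change_revealing_tests(v_old, v_new, trials=200, seed=1):
    """Random search: keep inputs on which the two versions disagree.
    Each kept input is a change-revealing test; the expected output can
    then be taken from whichever version the developer deems correct."""
    rng = random.Random(seed)
    revealing = []
    for _ in range(trials):
        q = rng.randint(0, 100)
        if v_old(q) != v_new(q):
            revealing.append(q)
    return revealing

tests = generate_change_revealing_tests(price_v1, price_v2)
assert all(q > 50 for q in tests)   # every revealing input hits the changed branch
```

Because the generated inputs are oracles-by-differencing (old version versus new version), such tests can be produced on demand at each change rather than maintained as a fixed suite, which is the maintenance-free usage the abstract describes.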
Large Language Models for Software Engineering: Survey and Open Problems
This paper provides a survey of the emerging area of Large Language Models
(LLMs) for Software Engineering (SE). It also sets out open research challenges
for the application of LLMs to technical problems faced by software engineers.
LLMs' emergent properties bring novelty and creativity with applications right
across the spectrum of Software Engineering activities including coding,
design, requirements, repair, refactoring, performance improvement,
documentation and analytics. However, these very same emergent properties also
pose significant technical challenges; we need techniques that can reliably
weed out incorrect solutions, such as hallucinations. Our survey reveals the
pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in
the development and deployment of reliable, efficient and effective LLM-based
SE.
Automated Software Transplantation
Automated program repair has excited researchers for more than a decade, yet it has yet to see full-scale deployment in industry. We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are collectively used by hundreds of millions of people worldwide. In its first three months of operation, SAPFIX produced 55 repair candidates for 57 crashes reported to it, of which 27 have been deemed correct by developers and 14 have been landed into production automatically by SAPFIX. SAPFIX has thus demonstrated the potential of the search-based repair research agenda by deploying, to hundreds of millions of users worldwide, software systems that have been automatically tested and repaired.
Automated software transplantation (autotransplantation) is a form of automated software engineering in which we use search-based software engineering to automatically move a functionality of interest from a “donor” program that implements it into a “host” program that lacks it. Autotransplantation is a kind of automated program repair in which we repair the “host” program by augmenting it with the missing functionality. Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system, potentially written in a different programming language. Being able to do so might greatly enhance software engineering practice while reducing costs. Automated software transplantation comes in two flavours: monolingual, when the languages of the host and donor programs are the same, and multilingual, when the languages differ.
This thesis introduces a theory of automated software transplantation, and two algorithms implemented in two tools that achieve it: µSCALPEL for monolingual software transplantation and τSCALPEL for multilingual software transplantation. Leveraging lightweight annotation, program analysis identifies an organ (interesting behaviour to transplant); testing validates that the organ exhibits the desired behaviour during its extraction and after its implantation into a host. We report encouraging results: in 14 of 17 monolingual transplantation experiments involving 6 donors and 4 hosts (popular real-world systems), we successfully autotransplanted 6 new functionalities; and in 10 out of 10 multilingual transplantation experiments involving 10 donors and 10 hosts (popular real-world systems written in 4 different programming languages), we successfully autotransplanted 10 new functionalities. That is, we passed all the test suites that validate the new functionalities' behaviour and confirm that the initial program behaviour is preserved. Additionally, we manually checked the behaviour exercised by each organ. Autotransplantation is also very useful: in just 26 hours of computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player, a task that the developers of VLC had been doing manually for 12 years. We autotransplanted call graph generation and indentation for C programs into Kate (a popular KDE-based text editor used as an IDE by many C developers), two features missing from Kate but requested by its users. Autotransplantation is also efficient: the total runtime across 15 monolingual transplants is five and a half hours; the total runtime across 10 multilingual transplants is 33 hours.