57 research outputs found
JUGE: An Infrastructure for Benchmarking Java Unit Test Generators
Researchers and practitioners have designed and implemented many automated
test case generators to support effective software testing. Such generators
exist for various languages (e.g., Java, C#, or Python) and for various
platforms (e.g., desktop, web, or mobile applications). These generators exhibit
varying effectiveness and efficiency, depending on the testing goals they aim
to satisfy (e.g., unit-testing of libraries vs. system-testing of entire
applications) and the underlying techniques they implement. In this context,
practitioners need to be able to compare different generators to identify the
one best suited to their requirements, while researchers seek to identify
future research directions. This can be achieved through the systematic
execution of large-scale evaluations of different generators. However, the
execution of such empirical evaluations is not trivial and requires
substantial effort to collect benchmarks, set up the evaluation infrastructure,
and collect and analyse the results. In this paper, we present our JUnit
Generation benchmarking infrastructure (JUGE), which supports generators (e.g.,
search-based, random-based, or symbolic-execution-based) seeking to automate the
production of unit tests for various purposes (e.g., validation, regression
testing, or fault localization). The primary goal is to reduce the overall
effort, ease the comparison of several generators, and enhance the knowledge
transfer between academia and industry by standardizing the evaluation and
comparison process. Since 2013, eight editions of a unit testing tool
competition, co-located with the Search-Based Software Testing Workshop, have
taken place, each using and updating JUGE. As a result, an increasing number of
tools (over ten) from both academia and industry have been evaluated on JUGE,
matured over the years, and enabled the identification of future research
directions.
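For illustration, a minimal sketch of the kind of tool-adapter contract such a benchmarking infrastructure standardizes follows; the interface and method names are assumptions made for this sketch, not JUGE's actual API.

    // Hypothetical adapter contract a benchmarking infrastructure might impose
    // on each unit-test generator so that all tools can be run and measured
    // uniformly. Names are illustrative, not JUGE's actual API.
    import java.nio.file.Path;
    import java.time.Duration;
    import java.util.List;

    interface TestGeneratorAdapter {
        // Called once with the benchmark's classpath before any generation.
        void initialize(List<Path> classPath);

        // Generate tests for one class under test within a fixed time budget;
        // returns the directory containing the emitted JUnit sources.
        Path generateTests(String classUnderTest, Duration budget);
    }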
Recovering fitness gradients for interprocedural Boolean flags in search-based testing
National Research Foundation (NRF) Singapore under Corp Lab @ University scheme; National Research Foundation (NRF) Singapore under its NSoE Programme.
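This entry carries no abstract; as background, the classic "flag problem" its title refers to can be sketched as follows. A Boolean flag collapses all inputs to true or false and leaves the search without a fitness gradient; a testability transformation that returns a distance instead restores one. The transformation below is a generic illustration from the SBST literature, not necessarily the paper's exact technique.

    // Why Boolean flags flatten the fitness landscape: the branch below gives
    // the search no gradient, because isValid() collapses all inputs to
    // true/false. A distance-returning variant restores a graded signal.
    class FlagExample {
        static boolean isValid(int x) { return x == 42; }    // flag: 0 or 1

        static double isValidDistance(int x) {               // graded: |x - 42|
            return Math.abs(x - 42);                          // 0 means "true"
        }

        static void underTest(int x) {
            if (isValid(x)) {                                 // no gradient here
                System.out.println("target branch");
            }
        }
    }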
Instance Space Analysis of Search-Based Software Testing
Search-based software testing (SBST) is now a mature area, with numerous
techniques developed to tackle the challenging task of software testing. SBST
techniques have shown promising results and have been successfully applied in
the industry to automatically generate test cases for large and complex
software systems. Their effectiveness, however, is problem-dependent. In this
paper, we revisit the problem of objective performance evaluation of SBST
techniques considering recent methodological advances -- in the form of
Instance Space Analysis (ISA) -- enabling the strengths and weaknesses of SBST
techniques to be visualized and assessed across the broadest possible space of
problem instances (software classes) from common benchmark datasets. We
identify features of SBST problems that explain why a particular instance is
hard for an SBST technique, reveal areas of hard and easy problems in the
instance space of existing benchmark datasets, and identify the strengths and
weaknesses of state-of-the-art SBST techniques. In addition, we examine the
diversity and quality of common benchmark datasets used in experimental
evaluations.
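A hedged sketch of the notion of "instance features" underlying ISA: each software class becomes a feature vector that the analysis projects into a two-dimensional instance space. The concrete features and values below are illustrative assumptions, not the paper's feature set.

    // Illustrative per-class "instance features" of the kind ISA projects into
    // an instance space; in a real pipeline these counts would come from
    // static or bytecode analysis.
    import java.util.Map;

    record ClassFeatures(int methods, int branches, int maxNestingDepth) {}

    class FeatureExtractor {
        static Map<String, ClassFeatures> example() {
            return Map.of(
                "org.example.Stack",  new ClassFeatures(8, 12, 2),
                "org.example.Parser", new ClassFeatures(25, 140, 6));
        }
    }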
Private API Access and Functional Mocking in Automated Unit Test Generation
Not all object-oriented code is easily testable: Dependency objects might be difficult or even impossible to instantiate, and object-oriented encapsulation makes testing potentially simple code difficult if it cannot easily be accessed. When this happens, developers can resort to mock objects that simulate the complex dependencies, or circumvent object-oriented encapsulation and access private APIs directly through the use of, for example, Java reflection. Can automated unit test generation benefit from these techniques as well? In this paper we investigate this question by extending the EvoSuite unit test generation tool with the ability to directly access private APIs and to create mock objects using the popular Mockito framework. However, care needs to be taken that this does not impact the usefulness of the generated tests: For example, a test accessing a private field could later fail if that field is renamed, even if that renaming is part of a semantics-preserving refactoring. Such a failure would not reveal a true regression bug, but is a false positive, which wastes the developer's time investigating and fixing the test. Our experiments on the SF110 and Defects4J benchmarks confirm the anticipated improvements in terms of code coverage and bug finding, but also confirm the existence of false positives. However, by ensuring the test generator only uses mocking and reflection if there is no other way to reach some part of the code, their number remains small.
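A hand-written sketch of the two mechanisms the paper adds to generated tests: a Mockito mock standing in for a hard-to-instantiate dependency, and reflection reaching a private field. The PaymentGateway and Checkout classes are hypothetical examples, not from the paper.

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;
    import java.lang.reflect.Field;

    interface PaymentGateway { boolean authorize(int amount); }

    class Checkout {
        private int balance = 0;                  // encapsulated state
        private final PaymentGateway gateway;
        Checkout(PaymentGateway gateway) { this.gateway = gateway; }
        void pay(int amount) { if (gateway.authorize(amount)) balance -= amount; }
    }

    class MockingAndReflectionDemo {
        public static void main(String[] args) throws Exception {
            // Functional mocking: simulate a dependency we cannot easily create.
            PaymentGateway gateway = mock(PaymentGateway.class);
            when(gateway.authorize(100)).thenReturn(true);

            Checkout checkout = new Checkout(gateway);
            checkout.pay(100);

            // Private API access: read encapsulated state via reflection. Note
            // the fragility: renaming "balance" breaks this test even under a
            // semantics-preserving refactoring -- the false-positive risk
            // discussed above.
            Field balance = Checkout.class.getDeclaredField("balance");
            balance.setAccessible(true);
            System.out.println("balance = " + balance.get(checkout)); // -100
        }
    }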
InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment
Automated test case generation has proven useful in reducing the usually
high cost of software testing. However, several studies have also noted the
skepticism of testers regarding the comprehension of generated test suites when
compared to manually designed ones. This fact suggests that involving testers
in the test generation process could be helpful to increase their acceptance of
automatically-produced test suites. In this paper, we propose incorporating
interactive readability assessments made by a tester into EvoSuite, a
widely-known evolutionary test generation tool. Our approach, InterEvo-TR,
interacts with the tester at different moments during the search and shows
different test cases covering the same coverage target for their subjective
evaluation. The design of such an interactive approach involves a schedule of
interaction, a method to diversify the selected targets, a plan to save and
handle the readability values, and some mechanisms to customize the level of
engagement in the revision, among other aspects. To analyze the potential and
practicability of our proposal, we conduct a controlled experiment in which 39
participants, including academics, professional developers, and student
collaborators, interact with InterEvo-TR. Our results show that the strategy to
select and present intermediate results is effective for the purpose of
readability assessment. Furthermore, the participants' actions and responses to
a questionnaire allowed us to analyze the aspects influencing test code
readability and the benefits and limitations of an interactive approach in the
context of test case generation, paving the way for future developments based
on interactivity.
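A minimal sketch of an interaction schedule of the kind described above, where the search pauses at fixed intervals and collects readability scores for alternative tests covering the same target; all names and the scoring scale are illustrative assumptions, not InterEvo-TR's actual API.

    import java.util.List;

    class InteractionSchedule {
        private final int interval;        // generations between interruptions
        InteractionSchedule(int interval) { this.interval = interval; }

        boolean shouldInterrupt(int generation) {
            return generation > 0 && generation % interval == 0;
        }

        // Stored scores can later bias the search toward readable variants.
        double collectScore(List<String> variantsForSameTarget) {
            // In the real tool this is an interactive prompt; here a placeholder.
            return 3.0;  // tester's readability rating on, say, a 1-5 scale
        }
    }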
Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation
Unit testing is critical to the software development process, ensuring the
correctness of basic programming units in a program (e.g., a method).
Search-based software testing (SBST) is an automated approach to generating
test cases. SBST generates test cases with genetic algorithms guided by a
specified coverage criterion (e.g., branch coverage). However, a good test suite must
have different properties, which cannot be captured using an individual
coverage criterion. Therefore, the state-of-the-art approach combines multiple
criteria to generate test cases. Since combining multiple coverage criteria
introduces multiple optimization objectives, it hurts the test suites' coverage
of certain criteria compared with using a single criterion. To cope with
this problem, we propose a novel approach named smart selection. Based
on the coverage correlations among criteria and the subsumption relationships
among coverage goals, smart selection selects a subset of coverage goals to
reduce the number of optimization objectives while avoiding the loss of any
property captured by the full set of criteria. We conduct experiments to evaluate smart selection on
Java classes with three state-of-the-art genetic algorithms under the
-minute budget. On average, smart selection outperforms combining all goals
on of the classes having significant differences between the two
approaches. Secondly, we conduct experiments to verify our assumptions about
coverage criteria relationships. Furthermore, we experiment with different
budgets of , , and minutes, confirming the advantage of smart
selection over combining all goals.
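A hedged sketch of selection by subsumption as described above: any goal guaranteed to be covered whenever another selected goal is covered can be dropped from the optimization objectives without being lost. The goal representation and subsumption map are assumptions for this sketch, not the paper's implementation.

    import java.util.LinkedHashSet;
    import java.util.Map;
    import java.util.Set;

    class SmartSelectionSketch {
        // subsumes.get(g) = goals guaranteed covered whenever g is covered
        static Set<String> select(Set<String> goals,
                                  Map<String, Set<String>> subsumes) {
            Set<String> kept = new LinkedHashSet<>(goals);
            for (String g : goals) {
                if (!kept.contains(g)) continue;          // already dropped
                for (String other : subsumes.getOrDefault(g, Set.of())) {
                    if (!other.equals(g)) kept.remove(other); // covered for free
                }
            }
            return kept;  // fewer objectives, no properties lost
        }
    }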
Ant colony optimization for object-oriented unit test generation
Generating useful unit tests for object-oriented programs is difficult for traditional optimization methods. One not only needs to identify values to be used as inputs, but also to synthesize a program which creates the required state in the program under test. Many existing Automated Test Generation (ATG) approaches combine search with performance-enhancing heuristics. We present Tiered Ant Colony Optimization (Taco) for generating unit tests for object-oriented programs. The algorithm is formed of three tiers of ACO, each of which tackles a distinct task: goal prioritization, test program synthesis, and data generation for the synthesized program. Test program synthesis allows the creation of complex objects, and exploration of program state, which is the breakthrough that has allowed the successful application of ACO to object-oriented test generation. Taco brings the mature search ecosystem of ACO to bear on ATG for complex object-oriented programs, providing a viable alternative to current approaches. To demonstrate the effectiveness of Taco, we have developed a proof-of-concept tool which successfully generated tests for an average of 54% of the methods in 170 Java classes, a result competitive with the industry-standard Randoop.
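For context, a textbook sketch of the two ACO ingredients such an algorithm builds on: pheromone-weighted choice among candidate actions, and the standard evaporation-plus-deposit update. This is generic ACO, not Taco's code.

    import java.util.Map;
    import java.util.Random;

    class AntColonySketch {
        static final double RHO = 0.1;   // evaporation rate

        // Pick an action with probability proportional to its pheromone level
        // (assumes a non-empty map).
        static String choose(Map<String, Double> pheromone, Random rnd) {
            double total = pheromone.values().stream().mapToDouble(d -> d).sum();
            double r = rnd.nextDouble() * total;
            for (var e : pheromone.entrySet()) {
                r -= e.getValue();
                if (r <= 0) return e.getKey();
            }
            return pheromone.keySet().iterator().next();
        }

        // Evaporate all trails, then reinforce the action that led to reward.
        static void update(Map<String, Double> pheromone, String used, double reward) {
            pheromone.replaceAll((k, v) -> (1 - RHO) * v);     // evaporate
            pheromone.merge(used, RHO * reward, Double::sum);  // deposit
        }
    }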
.NET/C# instrumentation for search-based software testing
C# is one of the most widely used programming languages. However, to the best of our knowledge, there has been no work in the literature aimed at enabling search-based software testing techniques for applications running on the .NET platform, like the ones written in C#. In this paper, we propose a search-based approach and an open-source tool to enable white-box testing for C# applications. The approach integrates .NET bytecode instrumentation to collect code coverage at runtime during the search. In addition, by taking advantage of branch distance, we define heuristics to better guide the search, e.g., by measuring how close an execution is to covering a branch in the source code. To empirically evaluate our technique, we integrated our tool into the EvoMaster test generation tool and conducted experiments on three .NET RESTful APIs as case studies. Results show that our technique significantly outperforms gray-box testing tools in terms of code coverage.
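The branch-distance heuristic the abstract refers to is well established in SBST; a sketch for two relational predicates follows, written in Java for consistency with the rest of this listing (the paper itself targets .NET/C#).

    // Standard branch distance: 0 when the predicate holds; otherwise it
    // grows with how far the operands are from satisfying it, giving the
    // search a gradient toward covering the branch.
    class BranchDistance {
        static final double K = 1.0;  // constant added when the predicate fails

        static double le(double a, double b) {    // distance for a <= b
            return a <= b ? 0.0 : (a - b) + K;
        }

        static double eq(double a, double b) {    // distance for a == b
            return a == b ? 0.0 : Math.abs(a - b) + K;
        }
    }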
Automatic generation of smell-free unit tests
Master's thesis, Engenharia Informática, 2022, Universidade de Lisboa, Faculdade de Ciências.
Automated test generation tools (such as EvoSuite) typically aim to maximize code
coverage. However, they frequently disregard non-coverage aspects that can be relevant
for testers, such as the quality of the generated tests. Therefore, automatically generated
tests are often affected by a set of test-specific bad programming practices that may hinder
the quality of both test and production code, i.e., test smells. Given that other researchers
have successfully integrated non-coverage quality metrics into EvoSuite, we decided to
extend the EvoSuite tool such that the generated test code is smell-free. To this aim, we
compiled 54 test smells from several sources and selected 16 smells that are relevant to the
context of this work. We then augmented the tool with the respective test smell metrics
and investigated the diffusion of the selected smells and the distribution of the metrics.
Finally, we implemented an approach to optimize the test smell metrics as secondary
criteria. After establishing the optimal configuration of metrics to optimize as
secondary criteria (used throughout the remainder of the study), we conducted an empirical study
to assess whether the tests became significantly less smelly. Furthermore, we studied
how the proposed metrics affect the fault detection effectiveness, coverage, and size of
the generated tests. Our study revealed that the proposed approach reduces the overall
smelliness of the generated tests; in particular, the diffusion of the "Indirect Testing" and
"Unrelated Assertions" smells improved considerably. Moreover, our approach improved
the smelliness of the tests generated by EvoSuite without compromising the code coverage
or fault detection effectiveness. The size and length of the generated tests were also not
affected by the new secondary criteria.
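A hedged sketch of how a smell metric can act as a secondary criterion: among candidate tests with equal coverage, the search prefers the one with the lower smell score. The score shown is an illustrative placeholder, not the thesis's exact metric set.

    import java.util.Comparator;
    import java.util.List;

    record CandidateTest(double coverage, int smellScore, String source) {}

    class SecondaryCriterion {
        // Given tests that cover the same goals, break the tie by smelliness,
        // e.g. a count of smell occurrences detected in the test's code.
        static CandidateTest pick(List<CandidateTest> equallyCovering) {
            return equallyCovering.stream()
                .min(Comparator.comparingInt(CandidateTest::smellScore))
                .orElseThrow();
        }
    }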