
    JUGE: An Infrastructure for Benchmarking Java Unit Test Generators

    Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and for various platforms (e.g., desktop, web, or mobile applications). These generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit testing of libraries vs. system testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one best suited to their requirements, while researchers seek to identify future research directions. This can be achieved through the systematic execution of large-scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to collect benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this paper, we present our JUnit Generation benchmarking infrastructure (JUGE), which supports generators (e.g., search-based, random-based, or symbolic-execution-based) seeking to automate the production of unit tests for various purposes (e.g., validation, regression testing, or fault localization). The primary goal is to reduce the overall effort, ease the comparison of several generators, and enhance knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, eight editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place, each using and updating JUGE. As a result, an increasing number of tools (over ten) from both academia and industry have been evaluated on JUGE, matured over the years, and enabled the identification of future research directions.
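
    Although the abstract describes JUGE only at a high level, its core is a standardized contract between the benchmarking harness and each participating generator. The minimal Java sketch below illustrates that idea only; the interface and method names are hypothetical, not JUGE's actual API:

        import java.nio.file.Path;
        import java.time.Duration;
        import java.util.List;

        /**
         * Hypothetical sketch of the kind of contract a benchmarking harness
         * such as JUGE could impose on participating generators. The names
         * and signatures here are illustrative, not JUGE's real interface.
         */
        interface TestGeneratorAdapter {

            /** Called once so the tool can prepare (load configuration, warm up). */
            void initialize(List<Path> classPath);

            /**
             * Generate JUnit tests for one class under test within the given
             * time budget, writing the resulting test sources to outputDir.
             */
            void generateTests(String classUnderTest, Duration budget, Path outputDir);
        }

    A harness built on such a contract can run every participating tool against the same benchmark classes with the same time budgets, which is what makes results comparable across tools and competition editions.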

    EvoSuite at the SBST 2016 Tool Competition

    EvoSuite is a search-based tool that automatically generates unit tests for Java code. This paper summarizes the results and experiences of EvoSuite's participation in the fourth unit testing competition at SBST 2016, where EvoSuite achieved the highest overall score.

    Recovering fitness gradients for interprocedural Boolean flags in search-based testing

    National Research Foundation (NRF) Singapore under Corp Lab @ University scheme; National Research Foundation (NRF) Singapore under its NSoE Programme.

    Instance Space Analysis of Search-Based Software Testing

    Search-based software testing (SBST) is now a mature area, with numerous techniques developed to tackle the challenging task of software testing. SBST techniques have shown promising results and have been successfully applied in industry to automatically generate test cases for large and complex software systems. Their effectiveness, however, is problem-dependent. In this paper, we revisit the problem of objective performance evaluation of SBST techniques considering recent methodological advances -- in the form of Instance Space Analysis (ISA) -- enabling the strengths and weaknesses of SBST techniques to be visualized and assessed across the broadest possible space of problem instances (software classes) from common benchmark datasets. We identify features of SBST problems that explain why a particular instance is hard for an SBST technique, reveal areas of hard and easy problems in the instance space of existing benchmark datasets, and identify the strengths and weaknesses of state-of-the-art SBST techniques. In addition, we examine the diversity and quality of common benchmark datasets used in experimental evaluations.
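
    ISA characterizes each problem instance (here, a software class) by a vector of features. The paper's feature set is not reproduced here; the toy Java sketch below only illustrates the kind of static features one could compute, using the ASM bytecode library (org.ow2.asm:asm), with an invented choice of features (method count and conditional-branch count):

        import org.objectweb.asm.ClassReader;
        import org.objectweb.asm.ClassVisitor;
        import org.objectweb.asm.Label;
        import org.objectweb.asm.MethodVisitor;
        import org.objectweb.asm.Opcodes;

        /**
         * Toy instance-feature extractor: counts methods and conditional
         * branches of a class on the classpath. The feature choice is
         * illustrative only, not the feature set used in the paper.
         */
        public class SbstInstanceFeatures {
            public static void main(String[] args) throws Exception {
                String className = args[0]; // e.g. "java.util.ArrayList"
                int[] counts = new int[2];  // [0] methods, [1] conditional jumps
                try (var in = SbstInstanceFeatures.class.getResourceAsStream(
                        "/" + className.replace('.', '/') + ".class")) {
                    new ClassReader(in).accept(new ClassVisitor(Opcodes.ASM9) {
                        @Override
                        public MethodVisitor visitMethod(int access, String name,
                                String descriptor, String signature, String[] exceptions) {
                            counts[0]++;
                            return new MethodVisitor(Opcodes.ASM9) {
                                @Override
                                public void visitJumpInsn(int opcode, Label label) {
                                    if (opcode != Opcodes.GOTO) counts[1]++; // conditional branch
                                }
                            };
                        }
                    }, 0);
                }
                System.out.printf("methods=%d, conditionalBranches=%d%n", counts[0], counts[1]);
            }
        }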

    Private API Access and Functional Mocking in Automated Unit Test Generation

    Not all object-oriented code is easily testable: dependency objects might be difficult or even impossible to instantiate, and object-oriented encapsulation makes testing potentially simple code difficult if it cannot easily be accessed. When this happens, developers can resort to mock objects that simulate the complex dependencies, or circumvent object-oriented encapsulation and access private APIs directly through, for example, Java reflection. Can automated unit test generation benefit from these techniques as well? In this paper we investigate this question by extending the EvoSuite unit test generation tool with the ability to directly access private APIs and to create mock objects using the popular Mockito framework. However, care needs to be taken that this does not impact the usefulness of the generated tests: for example, a test accessing a private field could later fail if that field is renamed, even if that renaming is part of a semantics-preserving refactoring. Such a failure would not reveal a true regression bug, but is a false positive, which wastes the developer's time investigating and fixing the test. Our experiments on the SF110 and Defects4J benchmarks confirm the anticipated improvements in terms of code coverage and bug finding, but also confirm the existence of false positives. However, by ensuring that the test generator uses mocking and reflection only when there is no other way to reach some part of the code, their number remains small.
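
    For readers unfamiliar with the two techniques, the following hand-written Java sketch (not EvoSuite's generated output; PaymentService and RateProvider are invented stand-ins for hard-to-test production code) shows functional mocking with Mockito and private-field access via reflection:

        import static org.mockito.Mockito.mock;
        import static org.mockito.Mockito.when;

        import java.lang.reflect.Field;

        /**
         * Hand-written sketch of the two techniques the paper adds to test
         * generation: Mockito-based functional mocking and private API
         * access via Java reflection.
         */
        public class MockingAndReflectionSketch {

            public interface RateProvider { double currentRate(); }

            static class PaymentService {
                private double cachedRate = 1.0; // encapsulated state
                double convert(double amount, RateProvider provider) {
                    cachedRate = provider.currentRate();
                    return amount * cachedRate;
                }
            }

            public static void main(String[] args) throws Exception {
                // Functional mocking: simulate a dependency that may be
                // difficult or impossible to instantiate for real.
                RateProvider provider = mock(RateProvider.class);
                when(provider.currentRate()).thenReturn(2.5);

                PaymentService service = new PaymentService();
                System.out.println(service.convert(10.0, provider)); // 25.0

                // Private API access: read encapsulated state via reflection.
                Field cached = PaymentService.class.getDeclaredField("cachedRate");
                cached.setAccessible(true);
                System.out.println(cached.get(service)); // 2.5
            }
        }

    The reflection-based access is exactly the brittle part the abstract warns about: renaming cachedRate during a semantics-preserving refactoring would break such a test without revealing a real bug.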

    InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment

    Automated test case generation has proven to be useful to reduce the usually high expenses of software testing. However, several studies have also noted the skepticism of testers regarding the comprehension of generated test suites when compared to manually designed ones. This fact suggests that involving testers in the test generation process could be helpful to increase their acceptance of automatically produced test suites. In this paper, we propose incorporating interactive readability assessments made by a tester into EvoSuite, a widely-known evolutionary test generation tool. Our approach, InterEvo-TR, interacts with the tester at different moments during the search and shows different test cases covering the same coverage target for their subjective evaluation. The design of such an interactive approach involves a schedule of interaction, a method to diversify the selected targets, a plan to save and handle the readability values, and some mechanisms to customize the level of engagement in the revision, among other aspects. To analyze the potential and practicability of our proposal, we conduct a controlled experiment in which 39 participants, including academics, professional developers, and student collaborators, interact with InterEvo-TR. Our results show that the strategy to select and present intermediate results is effective for the purpose of readability assessment. Furthermore, the participants' actions and responses to a questionnaire allowed us to analyze the aspects influencing test code readability and the benefits and limitations of an interactive approach in the context of test case generation, paving the way for future developments based on interactivity.
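
    The paper's interaction machinery is not reproduced here. As a purely hypothetical Java sketch (all names invented, not InterEvo-TR's API), interleaving a human readability judgment into selection might take this shape:

        import java.util.List;

        /**
         * Hypothetical sketch of interactive selection: among candidate
         * tests covering the same target, keep the one the human tester
         * rates as most readable.
         */
        interface ReadabilityOracle {
            /** Tester's score for one test, e.g. 1 (unreadable) to 5 (very readable). */
            int rate(String testSource);
        }

        class InteractiveSelection {
            private final ReadabilityOracle tester;

            InteractiveSelection(ReadabilityOracle tester) { this.tester = tester; }

            String pickMostReadable(List<String> candidatesForSameTarget) {
                String best = null;
                int bestScore = Integer.MIN_VALUE;
                for (String candidate : candidatesForSameTarget) {
                    int score = tester.rate(candidate); // the interaction point
                    if (score > bestScore) { bestScore = score; best = candidate; }
                }
                return best;
            }
        }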

    Automatic generation of smell-free unit tests

    Master's thesis, Informatics Engineering, 2022, Universidade de Lisboa, Faculdade de Ciências.
    Automated test generation tools (such as EvoSuite) typically aim to maximize code coverage. However, they frequently disregard non-coverage aspects that can be relevant to testers, such as the quality of the generated tests. Therefore, automatically generated tests are often affected by a set of test-specific bad programming practices that may hinder the quality of both test and production code, i.e., test smells. Given that other researchers have successfully integrated non-coverage quality metrics into EvoSuite, we decided to extend the EvoSuite tool such that the generated test code is smell-free. To this aim, we compiled 54 test smells from several sources and selected 16 smells that are relevant to the context of this work. We then augmented the tool with the respective test smell metrics and investigated the diffusion of the selected smells and the distribution of the metrics. Finally, we implemented an approach to optimize the test smell metrics as secondary criteria. After establishing the optimal configuration of metrics to optimize as secondary criteria (which we used throughout the remainder of the study), we conducted an empirical study to assess whether the tests became significantly less smelly. Furthermore, we studied how the proposed metrics affect the fault detection effectiveness, coverage, and size of the generated tests. Our study revealed that the proposed approach reduces the overall smelliness of the generated tests; in particular, the diffusion of the "Indirect Testing" and "Unrelated Assertions" smells improved considerably. Moreover, our approach reduced the smelliness of the tests generated by EvoSuite without compromising code coverage or fault detection effectiveness. The size and length of the generated tests were also not affected by the new secondary criteria.
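
    The thesis optimizes test smell metrics as secondary criteria, i.e., as tie-breakers alongside coverage. The following Java sketch is only a hypothetical illustration of that idea (TestSuiteCandidate and its methods are invented, not EvoSuite's internals):

        import java.util.Comparator;

        /**
         * Hypothetical sketch of a smell-based secondary criterion: coverage
         * remains the primary objective, and an aggregated smell score
         * breaks ties in favour of the less smelly candidate.
         */
        class SmellAwareComparator implements Comparator<TestSuiteCandidate> {
            @Override
            public int compare(TestSuiteCandidate a, TestSuiteCandidate b) {
                int byCoverage = Double.compare(b.coverage(), a.coverage()); // primary: higher first
                if (byCoverage != 0) return byCoverage;
                return Integer.compare(a.smellScore(), b.smellScore());      // secondary: lower first
            }
        }

        /** Invented stand-in for a candidate solution in the search. */
        interface TestSuiteCandidate {
            double coverage();  // primary objective, maximized
            int smellScore();   // e.g. weighted count of detected smells, minimized
        }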

    Unit Test Generation During Software Development: EvoSuite Plugins for Maven, IntelliJ and Jenkins

    Different techniques to automatically generate unit tests for object-oriented classes have been proposed, but how to integrate these tools into the daily activities of software development is a little-investigated question. In this paper, we report on our experience in supporting industrial partners in introducing the EvoSuite automated JUnit test generation tool into their software development processes. The first step consisted of providing a plugin for the Apache Maven build infrastructure. The move from a research-oriented point-and-click tool to an automated step of the build process has implications for how developers interact with the tool and the generated tests; therefore, we produced a plugin for the popular IntelliJ Integrated Development Environment (IDE). As build automation is a core component of Continuous Integration (CI), we provide a further plugin for the Jenkins CI system, which allows developers to monitor the results of EvoSuite and integrate generated tests into their source tree. In this paper, we discuss the resulting architecture of the plugins and the challenges arising when building such plugins. Although the plugins described target the EvoSuite tool, they can be adapted, and their architecture reused, for other test generation tools as well.
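
    As a loose illustration of the Jenkins plugin's role of integrating generated tests into the source tree, the Java sketch below copies generated test classes into a project's test sources. The ".evosuite/best-tests" location and the overall flow are assumptions made for illustration, not the plugins' actual implementation:

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.StandardCopyOption;
        import java.util.List;
        import java.util.stream.Stream;

        /**
         * Illustrative sketch of a CI step that imports generated tests
         * into a project's test source tree. Paths are assumptions for
         * the purpose of this sketch.
         */
        public class ImportGeneratedTests {
            public static void main(String[] args) throws IOException {
                Path generated = Path.of(".evosuite", "best-tests"); // assumed location
                Path testSources = Path.of("src", "test", "java");
                List<Path> javaFiles;
                try (Stream<Path> walk = Files.walk(generated)) {
                    javaFiles = walk.filter(p -> p.toString().endsWith(".java")).toList();
                }
                for (Path source : javaFiles) {
                    Path target = testSources.resolve(generated.relativize(source));
                    Files.createDirectories(target.getParent());
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }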

    Basic block coverage for search-based unit testing and crash reproduction

    Search-based techniques have been widely used for white-box test generation. Many of these approaches rely on the approach level and branch distance heuristics to guide the search process and generate test cases with high line and branch coverage. Despite the positive results achieved by these two heuristics, they only use information related to the coverage of explicit branches (e.g., those introduced by conditional and loop statements), but ignore potential implicit branching within basic blocks of code. If such implicit branching happens at runtime (e.g., if an exception is thrown in a branchless method), the existing fitness functions cannot guide the search process. To address this issue, we introduce a new secondary objective, called Basic Block Coverage (BBC), which takes into account the coverage level of relevant basic blocks in the control flow graph. We evaluated the impact of BBC on search-based unit test generation (using the DynaMOSA algorithm) and search-based crash reproduction (using the STDistance and WeightedSum fitness functions). Our results show that for unit test generation, BBC improves the branch coverage of the generated tests. Although small (∼1.5%), this improvement in branch coverage is systematic and leads to an increase in output domain coverage, implicit runtime exception coverage, and the diversity of runtime states. In terms of crash reproduction, BBC helps reproduce 3 new crashes for each of the STDistance and WeightedSum fitness functions. BBC significantly decreases the time required to reproduce 43.5% and 45.1% of the crashes using STDistance and WeightedSum, respectively. For these crashes, BBC reduces the consumed time by 71.7% (for STDistance) and 68.7% (for WeightedSum) on average.
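
    As a rough Java sketch of how a secondary objective like BBC can act as a tie-breaker when approach level and branch distance give no gradient (all names below are invented, not the authors' implementation):

        import java.util.Set;

        /**
         * Sketch of a BBC-style secondary objective: when the primary
         * heuristics tie, prefer the test whose execution covered more of
         * the basic blocks relevant to reaching the current target.
         */
        class BasicBlockSecondaryObjective {

            /** Negative if t1 is preferable, positive if t2 is, zero on a tie. */
            int compare(ExecutionTrace t1, ExecutionTrace t2, Set<Integer> relevantBlocks) {
                long c1 = t1.coveredBlockIds().stream().filter(relevantBlocks::contains).count();
                long c2 = t2.coveredBlockIds().stream().filter(relevantBlocks::contains).count();
                return Long.compare(c2, c1); // more covered relevant blocks wins
            }
        }

        /** Invented stand-in for the runtime trace of one test execution. */
        interface ExecutionTrace {
            Set<Integer> coveredBlockIds();
        }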