EvoSuite at the SBST 2016 Tool Competition
EvoSuite is a search-based tool that automatically generates unit tests for Java code. This paper summarizes the results and experiences of EvoSuite's participation in the fourth unit testing competition at SBST 2016, where EvoSuite achieved the highest overall score.
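For context on what such generated tests look like, below is a minimal, hypothetical sketch of an EvoSuite-style JUnit 4 test. The BankAccount class, its methods, and the assertions are illustrative assumptions rather than material from the competition benchmarks; real EvoSuite output additionally includes a generated runner and scaffolding superclass for deterministic execution, omitted here.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.fail;

    import org.junit.Test;

    // Hypothetical class under test, included only to keep the sketch self-contained.
    class BankAccount {
        private int balance = 0;

        public void deposit(int amount) {
            balance += amount;
        }

        public void withdraw(int amount) {
            if (amount > balance) {
                throw new IllegalArgumentException("insufficient balance");
            }
            balance -= amount;
        }

        public int getBalance() {
            return balance;
        }
    }

    // Illustrative EvoSuite-style test class, using the numbered test names
    // such tools produce by default.
    public class BankAccount_ESTest {

        @Test(timeout = 4000)
        public void test0() {
            BankAccount account = new BankAccount();
            account.deposit(50);
            assertEquals(50, account.getBalance());
        }

        @Test(timeout = 4000)
        public void test1() {
            BankAccount account = new BankAccount();
            try {
                account.withdraw(10);
                fail("Expecting exception: IllegalArgumentException");
            } catch (IllegalArgumentException e) {
                // expected: cannot withdraw more than the current balance
            }
        }
    }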
A Method for Generating Test Data from the Source Code of Java Programs
The objective of the proposed method is to increase the efficiency of automatic generation and minimization of the test data set needed to guarantee coverage of the source code of Java programs. Different kinds of coverage, methods of abstract interpretation, and state-space reduction are discussed. The proposed method is based on formal methods of model behavior analysis.
Software testing or the bugs’ nightmare
Software development is not error-free. For decades, bugs (including physical ones) have been a significant development problem requiring major maintenance effort; in some cases, fixing bugs has even introduced new ones. One of the main reasons for bugs' prominence is their ability to hide: finding them is difficult and costly in terms of time and resources. However, software testing has made significant progress in identifying them, using different strategies that combine knowledge from every part of the program. This paper reviews approaches from software testing that discover bugs automatically and presents state-of-the-art methods and tools currently used in this area. It covers three testing strategies: search-based methods, symbolic execution, and fuzzers. It also provides some insight into the application of diversity in these areas, and discusses common and future challenges in automatic test generation that still need to be addressed.
JUGE: An Infrastructure for Benchmarking Java Unit Test Generators
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and for various platforms (e.g., desktop, web, or mobile applications), and they exhibit varying effectiveness and efficiency depending on the testing goals they aim to satisfy (e.g., unit testing of libraries vs. system testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one most suited to their requirements, while researchers seek to identify future research directions. This can be achieved through the systematic execution of large-scale evaluations of different generators. However, the execution of such empirical evaluations is not trivial and requires substantial effort to collect benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this paper, we present our JUnit Generation benchmarking infrastructure (JUGE), which supports generators (e.g., search-based, random-based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (e.g., validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall effort, ease the comparison of several generators, and enhance knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, eight editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place and have used and updated JUGE. As a result, an increasing number of tools (over ten) from both academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions.
Automated Test Case Generation Using Code Models and Domain Adaptation
State-of-the-art automated test generation techniques, such as search-based testing, are usually ignorant of what a developer would create as a test case. Therefore, they typically create tests that are not human-readable and may not necessarily detect all types of complex bugs that developer-written tests would. In this study, we leverage Transformer-based code models to generate unit tests that can complement search-based test generation. Specifically, we use CodeT5, a state-of-the-art large code model, and fine-tune it on the test generation downstream task. For our analysis, we use the Methods2test dataset for fine-tuning CodeT5 and Defects4j for project-level domain adaptation and evaluation. The main contribution of this study is a fully automated testing framework that leverages developer-written tests and available code models to generate compilable, human-readable unit tests. Results show that our approach can generate new test cases that cover lines not covered by developer-written tests. Using domain adaptation, we can also increase the line coverage of the model-generated unit tests by 49.9% (mean) and 54% (median), compared to the model without domain adaptation. Our framework can also be used as a complementary solution alongside common search-based methods to increase overall coverage by 25.3% (mean) and 6.3% (median), and it can increase the mutation score of search-based methods by killing extra mutants (up to 64 new mutants killed per project in our experiments).
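To make the input-output relationship concrete, here is a hedged sketch of the kind of focal method such a model receives as context and the kind of compilable, human-readable JUnit 4 test a fine-tuned CodeT5 might emit. The PriceCalculator class and the generated assertions are assumptions for illustration, not examples from the Methods2test or Defects4j datasets.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical focal method given to the model as input context.
    class PriceCalculator {
        double applyDiscount(double price, double percent) {
            if (percent < 0 || percent > 100) {
                throw new IllegalArgumentException("percent out of range");
            }
            return price - price * percent / 100.0;
        }
    }

    // The kind of readable test a fine-tuned code model might generate for the
    // focal method above (illustrative output, not actual model output).
    public class PriceCalculatorTest {

        @Test
        public void testApplyDiscountReducesPrice() {
            PriceCalculator calculator = new PriceCalculator();
            assertEquals(90.0, calculator.applyDiscount(100.0, 10.0), 0.0001);
        }

        @Test(expected = IllegalArgumentException.class)
        public void testApplyDiscountRejectsNegativePercent() {
            PriceCalculator calculator = new PriceCalculator();
            calculator.applyDiscount(100.0, -5.0);
        }
    }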
Improving Readability in Automatic Unit Test Generation
In object-oriented programming, quality assurance is commonly provided through writing unit tests, to exercise the operations of each class. If unit tests are created and maintained manually, this can be a time-consuming and laborious task. For this reason, automatic methods are often used to generate tests that seek to cover all paths of the tested code. Search may be guided by criteria that are opaque to the programmer, resulting in test sequences that are long and confusing.
This has a negative impact on test maintenance. Once tests have been created, the job is not done: programmers need to reason about the tests throughout the lifecycle, as the tested software units evolve. Maintenance includes diagnosing failing tests (whether due to a software fault or an invalid test) and preserving test oracles (ensuring that checked assertions are still relevant). Programmers also need to understand the tests created for code that they did not write themselves, in order to understand the intent of that code. If generated tests cannot be easily understood, then they will be extremely difficult to maintain.
The overall objective of this thesis is to reaffirm the importance of unit test maintenance and to offer novel techniques to improve the readability of automatically generated tests.
The first contribution is an empirical survey of 225 developers from different parts of the world, who were asked to give their opinions about unit testing practices and problems. The survey responses confirm that unit testing is considered important, and that there is an appetite for higher-quality automated test generation, with a view to test maintenance.
The second contribution is a domain-specific model of unit test readability, based on human judgements. The model is used to augment automated unit test generation to produce test suites with both high coverage and improved readability. In evaluations, 30 programmers preferred our improved tests and were able to answer maintenance questions 14% more quickly, with the same level of accuracy.
The third contribution is a novel algorithm for generating descriptive test names that summarise API-level coverage goals. Test optimisation ensures that each test is short, bears a clear relation to the covered code, and can be readily identified by programmers. In evaluations, 47 programmers agreed with the choice of synthesised names, and that these were as descriptive as manually chosen names. Participants were also more accurate and faster at matching generated tests against the tested code, compared to matching with manually chosen test names.
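To illustrate the naming contribution, the sketch below contrasts a default machine-generated name with a descriptive name synthesised from the covered API-level goal; the stack scenario and the specific names are illustrative assumptions, not output from the thesis's tool.

    import static org.junit.Assert.assertTrue;

    import java.util.ArrayDeque;
    import java.util.Deque;

    import org.junit.Test;

    public class NamingExampleTest {

        // Typical default name: it says nothing about which behaviour is covered.
        @Test
        public void test0() {
            Deque<String> stack = new ArrayDeque<>();
            stack.push("x");
            assertTrue(stack.contains("x"));
        }

        // Descriptive name summarising the covered goal: the method under test,
        // the state it is called in, and the observed outcome.
        @Test
        public void testPushOnEmptyStackMakesElementVisible() {
            Deque<String> stack = new ArrayDeque<>();
            stack.push("x");
            assertTrue(stack.contains("x"));
        }
    }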