A Study on Software Testability and the Quality of Testing in Object-Oriented Systems
Software testing is known to be important to the delivery of high-quality systems, but it is also challenging, expensive and time-consuming. This has motivated academic and industrial researchers to seek ways to improve the testability of software. Software testability is the ease with which a software artefact can be effectively tested.
The first step towards building testable software components is to understand the factors – of software processes, products and people – that are related to and can influence software testability. In particular, the goal of this thesis is to provide researchers and practitioners with a comprehensive understanding of design and source code factors that can affect the testability of a class in object-oriented systems. This thesis considers three different views on software testability that address three related aspects: 1) the distribution of unit tests in relation to the dynamic coupling and centrality of software production classes, 2) the relationship between dynamic (i.e., runtime) software properties and class testability, and 3) the relationship between code smells, test smells and the factors related to smells distribution. The thesis utilises a combination of source code analysis techniques (both static and dynamic), software metrics, software visualisation techniques and graph-based metrics (from complex networks theory) to address its goals and objectives.
A systematic mapping study was first conducted to thoroughly investigate the body of research on dynamic software metrics and to identify issues associated with their selection, design and implementation. This mapping study identified, evaluated and classified 62 research works based on a pre-tested protocol and a set of classification criteria. Based on the findings of this study, a number of dynamic metrics were selected and used in the experiments that were then conducted.
The thesis demonstrates that by using a combination of visualisation, dynamic analysis, static analysis and graph-based metrics it is feasible to identify central classes and to diagrammatically depict testing coverage information. Experimental results show that, even in projects with high test coverage, some classes appear to be left without any direct unit testing, even though they play a central role during a typical execution profile. It is contended that the proposed visualisation techniques could be particularly helpful when developers need to maintain and reengineer existing test suites.
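The idea of spotting central classes that lack direct unit tests can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis's tooling: the runtime trace, the class names, the set of directly tested classes and the 0.5 threshold are all hypothetical, and simple degree centrality stands in for whichever graph-based metrics the thesis actually uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical runtime trace: (calling class, called class) pairs recorded
# while running a typical execution profile.
TRACE: List[Tuple[str, str]] = [
    ("Compiler", "Parser"), ("Parser", "Lexer"), ("Parser", "SymbolTable"),
    ("Checker", "SymbolTable"), ("Emitter", "SymbolTable"),
    ("Compiler", "Checker"), ("Compiler", "Emitter"),
]

# Hypothetical set of classes that have their own dedicated unit tests.
DIRECTLY_TESTED: Set[str] = {"Compiler", "Parser", "Lexer"}


def degree_centrality(trace: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalised degree centrality over the undirected graph of distinct runtime dependencies."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in trace:
        if caller != callee:  # ignore self-calls
            neighbours[caller].add(callee)
            neighbours[callee].add(caller)
    n = len(neighbours)
    if n < 2:
        return {cls: 0.0 for cls in neighbours}
    return {cls: len(adj) / (n - 1) for cls, adj in neighbours.items()}


def untested_central(trace: Iterable[Tuple[str, str]], tested: Set[str], threshold: float) -> List[str]:
    """Classes at or above the centrality threshold that have no direct unit tests."""
    centrality = degree_centrality(trace)
    return sorted(c for c, v in centrality.items() if v >= threshold and c not in tested)


print(untested_central(TRACE, DIRECTLY_TESTED, threshold=0.5))  # ['SymbolTable']
```

Here `SymbolTable` is used by three other classes at runtime yet has no test class of its own, which is the kind of gap the paragraph above describes.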
Another important finding of this thesis is that frequently executed and tightly coupled classes are correlated with the testability of the class – such classes require larger unit tests and more test cases. This information could inform estimates of the effort required to test classes when developing new unit tests or when maintaining and refactoring existing tests.
An additional key finding of this thesis is that test and code smells, in general, can have a negative impact on class testability. Increasing levels of size and complexity in code are associated with the increased presence of test smells. In addition, production classes that contain smells generally require larger unit tests, and are also likely to have test smells in their corresponding unit tests. Some particular smells are more significantly associated with class testability than others. Furthermore, some code smells can be seen as a sign of the presence of test smells, as some test and code smells are found to co-occur in the test and production code. These results suggest that code smells, and specifically certain types of smells, as well as measures of size and complexity, can be used to provide a more comprehensive indication of smells likely to emerge in test code produced subsequently (or vice versa in a test-first context). Such findings should contribute positively to the work of testers and maintainers when writing unit tests and when refactoring and maintaining existing tests.
Mutation Testing as a Safety Net for Test Code Refactoring
Refactoring is an activity that improves the internal structure of the code
without altering its external behavior. When performed on the production code,
the tests can be used to verify that the external behavior of the production
code is preserved. However, when the refactoring is performed on test code,
there is no safety net that assures that the external behavior of the test code
is preserved. In this paper, we propose to adopt mutation testing as a means to
verify whether the behavior of the test code is preserved after refactoring.
We also show how this approach can be used to identify the part of
the test code that is improperly refactored.
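The core idea can be illustrated with a toy Python model: run the test suite before and after refactoring against the same set of mutants of the production code, and compare which mutants each version kills. This is a hedged sketch, not the paper's implementation; the production function `add`, the hand-written mutants and both suites are hypothetical, and a "suite" is simplified to a function that returns True when all of its checks pass.

```python
from typing import Callable, Dict, Set, Tuple

# Simplified model (an assumption): a test suite takes an implementation
# and returns True when every one of its checks passes.
Impl = Callable[[int, int], int]
Suite = Callable[[Impl], bool]


def add(a: int, b: int) -> int:
    """Production code under test."""
    return a + b


# Hypothetical hand-written mutants of `add`.
MUTANTS: Dict[str, Impl] = {
    "plus_to_minus": lambda a, b: a - b,
    "off_by_one": lambda a, b: a + b + 1,
    "abs_a": lambda a, b: abs(a) + b,
}


def suite_before(f: Impl) -> bool:
    return f(2, 3) == 5 and f(0, 0) == 0 and f(-1, 1) == 0


def suite_after(f: Impl) -> bool:
    # A "refactored" suite that silently lost the negative-operand check.
    return f(2, 3) == 5 and f(0, 0) == 0


def killed(suite: Suite, mutants: Dict[str, Impl]) -> Set[str]:
    """Names of the mutants for which the suite fails, i.e. the killed mutants."""
    return {name for name, mutant in mutants.items() if not suite(mutant)}


def compare_suites(before: Suite, after: Suite, mutants: Dict[str, Impl]) -> Tuple[Set[str], Set[str]]:
    """Return (mutants no longer killed, mutants newly killed) after refactoring."""
    # Both versions must still pass against the unmutated production code.
    if not (before(add) and after(add)):
        raise ValueError("a suite fails on the original production code")
    k_before = killed(before, mutants)
    k_after = killed(after, mutants)
    return k_before - k_after, k_after - k_before


lost, gained = compare_suites(suite_before, suite_after, MUTANTS)
print("no longer killed:", sorted(lost))  # ['abs_a']
print("newly killed:", sorted(gained))    # []
```

A mutant that only the original suite kills (here `abs_a`) signals that the refactoring changed the test code's behavior, and the check that used to kill it points at the part of the test code that was improperly refactored.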
Security Code Smells in Android ICC
Android Inter-Component Communication (ICC) is complex, largely
unconstrained, and hard for developers to understand. As a consequence, ICC is
a common source of security vulnerability in Android apps. To promote secure
programming practices, we have reviewed related research, and identified
avoidable ICC vulnerabilities in Android-run devices and the security code
smells that indicate their presence. We explain the vulnerabilities and their
corresponding smells, and we discuss how they can be eliminated or mitigated
during development. We present a lightweight static analysis tool on top of
Android Lint that analyzes the code under development and provides just-in-time
feedback within the IDE about the presence of such smells in the code.
Moreover, with the help of this tool we study the prevalence of security code
smells in more than 700 open-source apps, and manually inspect around 15% of
the apps to assess the extent to which identifying such smells uncovers ICC
security vulnerabilities.
Comment: Accepted on 28 Nov 2018, Empirical Software Engineering Journal
(EMSE), 201
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can turn maintainability experience into
prejudice. For example, JavaScript is often seen as the least elegant language and
hence the one of lowest maintainability. Such prejudice should not guide decisions
without prior empirical validation. We formulated 10 hypotheses about
maintainability based on prejudices and tested them in a large set of open-source
projects (6,897 GitHub repositories, 402 million lines, 5 programming
languages). We operationalize maintainability with five static analysis
metrics. We found that JavaScript code is not worse than other code, Java code
shows higher maintainability than C# code and C code has longer methods than
other code. The quality of interface documentation is better in Java code than
in other code. Code developed by teams is not of higher maintainability, and large
code bases are not of lower maintainability. Projects with high maintainability are not more
popular or more often forked. Overall, most hypotheses are not supported by
open-source data.
Comment: 20 pages
Automatic generation of smell-free unit tests
Master's thesis, Informatics Engineering, 2022, Universidade de Lisboa, Faculdade de Ciências
Automated test generation tools (such as EvoSuite) typically aim to maximize code
coverage. However, they frequently disregard non-coverage aspects that can be relevant
for testers, such as the quality of the generated tests. Therefore, automatically generated
tests are often affected by a set of test-specific bad programming practices that may hinder
the quality of both test and production code, i.e., test smells. Given that other researchers
have successfully integrated non-coverage quality metrics into EvoSuite, we decided to
extend the EvoSuite tool such that the generated test code is smell-free. To this aim, we
compiled 54 test smells from several sources and selected 16 smells that are relevant to the
context of this work. We then augmented the tool with the respective test smell metrics
and investigated the diffusion of the selected smells and the distribution of the metrics.
Finally, we implemented an approach to optimize the test smell metrics as secondary
criteria. After establishing the best configuration of secondary criteria
(which we used throughout the remainder of the study), we conducted an empirical study
to assess whether the tests became significantly less smelly. Furthermore, we studied
how the proposed metrics affect the fault detection effectiveness, coverage, and size of
the generated tests. Our study revealed that the proposed approach reduces the overall
smelliness of the generated tests; in particular, the diffusion of the “Indirect Testing” and
“Unrelated Assertions” smells decreased considerably. Moreover, our approach reduced
the smelliness of the tests generated by EvoSuite without compromising code coverage
or fault detection effectiveness. The size and length of the generated tests were also not
affected by the new secondary criteria.
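The notion of a secondary criterion can be sketched as a lexicographic comparison: candidates are ranked first by coverage, and a smell metric only breaks ties between equally covering tests. This is an illustrative Python sketch, not EvoSuite's Java implementation; the `Candidate` type, the single `smell_score` count and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    coverage: float   # primary objective: fraction of coverage goals hit (higher is better)
    smell_score: int  # hypothetical secondary metric: number of smell instances (lower is better)


def rank_key(c: Candidate) -> Tuple[float, int]:
    # min() and sorted() work in ascending order, so coverage is negated to prefer
    # higher coverage first; smell_score only decides between equal-coverage candidates.
    return (-c.coverage, c.smell_score)


def best(candidates: List[Candidate]) -> Candidate:
    return min(candidates, key=rank_key)


pool = [
    Candidate("t1", coverage=0.80, smell_score=4),
    Candidate("t2", coverage=0.80, smell_score=1),
    Candidate("t3", coverage=0.75, smell_score=0),
]
print(best(pool).name)  # t2
```

`t2` wins because it matches `t1` on coverage with fewer smells, while `t3` is smell-free but loses on the primary objective; this ordering is one way a smell metric can improve test quality without trading away coverage.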
An Empirical Study of Using Large Language Models for Unit Test Generation
A code generation model generates code by taking a prompt from a code
comment, existing code, or a combination of both. Although code generation
models (e.g. GitHub Copilot) are increasingly being adopted in practice, it is
unclear whether they can successfully be used for unit test generation without
fine-tuning. We investigated how well three generative models (Codex,
GPT-3.5-Turbo, and StarCoder) can generate test cases to fill this gap. We used
two benchmarks (HumanEval and EvoSuite SF110) to investigate the effect of context
generation on the unit test generation process. We evaluated the
models based on compilation rates, test correctness, coverage, and test smells.
We found that the Codex model achieved above 80% coverage for the HumanEval
dataset, but no model had more than 2% coverage for the EvoSuite SF110
benchmark. The generated tests also suffered from test smells, such as
Duplicated Asserts and Empty Tests.
Comment: Preprint submitted to Journal of Systems and Software; 36 pages, 4
figures, 7 tables
The Emotional and Chromatic Layers of Urban Smells
People are able to detect up to 1 trillion odors. Yet, city planning is
concerned only with a few bad odors, mainly because odors are currently
captured only through complaints made by urban dwellers. To capture both good
and bad odors, we resort to a methodology that has been recently proposed and
relies on tagging information of geo-referenced pictures. In doing so for the
cities of London and Barcelona, this work makes three new contributions. We
study 1) how the urban smellscape changes in time and space; 2) which emotions
people share at places with specific smells; and 3) what is the color of a
smell, if it exists. Without social media data, insights about those three
aspects have been difficult to produce in the past, further delaying the
creation of urban restorative experiences.
Comment: 11 pages, 18 figures, final version published in the Proceedings of
the Tenth International Conference on Web and Social Media (ICWSM 2016)
On the Effectiveness of Unit Tests in Test-driven Development
Background: Writing unit tests is one of the primary activities
in test-driven development. Yet, the existing reviews report little
evidence supporting or refuting the effect of this development approach
on test case quality. Developers' lack of ability and skill to
produce sufficiently good test cases is also reported as a limitation
of applying test-driven development in industrial practice.
Objective: We investigate the impact of test-driven development
on the effectiveness of unit test cases compared to incremental
test-last development in an industrial context.
Method: We conducted an experiment in an industrial setting
with 24 professionals. Professionals followed the two development
approaches to implement the tasks. We measure unit test effectiveness
in terms of mutation score. We also measure branch and
method coverage of test suites to compare our results with the
literature.
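For reference, mutation score is commonly defined as the fraction of non-equivalent mutants that a test suite kills. The abstract does not state which mutation tool or exact variant the study used, so the following is the textbook definition rather than necessarily the one applied. With $M$ mutants generated, $E$ of them equivalent to the original program, and $K$ killed by test suite $T$:

```latex
\mathrm{MS}(T) = \frac{K}{M - E},
\qquad \text{e.g. } M = 100,\; E = 10,\; K = 72
\;\Rightarrow\; \mathrm{MS}(T) = \frac{72}{90} = 0.8
```

The example numbers are illustrative only. Because equivalent mutants can never be killed, excluding them keeps a perfect suite's score at 1.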
Results: In terms of mutation score, we have found that the test
cases written for a test-driven development task have a higher
defect detection ability than test cases written for an incremental
test-last development task. Subjects wrote test cases that cover
more branches on a test-driven development task compared to the
other task. However, test cases written for an incremental test-last
development task cover more methods than those written for the
test-driven development task.
Conclusion: Our findings are different from previous studies
conducted in academic settings. Professionals were able to perform
more effective unit testing with test-driven development. Furthermore,
we observe that the coverage measures preferred in academic
studies reveal different aspects of a development approach. Our
results need to be validated in larger industrial contexts.
Istanbul Technical University
Scientific Research Projects (MGA-2017-40712), and the
Academy of Finland (Decision No. 278354)