Automatically Documenting Software Artifacts
Software artifacts, such as database schemas and unit test cases, constantly change during evolution and maintenance of software systems. Co-evolution of code and DB schemas in Database-Centric Applications (DCAs) often leads to two types of challenging scenarios for developers, where (i) changes to the DB schema need to be incorporated in the source code, and (ii) maintenance of a DCA's code requires understanding of how the features are implemented by relying on DB operations and corresponding schema constraints. On the other hand, the number of unit test cases often grows as new functionality is introduced into the system, and maintaining these unit tests is important to reduce the introduction of regression bugs due to outdated unit tests. Therefore, one critical artifact that developers need to be able to maintain during evolution and maintenance of software systems is up-to-date and complete documentation. In order to understand developer practices regarding documenting and maintaining these software artifacts, we designed two empirical studies, both composed of (i) an online survey of contributors of open source projects and (ii) a mining-based analysis of method comments in these projects. We observed that documenting methods with database accesses and unit test cases is not a common practice. Further, motivated by the findings of the studies, we proposed three novel approaches: (i) DBScribe is an approach for automatically documenting database usages and schema constraints, (ii) UnitTestScribe is an approach for automatically documenting test cases, and (iii) TeStereo tags stereotypes for unit tests and generates HTML reports to improve the comprehension and browsing of unit tests in a large test suite. We evaluated our tools in case studies with industrial developers and graduate students. In general, developers indicated that descriptions generated by the tools are complete, concise, and easy to read. The reports are useful for source code comprehension tasks as well as other tasks, such as code smell detection and source code navigation.
Which Software Faults Are Tests Not Detecting?
Context: Software testing plays an important role in assuring the reliability of systems. Assessing the efficacy of testing remains challenging, with few established test effectiveness metrics. Those metrics that have been used (e.g. coverage and mutation analysis) have been criticised for insufficiently differentiating between the faults detected by tests. Objective: We investigate how effective tests are at detecting different types of faults and whether some types of fault evade tests more than others. Our aim is to suggest to developers specific ways in which their tests need to be improved to increase fault detection. Method: We investigate seven fault types and analyse how often each goes undetected in 10 open source systems. We statistically look for any relationship between the test set and faults. Results: Our results suggest that the fault detection rates of unit tests are relatively low, typically finding only about half of all faults. In addition, conditional boundary and method call removals are less well detected by tests than other fault types. Conclusions: We conclude that the testing of these open source systems needs to be improved across the board. In addition, despite boundary cases being long known to attract faults, tests covering boundaries need particular improvement. Overall, we recommend that developers do not rely only on code coverage and mutation score to measure the effectiveness of their tests.
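As a rough illustration of why conditional boundary faults can evade tests, the sketch below compares a predicate with a hand-written boundary mutant. The function names, the threshold 18, and both test suites are hypothetical and are not taken from the study; the point is only that a suite with no test at the boundary value cannot tell the two versions apart.

```python
def is_adult(age):
    """Original predicate: 18 and above counts as adult."""
    return age >= 18


def is_adult_mutant(age):
    """Conditional-boundary mutant: '>=' replaced by '>'."""
    return age > 18


def weak_suite(predicate):
    """Checks that stay away from the boundary value 18."""
    return predicate(30) is True and predicate(5) is False


def boundary_suite(predicate):
    """The same checks plus one check exactly at the boundary."""
    return weak_suite(predicate) and predicate(18) is True


# A mutant counts as "killed" when a suite that passes on the original fails on it.
weak_kills = weak_suite(is_adult) and not weak_suite(is_adult_mutant)
boundary_kills = boundary_suite(is_adult) and not boundary_suite(is_adult_mutant)
print(weak_kills, boundary_kills)
```

Here the weak suite covers both branches of the predicate yet leaves the mutant alive, while the single extra boundary check kills it.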
Symbolic Execution for Runtime Error Detection and Investigation of Refactoring Activities Based on a New Dataset
It is a big challenge in software engineering to produce huge, reliable and robust software systems.
In industry, developers typically have to focus on solving problems quickly.
Under time pressure, code quality frequently becomes a secondary concern.
However, software code quality is very important, because overly complex, hard-to-maintain code results in more bugs and makes further development more expensive.
The research work behind this thesis is inspired by the wish to develop high quality software systems in industry in a more effective and easier way to make the lives of customers and eventually end-users more comfortable and more effective.
The thesis consists of two main topics: the utilization of symbolic execution for runtime error detection and the investigation of practical refactoring activity.
Both topics address the area of program source code quality.
Symbolic execution is a program analysis technique which explores the possible execution paths of a program by handling the inputs as unknown variables (symbolic variables).
The main uses of symbolic execution are generating inputs that trigger program failures and generating high-coverage test cases. It is able to expose defects that would be very difficult and time-consuming to find through manual testing, and would be exponentially more costly to fix if they were not detected until runtime.
In this work, we focus on runtime error detection (such as null pointer dereference, bad array indexing, division by zero, etc.) by discovering critical execution paths in Java programs.
One of the greater challenges in symbolic execution is the very large number of possible execution paths, which increases exponentially.
Our research proposes approaches to handling the aforementioned problem of path explosion by applying symbolic execution at the level of methods.
We also investigated the limitations of this state space together with the development of efficient search heuristics.
To make the detection of runtime errors more accurate, we propose a novel algorithm that keeps track of the conditions on symbolic variables during the analysis.
Source code refactoring is a popular and powerful technique for improving the internal structure of software systems.
The concept of refactoring was popularized by Martin Fowler.
He originally proposed that detecting code smells should be the primary technique for identifying refactoring opportunities in the code.
However, we lack empirical research results on how, when and why refactoring is used in everyday software development, and on what its effects are on short- and long-term maintainability and costs.
By getting answers to these questions, we could understand how developers refactor code in practice, which would help propose new methods and tools for them that are aligned with their current habits, leading to more effective software engineering methodologies in the industry.
To help further empirical investigations of code refactoring, we proposed a publicly available refactoring dataset.
The dataset consists of refactorings and source code metrics of open-source Java systems.
We subjected the dataset to an analysis of the effects of code refactoring on source code metrics and maintainability, which are primary quality attributes in software development.
Improving regression testing efficiency and reliability via test-suite transformations
As software becomes more important and ubiquitous, high quality software also becomes crucial. Developers constantly make changes to improve software, and they rely on regression testing (the process of running tests after every change) to ensure that changes do not break existing functionality. Regression testing is widely used both in industry and in open source, but it suffers from two main challenges. (1) Regression testing is costly. Developers run a large number of tests in the test suite after every change, and changes happen very frequently. The cost is both in the time developers spend waiting for the tests to finish running so that they know whether the changes break existing functionality, and in the monetary cost of running the tests on machines. (2) Regression test suites contain flaky tests, which nondeterministically pass or fail when run on the same version of code, regardless of any changes. Flaky test failures can mislead developers into believing that their changes break existing functionality, even though those tests can fail without any changes. Developers will therefore waste time trying to debug nonexistent faults in their changes.
This dissertation proposes three lines of work that address these challenges of regression testing through test-suite transformations that modify test suites to make them more efficient or more reliable. Specifically, two lines of work explore how to reduce the cost of regression testing and one line of work explores how to fix existing flaky tests.
First, this dissertation investigates the effectiveness of test-suite reduction (TSR), a traditional test-suite transformation that removes tests deemed redundant with respect to other tests in the test suite based on heuristics. TSR outputs a smaller, reduced test suite to be run in the future. However, TSR risks removing tests that can potentially detect faults in future changes. While TSR was proposed over two decades ago, it was always evaluated using program versions with seeded faults. Such evaluations do not precisely predict the effectiveness of the reduced test suite on the future changes. This dissertation evaluates TSR in a real-world setting using real software evolution with real test failures. The results show that TSR techniques proposed in the past are not as effective as suggested by traditional TSR metrics, and those same metrics do not predict how effective a reduced test suite is in the future. Researchers need to either propose new TSR techniques that produce more effective reduced test suites or better metrics for predicting the effectiveness of reduced test suites.
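One common family of TSR heuristics keeps a subset of tests that preserves some coverage criterion. The sketch below is a minimal greedy version of that idea, not the specific techniques evaluated in the dissertation; the test names and coverage sets are made up for illustration.

```python
def greedy_reduce(coverage):
    """Greedily keep tests until every covered requirement is still covered.

    `coverage` maps a test name to the set of requirements (e.g. statements)
    that the test covers.
    """
    remaining = set().union(*coverage.values())
    reduced = []
    while remaining:
        # Pick the test covering the most still-uncovered requirements;
        # iterating in sorted order makes ties resolve by name.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & remaining))
        reduced.append(best)
        remaining -= coverage[best]
    return reduced


# Hypothetical coverage data.
coverage = {
    "test_a": {1, 2, 3},
    "test_b": {3, 4},
    "test_c": {1, 2},
    "test_d": {4, 5},
}
reduced = greedy_reduce(coverage)
print(reduced)
```

The reduced suite preserves statement coverage while dropping test_b and test_c. Nothing in the criterion says whether a dropped test would have been the only one to fail on some future change, which is the risk described above.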
Second, this dissertation proposes a new transformation to improve regression testing cost when using a modern build system by optimizing the placement of tests, implemented in a technique called TestOptimizer. Modern build systems treat a software project as a group of inter-dependent modules, including test modules that contain only tests. As such, when developers make a change, the build system can use a developer-specified dependency graph among modules to determine which test modules are affected by any changed modules and to run only tests in the affected test modules. However, wasteful test executions are a problem when using build systems this way. Suboptimal placements of tests, where developers may place some tests in a module that has more dependencies than the test actually needs, lead to running more tests than necessary after a change. TestOptimizer analyzes a project and proposes moving tests to reduce the number of test executions that are triggered over time due to developer changes. Evaluation of TestOptimizer on five large proprietary projects at Microsoft shows that the suggested test movements can reduce 21.7 million test executions (17.1%) across all evaluation projects. Developers accepted and intend to implement 84.4% of the reported suggestions.
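The module-level selection described above can be sketched as a reachability check over the declared dependency graph. This is only an illustrative model with hypothetical module names; it is not TestOptimizer itself or the API of any particular build system.

```python
def transitive_deps(module, deps):
    """Return every module reachable from `module` in the dependency graph."""
    seen, stack = set(), [module]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def affected_test_modules(changed, deps, test_modules):
    """Test modules that changed themselves or depend on a changed module."""
    return sorted(
        t for t in test_modules
        if t in changed or transitive_deps(t, deps) & changed
    )


# Hypothetical project: module -> modules it declares a dependency on.
deps = {
    "core": [],
    "net": ["core"],
    "ui": ["core", "net"],
    "core_tests": ["core"],
    "ui_tests": ["ui"],
}
tests = ["core_tests", "ui_tests"]

after_net_change = affected_test_modules({"net"}, deps, tests)
after_core_change = affected_test_modules({"core"}, deps, tests)
print(after_net_change, after_core_change)
```

In this model, a test placed in ui_tests that only exercises core would be rerun on every change to net or ui. Moving it to core_tests would avoid those executions, which is the kind of placement suggestion the paragraph describes.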
Third, to make regression testing more reliable, this dissertation proposes iFixFlakies, a framework for fixing a prominent kind of flaky tests: order-dependent tests. Order-dependent tests pass or fail depending on the order in which the tests are run. Intuitively, order-dependent tests fail either because they need another test to set up the state for them to pass, or because some other test pollutes the state before they are run, and the polluted state makes them fail. The key insight behind iFixFlakies is that test suites often already have tests, which we call helpers, that contain the logic for setting or resetting the state needed for order-dependent tests to pass. iFixFlakies searches a test suite for these helpers and then recommends patches for order-dependent tests using code from the helpers. Evaluation of iFixFlakies on 137 truly order-dependent tests from a public dataset shows that 81 of them have helpers, and iFixFlakies can fix all 81. Furthermore, among our GitHub pull requests for 78 of these order-dependent tests (3 of 81 had already been fixed), developers accepted 38; the remaining ones are still pending, and none have been rejected so far.
Understanding Software Development and Testing Practices
A bad software development process leads to wasted effort and inferior products. In order to improve a software process, it must be first understood. In this work I focus on understanding software processes.
The first process we seek to understand is Continuous Integration (CI). CI systems automate the compilation, building, and testing of software. Despite CI rising as a big success story in automated software engineering, it has received almost no attention from the research community. For example, how widely is CI used in practice, and what are some costs and benefits associated with CI? Without answering such questions, developers, tool builders, and researchers make decisions based on folklore instead of data.
We use three complementary methods to study the usage of CI in open-source projects. To understand which CI systems developers use, we analyzed 34,544 open-source projects from GitHub. To understand how developers use CI, we analyzed 1,529,291 builds from the most commonly used CI system. To understand why projects use or do not use CI, we surveyed 442 developers. With this data, we answered several key questions related to the usage, costs, and benefits of CI. Among our results, we show evidence that CI helps projects release more often, that CI is widely adopted by the most popular projects, and that the overall percentage of projects using CI continues to grow, making it important and timely to focus more research on CI.
Furthermore, we present a qualitative study of the barriers and needs developers face when using CI. In this paper, we conduct 16 semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. The Focused Survey samples 51 developers at a single company. The Broad Survey samples a population of 523 developers from all over the world. We identify trade-offs developers face when using and implementing CI. Developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and better ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers.
Additionally, we seek to use code and test changes to understand conformance to the Test Driven Development (TDD) process. We designed and implemented TDDViz, a tool that supports developers in better understanding how they conform to TDD. TDDViz supports this understanding by providing novel visualizations of developers’ TDD process. To enable TDDViz’s visualizations, we developed a novel automatic inferencer that identifies the phases that make up the TDD process solely based on code and test changes.
We evaluate TDDViz using two complementary methods: a controlled experiment with 35 participants to evaluate the visualization, and a case study with 2601 TDD sessions to evaluate the inference algorithm. The controlled experiment shows that, in comparison to existing visualizations, participants performed significantly better when using TDDViz to answer questions about code evolution. In addition, the case study shows that the inferencing algorithm in TDDViz infers TDD phases with an accuracy (F-measure) of 87%.
Unit testing methods for Internet of Things Mbed OS operating system
Abstract. Embedded operating systems for Internet of Things are responsible for managing hardware and software in these systems. From the vast number of IoT operating system projects available, some projects are backed by large companies or institutes and some are developed completely by the open source community. IoT operating system testing focuses on the key features of IoT such as networking and limited resources.
In this thesis, problems in Mbed OS operating system testing methods are identified and a unit testing solution is implemented. The implemented unit testing framework allows developers to write and run unit tests. The framework is also integrated into Mbed OS continuous integration to increase test coverage.
This thesis shows that functional testing and unit testing are the most common types of testing in open source embedded operating system projects. Results from the Mbed OS unit testing framework show that running tests on PC platforms is faster than running tests on IoT devices. The framework also enables developers to write unit tests more freely and improves the Mbed OS development process.
The implemented unit testing framework solved issues in Mbed OS testing, but more in-depth research is needed to improve testing methods further.
To the attention of mobile software developers: Guess what, test your app!
Software testing is an important phase in the software development life-cycle because it helps in identifying bugs in a software system before it is shipped into the hands of its end users. There are numerous studies on how developers test general-purpose software applications. The idiosyncrasies of mobile software applications, however, set mobile apps apart from general-purpose systems (e.g., desktop, stand-alone applications, web services). This paper investigates working habits and challenges of mobile software developers with respect to testing. A key finding of our exhaustive study of 1000 Android apps is that mobile apps are still tested in a very ad hoc way, if tested at all. However, we show that, as in other types of software, testing increases the quality of apps (demonstrated in user ratings and number of code issues). Furthermore, we find evidence that tests are essential when it comes to engaging the community to contribute to mobile open source software. We discuss reasons and potential directions to address our findings. Yet another relevant finding of our study is that Continuous Integration and Continuous Deployment (CI/CD) pipelines are rare in the mobile apps world (only 26% of the apps are developed in projects employing CI/CD); we argue that one of the main reasons is the lack of exhaustive and automatic testing.
Comment: Journal of Empirical Software Engineering