Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature.
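The behavioral-repair loop described above (mutate the program, use the test suite as the oracle) can be sketched minimally. Everything here is illustrative: `abs_value`, the seeded bug, and the single mutation operator (swapping a comparison) are assumptions for the example, not any specific repair tool's design.

```python
def run_tests(func):
    """Hypothetical oracle: a tiny test suite for an abs-value function."""
    try:
        return func(-3) == 3 and func(4) == 4 and func(0) == 0
    except Exception:
        return False

# Seeded bug: the comparison is inverted, so abs_value(-3) returns -3.
BUGGY_SRC = (
    "def abs_value(x):\n"
    "    if x > 0:\n"
    "        return -x\n"
    "    return x\n"
)

def candidates(src):
    """Single-edit patches from one mutation operator: swap a comparison."""
    swaps = [(" > ", " < "), (" < ", " > "), (" >= ", " <= "), (" <= ", " >= ")]
    for old, new in swaps:
        start = 0
        while True:
            i = src.find(old, start)
            if i == -1:
                break
            yield src[:i] + new + src[i + len(old):]
            start = i + 1

def repair(src):
    """Generate-and-validate: return the first patch passing every test."""
    for patch in candidates(src):
        namespace = {}
        try:
            exec(patch, namespace)
        except SyntaxError:
            continue
        if run_tests(namespace["abs_value"]):
            return patch
    return None
```

Real repair systems use far richer mutation operators and search strategies, but the validate-against-the-oracle structure is the same.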
Towards the detection and analysis of performance regression introducing code changes
In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering. It aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, considering the growth of committed code and the number of team members developing simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges remain in practice in locating and resolving performance regressions. Directing regression testing to the commit level helps locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches to develop models that can learn from historical commits. The first stochastic approach views the research problem as a search-based optimization problem that seeks to reach the highest detection rate. We apply different multi-objective evolutionary algorithms and compare them. This thesis also investigates whether simplifying the search space by combining objectives achieves comparable results. The second stochastic approach addresses the severe class imbalance any system could have, since code changes introducing regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regression as a binary classification problem that handles class imbalance.
Further, the thesis provides an exploratory study on the challenges developers face in resolving performance regression. The study is based on questions posted on a technical forum directed at performance regression. We collected around 2k questions discussing the regression of software execution time, and all were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights to help developers avoid regression causes during software design and implementation.
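A common first step when framing rare-but-costly events (such as regression-introducing commits) as an imbalanced binary classification problem is to reweight the classes by inverse frequency. This sketch uses the `n / (n_classes * count)` heuristic popularized by scikit-learn's `class_weight="balanced"`; the toy commit history is an assumption for illustration, not data from the thesis.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * count_c).

    Rare classes receive proportionally larger weights, so a cost-sensitive
    learner is penalized more for missing a regression-introducing commit.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical commit history: 1 = introduces a performance regression (rare).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
```

Here the 5% minority class gets a weight of 10.0 versus roughly 0.53 for the majority class, which can then be fed into any cost-sensitive classifier.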
Enabling Richer Insight Into Runtime Executions Of Systems
Systems software of very large scale is heavily used today in various important scenarios such as online retail, banking, content services, web search, and social networks. As the scale of functionality and complexity grows in this software, managing the implementation becomes a considerable challenge for developers, designers, and maintainers. Software needs to be constantly monitored and tuned for optimal efficiency and user satisfaction. At large scale, these systems incorporate significant degrees of asynchrony, parallelism, and distributed execution, reducing the manageability of the software, including performance management. Adding to the complexity, developers are torn between developing new functionality for customers and maintaining existing programs. This dissertation argues that the manual effort currently required to manage the performance of these systems is very high, and can be automated both to reduce the likelihood of problems and to fix them quickly once identified. The execution logs from these systems are readily available and provide rich information about runtime internals for diagnosis, but the volume of logs is simply too large for today's techniques. Developers hence spend many hours observing and investigating executions of their systems during development and diagnosis. This dissertation proposes applying machine learning techniques to automatically analyze execution logs for challenging tasks in different phases of the software lifecycle. It shows that the careful application of statistical techniques to features extracted from instrumentation can distill the rich log data into forms easily comprehensible to developers.
Multi-Objective Search-Based Software Microbenchmark Prioritization
Ensuring that software performance does not degrade after a code change is
paramount. A potential solution, particularly for libraries and frameworks, is
regularly executing software microbenchmarks, a performance testing technique
similar to (functional) unit tests. This often becomes infeasible due to the
extensive runtimes of microbenchmark suites, however. To address that
challenge, research has investigated regression testing techniques, such as
test case prioritization (TCP), which reorder the execution within a
microbenchmark suite to detect larger performance changes sooner. Such
techniques are either designed for unit tests and perform sub-par on
microbenchmarks or require complex performance models, reducing their potential
application drastically. In this paper, we propose a search-based technique
based on multi-objective evolutionary algorithms (MOEAs) to improve the current
state of microbenchmark prioritization. The technique utilizes three
objectives, i.e., coverage to maximize, coverage overlap to minimize, and
historical performance change detection to maximize. We find that our technique
improves over the best coverage-based, greedy baselines in terms of average
percentage of fault-detection on performance (APFD-P) and Top-3 effectiveness
by 26 percentage points (pp) and 43 pp (for Additional) and 17 pp and 32 pp
(for Total) to 0.77 and 0.24, respectively. Employing the Indicator-Based
Evolutionary Algorithm (IBEA) as MOEA leads to the best effectiveness among six
MOEAs. Finally, the technique's runtime overhead is acceptable at 19% of the
overall benchmark suite runtime, if we consider the enormous runtimes often
spanning multiple hours. The added overhead compared to the greedy baselines is
minuscule at 1%. These results mark a step forward for universally applicable
performance regression testing techniques.
Comment: 17 pages, 5 figures
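The APFD-P metric reported above is a performance-aware adaptation of the classic APFD (average percentage of faults detected) measure for test case prioritization. As a grounding sketch, here is the classic APFD computation; the paper's APFD-P variant additionally weights faults by the size of the performance change, which is not modeled here.

```python
def apfd(order, faults_of_test, n_faults):
    """Classic APFD: 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n),
    where TF_i is the 1-based position in `order` of the first test
    that exposes fault i, n = number of tests, m = number of faults."""
    n, m = len(order), n_faults
    first_pos = {}
    for pos, test in enumerate(order, start=1):
        for fault in faults_of_test.get(test, ()):
            first_pos.setdefault(fault, pos)
    tf_sum = sum(first_pos[f] for f in range(m))
    return 1 - tf_sum / (n * m) + 1 / (2 * n)
```

For example, with five tests where `t1` exposes fault 0 and `t3` exposes fault 1, the ordering `[t1, t2, t3, t4, t5]` scores 1 - 4/10 + 1/10 = 0.7; a prioritization that moves `t3` earlier scores higher.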
Towards the Automation of Migration and Safety of Third-Party Libraries
The process of migrating from one library to a different one is very complex. Typically, the developer needs to find functions in the new library that are most adequate for replacing the functions of the retired library. This process is subjective and time-consuming, as the developer needs to fully understand the documentation of both libraries in order to migrate from the old library to the new one and find the right matching function(s), if one exists. Our goal is to help the developer have a better experience with library migration by identifying the key problems related to this process. Based on our critical literature review, we identified three main challenges related to the automation of library migration: (1) mining existing migrations, (2) learning from these migrations to recommend them in similar contexts, and (3) guaranteeing the safety of the recommended migrations.
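One simple way to approximate the function-matching step described above is to rank candidate functions in the new library by the textual similarity of their documentation to the retired function's documentation. This is only an illustrative baseline (token-level Jaccard similarity); the doc strings and function names below are hypothetical, and real recommenders use much richer signals.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two doc strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def best_match(old_doc, new_api_docs):
    """Rank candidate functions in the new library by doc similarity."""
    return max(new_api_docs, key=lambda name: jaccard(old_doc, new_api_docs[name]))

# Hypothetical documentation for a retired function and two candidates.
old_doc = "parse a json string into a dictionary"
new_api_docs = {
    "loads": "deserialize a json document string to a python dictionary",
    "dump": "serialize an object to a json formatted stream",
}
```

On this toy input, `loads` shares more doc vocabulary with the retired function than `dump` does, so it is recommended as the replacement.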
On The Use of Over-Approximate Analysis in Support of Software Development and Testing
The effectiveness of dynamic program analyses, such as profiling and memory-leak detection, crucially depends on the quality of the test inputs. However, adequate sets of inputs are rarely available. Existing automated input generation techniques can help but tend to be either too expensive or ineffective. For example, traditional symbolic execution scales poorly to real-world programs, and random input generation may never reach deep states within the program.
For scalable, effective, automated input generation that can better support dynamic analysis, I propose an approach that extends traditional symbolic execution by targeting increasingly small fragments of a program. The approach starts by generating inputs for the whole program and progressively introduces additional unconstrained state until it reaches a given program coverage objective. This approach is applicable to any client dynamic analysis requiring high coverage that is also tolerant of over-approximated program behavior--behavior that cannot occur on a complete execution.
To assess the effectiveness of my approach, I applied it to two client techniques. The first technique infers the actual path taken by a program execution by observing the CPU's electromagnetic emanations and requires inputs to generate a model that can recognize executed path segments.
The client inference works by piecewise matching of the observed emanation waveform to those recorded in a model.
It requires the model to be complete (i.e., contain every piece), and the waveforms are sufficiently distinct that the inclusion of extra samples is unlikely to cause a misinference.
After applying my approach to generate inputs covering all subsegments of the program’s execution paths, I designed a source generator to automatically construct a harness and scaffolding to replay these inputs against fragments of the original program.
The inference client constructs the model by recording the harness execution.
The second technique performs automated regression testing by identifying behavioral differences between two program versions and requires inputs to perform differential testing.
It explores local behavior in a neighborhood of the program changes by generating inputs to functions near (as measured by call-graph) to the modified code.
The inputs are then concretely executed on both versions, periodically checking internal state for behavioral differences.
The technique requires high coverage inputs for a full examination, and tolerates infeasible local state since both versions likely execute it equivalently.
I will then present a separate technique to improve the coverage obtained by symbolic execution of floating-point programs.
This technique is equally applicable to both traditional symbolic execution and my progressively under-constrained symbolic execution.
Its key idea is to approximate floating-point expressions with fixed-point analogs.
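The fixed-point approximation mentioned above can be grounded with a minimal sketch: represent each real value as an integer scaled by 2^16, so that constraints over floating-point expressions become constraints over integer arithmetic, which solvers handle more easily. The scale factor and operations here are assumptions for illustration, not the thesis's actual encoding.

```python
SCALE_BITS = 16
SCALE = 1 << SCALE_BITS  # Q16.16 fixed-point format

def to_fix(x):
    """Encode a float as a fixed-point integer."""
    return round(x * SCALE)

def fix_mul(a, b):
    """Multiply two fixed-point values, rescaling the double-width product."""
    return (a * b) >> SCALE_BITS

def to_float(a):
    """Decode a fixed-point integer back to a float."""
    return a / SCALE
```

For example, `1.5 * 2.25` computed in this representation round-trips exactly to `3.375`, while only integer operations were needed; symbolic reasoning over such integers avoids the cost of full floating-point decision procedures at the price of bounded precision.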
In concluding, I will also discuss future research directions, including additional empirical evaluations and the investigation of additional client analyses that could benefit from my approach.
Ph.D.
Test Smell: A Parasitic Energy Consumer in Software Testing
Traditionally, energy efficiency research has focused on reducing energy
consumption at the hardware level and, more recently, in the design and coding
phases of the software development life cycle. However, software testing's
impact on energy consumption did not receive attention from the research
community. Specifically, how test code design quality and test smell (e.g.,
sub-optimal design and bad practices in test code) impact energy consumption
has not been investigated yet. This study examined 12 Apache projects to
analyze the association between test smell and its effects on energy
consumption in software testing. We conducted a mixed-method empirical analysis
from two dimensions: software (data mining in Apache projects) and developers'
views (a survey of 62 software practitioners). Our findings show that: 1) test
smell is associated with energy consumption in software testing; specifically,
the smelly part of a test case consumes 10.92% more energy than the
non-smelly part; 2) certain test smells are more energy-hungry than others; 3)
refactored test cases tend to consume less energy than their smelly
counterparts, and 4) most developers lack knowledge about test smells' impact
on energy consumption. We conclude the paper with several observations that can
direct future research and developments.
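To make "test smell" concrete, one smell frequently cited in the literature is assertion roulette: multiple assertions without explanatory messages, so a failure cannot be traced to a cause. A minimal detector for this one smell can be written with Python's `ast` module; this is an illustration of smell detection in general, not the detector used in the study above.

```python
import ast

def asserts_without_message(test_src):
    """Count bare `assert` statements (no explanatory message), a symptom
    of the 'assertion roulette' test smell."""
    tree = ast.parse(test_src)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Assert) and node.msg is None
    )

# Hypothetical smelly test: three asserts, none explains what failed.
SMELLY = '''
def test_user(user):
    assert user.name == "bob"
    assert user.age == 42
    assert user.active
'''
```

Refactoring such a test to carry messages (or to split into focused test cases) removes the smell, which, per the study's findings, also tends to reduce the energy consumed per test run.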