Leveraging Generated Tests
The main goal of automated test generation is to improve the reliability of a program by exposing faults to developers. To this end, testing should cover the largest possible portion of the program given a test budget (i.e., time and resources) as frequently as possible. Coverage of a program entity in testing increases our confidence in the correctness of that entity.
Generating various tests to cover a program entity is a particularly hard problem to solve for large software systems because the test inputs are complex and they often exhibit sophisticated feature interactions. As a result, current test generation techniques, such as symbolic execution or search-based testing, do not scale well to complex, large-scale systems.
This dissertation presents a test generation technique that aims to increase the frequency of coverage in large, complex software systems. It leverages the information in existing test cases to direct automated testing. We show the results of applying this technique to large systems such as the GCC compiler (~850K lines of code) and Mozilla's JavaScript engine (~120K lines of code). It increases the frequency of coverage by up to a factor of 9x compared to the state-of-the-art technique.
The dissertation also proposes non-adequate test-case reduction, which shrinks test cases under coverage and mutant-detection criteria. The C%-coverage test reduction technique reduces a test case while preserving at least C% of the coverage of the original test case. The N-mutant test reduction technique reduces a test case while preserving the detection of N mutants of the original test case. We evaluate the effectiveness of these test reduction techniques on different attributes of test cases.
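To make the reduction criterion concrete, here is a minimal sketch of a C%-coverage reduction loop in the greedy, delta-debugging style; the chunk-removal strategy and the toy coverage oracle are illustrative assumptions, not the dissertation's exact algorithm:

```python
# A minimal sketch of C%-coverage test reduction (greedy, delta-debugging
# style). `coverage_of` stands in for a real oracle such as gcov; the toy
# oracle below treats the set of distinct lines as "coverage".

def reduce_preserving_coverage(test, coverage_of, c_percent):
    baseline = coverage_of(test)
    required = c_percent / 100.0 * len(baseline)
    current, chunk = list(test), max(len(test) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            # Keep the removal only if >= C% of the original coverage survives.
            if len(coverage_of(candidate) & baseline) >= required:
                current = candidate
            else:
                i += chunk
        chunk //= 2
    return current

# Toy usage: each line "covers" itself; keep at least 50% of baseline coverage.
toy = ["a", "b", "a", "c", "d"]
print(reduce_preserving_coverage(toy, lambda t: set(t), 50))  # e.g. ['a', 'c']
```

The N-mutant variant would reuse the same loop with the predicate swapped out: a candidate is kept only if it still detects at least N of the mutants detected by the original test case.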
This research suggests that generated test cases should be treated as first-class artifacts in software development and that they can be leveraged for interesting testing tasks.
Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith
The correctness of compilers is instrumental in the safety and reliability of
other software systems, as bugs in compilers can produce executables that do
not reflect the intent of programmers. Such errors are difficult to identify
and debug. Random test program generators are commonly used in testing
compilers, and they have been effective in uncovering bugs. However, the
problem of guiding these test generators to produce test programs that are more
likely to find bugs remains challenging. In this paper, we use the code
snippets in bug reports to guide test generation. The main idea of this
work is to extract insights from the bug reports about the language features
that are more prone to inadequate implementation and to use those insights
to guide the test generators. We use the GCC C compiler to evaluate the
effectiveness of this approach. In particular, we first cluster the test
programs in the GCC bug reports based on their features. We then use the
centroids of the clusters to compute configurations for Csmith, a popular test
generator for C compilers. We evaluated this approach on eight versions of GCC
and found that our approach provides higher coverage and triggers more
miscompilation failures than the state-of-the-art test generation techniques
for GCC.

Comment: The 36th ACM/SIGAPP Symposium on Applied Computing, Software Verification and Testing Track (SAC-SVT'21)
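As a rough illustration of the pipeline, the sketch below clusters feature vectors extracted from bug-report snippets and turns each cluster centroid into a feature-probability table. The marker-counting feature extraction is a crude stand-in for the paper's actual features, and mapping the resulting probabilities onto real Csmith options (which vary by version) is left out:

```python
# Sketch: cluster bug-report snippets by language-feature frequency, then
# derive one feature-probability configuration per cluster centroid.
# MARKERS is a hypothetical, toy proxy for real feature extraction.
from sklearn.cluster import KMeans
import numpy as np

MARKERS = {"pointers": "*", "arrays": "[", "unions": "union", "structs": "struct"}

def snippet_vector(snippet):
    counts = np.array([snippet.count(m) for m in MARKERS.values()], dtype=float)
    return counts / (counts.sum() or 1.0)  # relative feature frequencies

def centroid_configs(snippets, k=2):
    X = np.array([snippet_vector(s) for s in snippets])
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    # Scale each centroid component into a generation probability (0-100).
    return [
        {name: round(100 * float(p)) for name, p in zip(MARKERS, centroid)}
        for centroid in km.cluster_centers_
    ]

print(centroid_configs(["int *p = &x;", "int a[10]; a[1] = 2;", "int *q = 0;"]))
```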
Evaluation of Generalizability of Neural Program Analyzers under Semantic-Preserving Transformations
The abundance of publicly available source code repositories, in conjunction
with the advances in neural networks, has enabled data-driven approaches to
program analysis. These approaches, called neural program analyzers, use neural
networks to extract patterns in the programs for tasks ranging from development
productivity to program reasoning. Despite the growing popularity of neural
program analyzers, the extent to which their results are generalizable is
unknown.
In this paper, we perform a large-scale evaluation of the generalizability of
two popular neural program analyzers using seven semantically equivalent
transformations of programs. Our results caution that in many cases the neural
program analyzers fail to generalize well, sometimes to programs with
negligible textual differences. The results provide the initial stepping stones
for quantifying robustness in neural program analyzers.

Comment: for related work, see arXiv:2008.0156
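The following sketch shows the flavor of such an evaluation with one semantics-preserving transformation, identifier renaming; the brittle toy "analyzer" is hypothetical and only illustrates how a prediction can flip under a negligible textual change:

```python
# Sketch: apply a semantics-preserving transformation (variable renaming)
# and check whether an analyzer's prediction is stable across it.
import re

def rename_variable(code, old, new):
    """Identifier renaming: semantics-preserving (ignoring strings/comments)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def is_stable(predict, code, old, new):
    """Does the analyzer's prediction survive the transformation?"""
    return predict(code) == predict(rename_variable(code, old, new))

# Hypothetical brittle "analyzer" that keys its output on a variable name:
brittle = lambda code: "counting-loop" if "count" in code else "other"
snippet = "count = 0\nfor x in xs:\n    count += 1"
print(is_stable(brittle, snippet, "count", "n"))  # -> False
```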
Using test case reduction and prioritization to improve symbolic execution
Scaling symbolic execution to large programs or programs with complex inputs remains difficult due to path explosion and complex constraints, as well as external method calls. Additionally, creating an effective test structure with symbolic inputs can be difficult. A popular symbolic execution strategy in practice is to perform symbolic execution not “from scratch” but based on existing test cases. This paper proposes that the effectiveness of this approach to symbolic execution can be enhanced by (1) reducing the size of seed test cases and (2) prioritizing seed test cases to maximize exploration efficiency. The proposed test case reduction strategy is based on a recently introduced generalization of delta debugging, and our prioritization techniques include novel methods that, for this purpose, can outperform some traditional regression testing algorithms. We show that applying these methods can significantly improve the effectiveness of symbolic execution based on existing test cases.
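As one concrete example of seed prioritization, the sketch below implements the classic greedy "additional coverage" heuristic from regression testing; it is a stand-in for, not a reproduction of, the paper's novel prioritization methods:

```python
# Sketch of greedy "additional coverage" seed prioritization: always pick
# the seed that adds the most not-yet-seen coverage, so symbolic execution
# starting from early seeds explores diverse paths sooner.

def prioritize_seeds(seeds):
    """seeds maps seed-test name -> set of covered branches."""
    remaining = dict(seeds)
    seen, order = set(), []
    while remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - seen))
        order.append(best)
        seen |= remaining.pop(best)
    return order

# Toy usage with made-up coverage sets:
print(prioritize_seeds({"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {4}}))
# -> ['t2', 't1', 't3']
```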
Study of Distractors in Neural Models of Code
Finding important features that contribute to the prediction of neural models
is an active area of research in explainable AI. Neural models are opaque, and
finding such features helps us better understand their predictions. In this
work, by contrast, we present the inverse perspective of distractor features:
features that cast doubt on the prediction by affecting the model's confidence
in it. Understanding distractors provides a
complementary view of the features' relevance in the predictions of neural
models. In this paper, we apply a reduction-based technique to find distractors
and report preliminary results on their impact and types. Our experiments
across various tasks, models, and datasets of code reveal that the removal of
tokens can significantly affect the confidence of models in their
predictions, and that the categories of tokens can also play a vital role in the
model's confidence. Our study aims to enhance the transparency of models by
emphasizing those tokens that significantly influence the confidence of the
models.

Comment: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, Co-located with ICSE (InteNSE'23)
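A minimal sketch of the reduction-based search: remove one token at a time and rank tokens by how much their removal shifts the model's confidence. The `confidence` callback is a placeholder for any code model's scoring function, not an API from the paper:

```python
def find_distractors(confidence, tokens, top_k=3):
    """Rank tokens by how much their removal shifts model confidence."""
    base = confidence(tokens)
    deltas = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        # Positive delta: removal *raises* confidence, meaning the token
        # was casting doubt on the prediction -- a distractor candidate.
        deltas.append((confidence(reduced) - base, tok, i))
    return sorted(deltas, reverse=True)[:top_k]

# Toy scoring function in which the token "todo" distracts the model.
score = lambda toks: 0.9 - 0.3 * toks.count("todo")
print(find_distractors(score, ["def", "f", "todo", "(", ")"]))
```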
Memorization and Generalization in Neural Code Intelligence Models
Deep Neural Networks (DNNs) are increasingly used in software
engineering and code intelligence tasks. These are powerful tools that are
capable of learning highly generalizable patterns from large datasets through
millions of parameters. At the same time, training DNNs means walking a knife's
edge, because their large capacity also renders them prone to memorizing data
points. While traditionally thought of as an aspect of over-training, recent
work suggests that the memorization risk manifests especially strongly when the
training datasets are noisy and memorization is the only recourse.
Unfortunately, most code intelligence tasks rely on rather noise-prone and
repetitive data sources, such as GitHub, which, due to their sheer size, cannot
be manually inspected and evaluated. We evaluate the memorization and
generalization tendencies in neural code intelligence models through a case
study across several benchmarks and model families by leveraging established
approaches from other fields that use DNNs, such as introducing targeted noise
into the training dataset. In addition to reinforcing prior general findings
about the extent of memorization in DNNs, our results shed light on the impact
of noisy datasets in training.

Comment: manuscript in preparation
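One established probe of this kind, sketched below under simplifying assumptions, injects targeted label noise into the training set; training accuracy on the flipped subset can then only come from memorization:

```python
# Sketch of a label-noise memorization probe: corrupt a fraction of the
# training labels, retrain, and measure training accuracy on the corrupted
# subset. High accuracy there implies memorization, not generalization.
import random

def inject_label_noise(labels, fraction=0.1, seed=0):
    """Flip `fraction` of the labels to a different class (assumes >= 2
    classes). Returns noisy labels plus the flipped indices (the probe)."""
    rng = random.Random(seed)
    noisy = list(labels)
    flipped = rng.sample(range(len(labels)), int(fraction * len(labels)))
    classes = sorted(set(labels))
    for i in flipped:
        noisy[i] = rng.choice([c for c in classes if c != labels[i]])
    return noisy, set(flipped)

# Toy usage: 10% of 1000 binary labels get flipped.
labels = ["buggy" if i % 3 == 0 else "clean" for i in range(1000)]
noisy, probe = inject_label_noise(labels)
print(len(probe))  # -> 100
```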
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Although deep neural models substantially reduce the overhead of feature
engineering, the features readily available in the inputs might significantly
impact training cost and the performance of the models. In this paper, we
explore the impact of an unsupervised feature enrichment approach based on
variable roles on the performance of neural models of code. The notion of
variable roles (as introduced in the works of Sajaniemi et al. [Refs. 1,2]) has
been found to help students develop their programming abilities. In this paper, we
investigate if this notion would improve the performance of neural models of
code. To the best of our knowledge, this is the first work to investigate how
Sajaniemi et al.'s concept of variable roles can affect neural models of code.
In particular, we enrich a source code dataset by adding the role of individual
variables in the dataset programs, and conduct a study on the impact of
variable role enrichment in training the Code2Seq model. In addition, we shed
light on some challenges and opportunities in feature enrichment for neural
code intelligence models.

Comment: Accepted in the 1st International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE'23), Co-located with ICSE
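To illustrate the enrichment step, the sketch below tags each variable occurrence with a coarse role before tokenization; the two-rule role inference is a toy stand-in for Sajaniemi et al.'s full role catalog, and the tagging format is an assumption rather than the paper's encoding:

```python
# Sketch of variable-role feature enrichment: append an inferred role tag
# to every occurrence of each variable (assumes plain identifier names).
import re

def infer_role(name, code):
    """Toy two-rule role inference (Sajaniemi's catalog is far richer)."""
    if re.search(rf"\b{name}\s*\+=", code) or \
       re.search(rf"\b{name}\s*=\s*{name}\s*\+", code):
        return "stepper"       # repeatedly incremented, e.g. a loop counter
    if len(re.findall(rf"\b{name}\s*=", code)) <= 1:
        return "fixed_value"   # assigned at most once
    return "most_recent_holder"

def enrich(code, variables):
    for v in variables:
        code = re.sub(rf"\b{v}\b", f"{v}@{infer_role(v, code)}", code)
    return code

src = "count = 0\nlimit = 10\nwhile count < limit:\n    count = count + 1"
print(enrich(src, ["count", "limit"]))
# -> count@stepper = 0, limit@fixed_value = 10, ...
```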