Structured Review of the Evidence for Effects of Code Duplication on Software Quality
This report presents the detailed steps and results of a structured review of the code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only those details of the review for which there was not enough space in the companion paper published at a conference (Hordijk, Ponisio et al. 2009, "Harmfulness of Code Duplication: A Structured Review of the Evidence").
Do the Fix Ingredients Already Exist? An Empirical Inquiry into the Redundancy Assumptions of Program Repair Approaches
Much initial research on automatic program repair has focused on experimental
results to probe their potential to find patches and reduce development effort.
Relatively less effort has been put into understanding the hows and whys of
such approaches. For example, a critical assumption of the GenProg technique is
that certain bugs can be fixed by copying and re-arranging existing code. In
other words, GenProg assumes that the fix ingredients already exist elsewhere
in the code. In this paper, we formalize these assumptions around the concept
of ''temporal redundancy''. A temporally redundant commit is only composed of
what has already existed in previous commits. Our experiments show that a large
proportion of commits that add existing code are temporally redundant. This
validates the fundamental redundancy assumption of GenProg.
Comment: ICSE - 36th IEEE International Conference on Software Engineering (2014)
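The "temporal redundancy" notion described in this abstract can be illustrated with a minimal sketch (hypothetical data and a deliberately simplified line-level check, not the paper's actual tooling): a commit is temporally redundant if every line it adds already appeared somewhere in the history before it.

```python
def is_temporally_redundant(added_lines, prior_lines):
    """A commit is temporally redundant if every line it adds
    (ignoring surrounding whitespace) already exists in an earlier commit."""
    seen = {line.strip() for line in prior_lines}
    return all(line.strip() in seen for line in added_lines)

# Hypothetical history: lines present in commits before the one under test.
history = ["x = compute()", "return x", "log(x)"]

# A commit that only re-arranges existing lines is temporally redundant.
print(is_temporally_redundant(["return x", "log(x)"], history))  # True
# A commit introducing a genuinely new line is not.
print(is_temporally_redundant(["y = x + 1"], history))           # False
```

A fix-ingredient assumption like GenProg's corresponds to the first case: the patch can be assembled by copying and re-arranging material that the history already contains.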
Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair
A large body of the literature of automated program repair develops
approaches where patches are generated to be validated against an oracle (e.g.,
a test suite). Because such an oracle can be imperfect, the generated patches,
although validated by the oracle, may actually be incorrect. While the state of
the art explores research directions that require dynamic information or rely on
manually-crafted heuristics, we study the benefit of learning code
representations to learn deep features that may encode the properties of patch
correctness. Our work mainly investigates different representation learning
approaches for code changes to derive embeddings that are amenable to
similarity computations. We report on findings based on embeddings produced by
pre-trained and re-trained neural networks. Experimental results demonstrate
the potential of embeddings to empower learning algorithms in reasoning about
patch correctness: a machine learning predictor with BERT transformer-based
embeddings associated with logistic regression yielded an AUC value of about
0.8 in predicting patch correctness on a deduplicated dataset of 1000 labeled
patches. Our study shows that learned representations can lead to reasonable
performance when comparing against the state-of-the-art, PATCH-SIM, which
relies on dynamic information. These representations may further be
complementary to features that were carefully (manually) engineered in the
literature.
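The classification setup this abstract describes, similarity features over code-change embeddings fed to a logistic-regression predictor, can be sketched as follows. The data here is synthetic and the embeddings are random stand-ins (the paper uses BERT-style embeddings of real patches); only the shape of the pipeline is illustrated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for learned embeddings of (buggy code, candidate patch).
rng = np.random.default_rng(0)
n, dim = 200, 16
buggy = rng.normal(size=(n, dim))
labels = rng.integers(0, 2, size=n)  # 1 = correct patch (synthetic ground truth)
# Assumption baked into the toy data: correct patches stay embedded close to
# the buggy code, incorrect ones drift further away.
patched = buggy + np.where(labels[:, None] == 1, 0.1, 1.0) * rng.normal(size=(n, dim))

# Similarity features feeding the logistic-regression predictor.
dist = np.linalg.norm(buggy - patched, axis=1, keepdims=True)
cos = np.sum(buggy * patched, axis=1, keepdims=True) / (
    np.linalg.norm(buggy, axis=1, keepdims=True)
    * np.linalg.norm(patched, axis=1, keepdims=True))
X = np.hstack([dist, cos])

clf = LogisticRegression().fit(X, labels)
auc = roc_auc_score(labels, clf.predict_proba(X)[:, 1])
print(f"training AUC: {auc:.2f}")
```

On this synthetic data the separation is clean by construction; the abstract's reported AUC of about 0.8 comes from real, deduplicated labeled patches, which are far noisier.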
Automated Change Rule Inference for Distance-Based API Misuse Detection
Developers build on Application Programming Interfaces (APIs) to reuse
existing functionalities of code libraries. Despite the benefits of reusing
established libraries (e.g., time savings, high quality), developers may
diverge from the API's intended usage, potentially causing bugs or, more
specifically, API misuses. Recent research focuses on developing techniques to
automatically detect API misuses, but many suffer from a high false-positive
rate. In this article, we improve on this situation by proposing ChaRLI (Change
RuLe Inference), a technique for automatically inferring change rules from
developers' fixes of API misuses based on API Usage Graphs (AUGs). By
subsequently applying graph-distance algorithms, we use change rules to
discriminate API misuses from correct usages. This allows developers to reuse
others' fixes of an API misuse at other code locations in the same or another
project. We evaluated the ability of change rules to detect API misuses based
on three datasets and found that the best mean relative precision (i.e., for
testable usages) ranges from 77.1% to 96.1%, while the mean recall ranges from
0.007% to 17.7% for individual change rules. These results underscore that
ChaRLI and our misuse detection are helpful complements to existing API misuse
detectors.
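The core discrimination step in this abstract, comparing usage graphs by graph distance, can be sketched with tiny stand-in graphs. This is a simplification under stated assumptions: real API Usage Graphs also encode data flow and ordering, and the node/edge names below are hypothetical; `networkx` is used here purely as an illustrative graph library.

```python
import networkx as nx

def usage_graph(edges):
    """Build a tiny directed stand-in for an API usage graph from
    (caller -> callee) edges, labelling each node with its API name."""
    g = nx.DiGraph()
    for u, v in edges:
        g.add_node(u, label=u)
        g.add_node(v, label=v)
        g.add_edge(u, v)
    return g

same_label = lambda a, b: a["label"] == b["label"]

# Hypothetical change rule learned from a fix: a close() must follow read().
correct = usage_graph([("open", "read"), ("read", "close")])
misuse = usage_graph([("open", "read")])  # missing the close() step
other = usage_graph([("open", "read"), ("read", "close")])

# Graph edit distance to the rule's correct usage flags the misuse.
d_misuse = nx.graph_edit_distance(correct, misuse, node_match=same_label)
d_other = nx.graph_edit_distance(correct, other, node_match=same_label)
print(d_misuse, d_other)  # the misuse is strictly farther from the rule
```

A distance threshold over such comparisons is what lets one developer's fix be reused to flag the same misuse pattern elsewhere.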
Supporting Source Code Feature Analysis Using Execution Trace Mining
Software maintenance is a significant phase of a software life-cycle. Once a system is developed the main focus shifts to maintenance to keep the system up to date. A system may be changed for various reasons such as fulfilling customer requirements, fixing bugs or optimizing existing code. Code needs to be studied and understood before any modification is done to it. Understanding code is a time intensive and often complicated part of software maintenance that is supported by documentation and various tools such as
profilers, debuggers and source code analysis techniques. However, most of these tools fail to assist in locating the portions of the code that implement the functionality the software developer is focusing on. Mining execution traces can help developers identify the parts of the source code specific to the functionality of interest and at the same time help them understand the behaviour of the code.
We propose a use-driven hybrid framework of static and dynamic analyses to mine and manage execution traces to support software developers in understanding how the system's functionality is implemented through feature analysis. We express a system's use as a set of tests. In our approach, we develop a set of uses that represents how a system is used or how a user uses some specific functionality. Each use set describes a user's interaction with the system. To manage large and complex traces we organize them by system use and segment them by user interface events. The segmented traces are also clustered based on internal and external method types. The clusters are further categorized into groups based on application programming interfaces and active clones. To further support comprehension we propose a taxonomy of metrics which are used to quantify the trace.
To validate the framework we built a tool called TrAM that implements trace mining and provides visualization features. It can quantify the trace method information, mine similar code fragments called active clones, cluster methods based on types, categorise them based on groups and quantify their behavioural aspects using a set of metrics. The tool also lets users visualize the design and implementation of a system using images, filtering, grouping, event and system use, and presents them with values calculated using trace, group, clone and method metrics. We also conducted a case study on five different subject systems using the tool to determine the dynamic properties of the source code clones at runtime and answer three research questions using our findings. We compared our tool with trace mining tools and profilers in terms of features and scenarios. Finally, we evaluated TrAM by conducting a user study on its effectiveness, usability and information management.
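The trace-management steps this abstract describes, segmenting traces by user-interface event and then splitting each segment into internal versus external methods, can be sketched as follows. The trace entries and the `app.` package prefix used to separate application code from library code are hypothetical; the thesis's actual segmentation and clustering are richer.

```python
from collections import defaultdict

# Hypothetical execution trace: (ui_event, invoked_method) pairs in call order.
trace = [
    ("click:save", "app.Editor.save"),
    ("click:save", "java.io.FileWriter.write"),
    ("click:open", "app.Editor.load"),
    ("click:open", "java.io.FileReader.read"),
]

# Segment the trace by UI event, then split each segment into internal
# (application) vs external (library/API) methods.
segments = defaultdict(lambda: {"internal": [], "external": []})
for event, method in trace:
    kind = "internal" if method.startswith("app.") else "external"
    segments[event][kind].append(method)

for event, groups in segments.items():
    print(event, groups)
```

Grouping a large trace this way is what makes per-feature analysis tractable: each segment holds only the calls triggered by one user action, already separated into the developer's own code and the APIs it leans on.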
Recommending Stack Overflow Posts for Fixing Runtime Exceptions using Failure Scenario Matching
Using online Q&A forums, such as Stack Overflow (SO), for guidance to resolve
program bugs, among other development issues, is commonplace in modern software
development practice. Runtime exceptions (RE) is one such important class of
bugs that is actively discussed on SO. In this work we present a technique and
prototype tool called MAESTRO that can automatically recommend an SO post that
is most relevant to a given Java RE in a developer's code. MAESTRO compares the
exception-generating program scenario in the developer's code with that
discussed in an SO post and returns the post with the closest match. To extract
and compare the exception scenario effectively, MAESTRO first uses the answer
code snippets in a post to implicate a subset of lines in the post's question
code snippet as responsible for the exception and then compares these lines
with the developer's code in terms of their respective Abstract Program Graph
(APG) representations. The APG is a simplified and abstracted derivative of an
abstract syntax tree, proposed in this work, that allows an effective
comparison of the functionality embodied in the high-level program structure,
while discarding many of the low-level syntactic or semantic differences. We
evaluate MAESTRO on a benchmark of 78 instances of Java REs extracted from the
top 500 Java projects on GitHub and show that MAESTRO can return either a
highly relevant or somewhat relevant SO post corresponding to the exception
instance in 71% of the cases, compared to relevant posts returned in only 8% -
44% instances, by four competitor tools based on state-of-the-art techniques.
We also conduct a user experience study of MAESTRO with 10 Java developers,
where the participants judge MAESTRO reporting a highly relevant or somewhat
relevant post in 80% of the instances. In some cases the post is judged to be
even better than the one manually found by the participant.
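The kind of abstraction the APG performs, keeping high-level program structure while discarding low-level naming and literal differences, can be illustrated on plain Python ASTs. This is only an analogy under stated assumptions: the code snippets are hypothetical and the APG itself is a richer, language-specific representation than the identifier/constant masking shown here.

```python
import ast

class Abstract(ast.NodeTransformer):
    """Replace identifiers and constants with placeholders so that two
    snippets compare equal when their high-level structure matches."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="VAR", ctx=node.ctx), node)
    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value="LIT"), node)

def skeleton(code):
    """Abstracted structural fingerprint of a code snippet."""
    return ast.dump(Abstract().visit(ast.parse(code)))

# Two snippets with the same exception-prone shape (indexing a sequence)
# despite different names and values: their abstracted trees coincide.
a = skeleton("xs = items\nprint(xs[10])")
b = skeleton("vals = data\nprint(vals[99])")
print(a == b)  # same structure once low-level details are abstracted away
```

Matching on such abstracted structure is what lets a tool like MAESTRO pair a developer's failing code with a Stack Overflow question that exhibits the same scenario under different surface syntax.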