Lightweight Multilingual Software Analysis
Developer preferences, language capabilities and the persistence of older
languages contribute to the trend that large software codebases are often
multilingual, that is, written in more than one computer language. While
developers can leverage monolingual software development tools to build
software components, companies are faced with the problem of managing the
resultant large, multilingual codebases to address issues with security,
efficiency, and quality metrics. The key challenge is to address the opaque
nature of the language interoperability interface: one language calling
procedures in a second (which may call a third, or even back to the first),
resulting in a potentially tangled, inefficient and insecure codebase. An
architecture is proposed for lightweight static analysis of large multilingual
codebases: the MLSA architecture. Its modular and table-oriented structure
addresses the open-ended nature of multiple languages and language
interoperability APIs. As an application, we focus here on the construction of
call-graphs that capture both inter-language and intra-language calls. The
algorithms for extracting multilingual call-graphs from codebases are
presented, and several examples of multilingual software engineering analysis
are discussed. The state of the implementation and testing of MLSA is
presented, and the implications for future work are discussed. Comment: 15 pages
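The idea of combining per-language call tables into one call graph can be sketched in Python as follows. This is an illustrative sketch only, not the MLSA implementation: the table layout (caller, callee language, callee) and the names `merge_call_tables` and `inter_language_edges` are assumptions made for this example.

```python
from collections import defaultdict


def merge_call_tables(tables):
    """Merge per-language call tables into one graph.

    `tables` maps a language name to rows of
    (caller, callee_language, callee). This layout is hypothetical.
    Nodes in the resulting graph are (language, procedure) pairs.
    """
    graph = defaultdict(set)
    for language, rows in tables.items():
        for caller, callee_language, callee in rows:
            graph[(language, caller)].add((callee_language, callee))
    return graph


def inter_language_edges(graph):
    """Return the edges whose caller and callee are in different languages."""
    return sorted(
        (src, dst)
        for src, callees in graph.items()
        for dst in callees
        if src[0] != dst[0]
    )


# Hypothetical example: C calls Python, which calls back into C.
tables = {
    "c": [("main", "c", "init"), ("main", "python", "score")],
    "python": [("score", "c", "fast_sum")],
}
graph = merge_call_tables(tables)
cross_calls = inter_language_edges(graph)
```

Keeping one table per language means a new language or interoperability API only adds rows, which is the property the table-oriented design is meant to provide.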
SZZ Unleashed: An Open Implementation of the SZZ Algorithm -- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project
Numerous empirical software engineering studies rely on detailed information
about bugs. While issue trackers often contain information about when bugs were
fixed, details about when they were introduced to the system are often absent.
As a remedy, researchers often rely on the SZZ algorithm as a heuristic
approach to identify bug-introducing software changes. Unfortunately, as
reported in a recent systematic literature review, few researchers have made
their SZZ implementations publicly available. Consequently, there is a risk
that research effort is wasted as new projects based on SZZ output need to
initially reimplement the approach. Furthermore, there is a risk that newly
developed (closed source) SZZ implementations have not been properly tested,
thus conducting research based on their output might introduce threats to
validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm
for git repositories. This paper describes our implementation along with a
usage example for the Jenkins project, and concludes with an illustrative study
on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on
GitHub, and warmly invite the community to contribute.
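The core SZZ heuristic can be sketched in Python as follows. This is a minimal sketch on in-memory data, not the SZZ Unleashed implementation: the function name `szz_candidates`, the input layout, and the commit ids are hypothetical. The date filter reflects one common SZZ refinement (a commit made after the bug was reported cannot have introduced it); the tool described above may differ in its details.

```python
from datetime import date


def szz_candidates(fix_changed_lines, blame, commit_dates, reported_on):
    """Return commits suspected of introducing a bug.

    fix_changed_lines: (path, line_number) pairs changed by the bug fix.
    blame: maps (path, line_number) to the commit that last touched that
           line before the fix (what `git blame` on the parent would give).
    commit_dates: maps a commit id to its commit date.
    reported_on: date on which the bug was reported.
    """
    suspects = set()
    for location in fix_changed_lines:
        commit = blame.get(location)
        # Skip lines with no history and commits made after the report.
        if commit is not None and commit_dates[commit] <= reported_on:
            suspects.add(commit)
    return suspects


# Hypothetical data for illustration only.
blame = {("Job.java", 10): "a1", ("Job.java", 11): "b2"}
commit_dates = {"a1": date(2018, 1, 5), "b2": date(2018, 3, 1)}
suspects = szz_candidates(
    fix_changed_lines=[("Job.java", 10), ("Job.java", 11)],
    blame=blame,
    commit_dates=commit_dates,
    reported_on=date(2018, 2, 1),
)
```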
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yield insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp which implements the graph distances used in this work.
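A spectral distance of the kind mentioned above can be sketched in plain Python as follows. This is an illustrative sketch, not the NetComp API: it assumes two undirected graphs with the same number of nodes, uses the adjacency spectrum, and takes the Euclidean distance between the sorted eigenvalues. The names `symmetric_eigenvalues` and `spectral_distance` are made up for this example, and the small Jacobi routine stands in for the library eigenvalue solver one would normally use.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between the sorted adjacency spectra."""
    eig_a = symmetric_eigenvalues(adj_a)
    eig_b = symmetric_eigenvalues(adj_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(eig_a, eig_b)))


# A 3-node path versus a triangle.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
distance = spectral_distance(path3, triangle)
```

Because the spectrum ignores node labels, two isomorphic graphs have distance zero under this measure, although the converse does not hold in general.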
A Study on the Effects of Exception Usage in Open-Source C++ Systems
Exception handling (EH) is a feature common to many modern programming languages, including C++, Java, and Python, that allows error handling in client code to be performed in a way that is both systematic and largely detached from the implementation of the main functionality. However, C++ developers sometimes choose not to use EH, as they feel that its use increases the complexity of the resulting code: new control flow paths are added to the code, "stack unwinding" adds extra responsibilities for the developer to worry about, and EH arguably detracts from the modular design of the system. In this thesis, we perform an exploratory empirical study of the effects of exception usage in 2721 open source C++ systems taken from GitHub. We observed that the number of edges in an augmented call graph increases, on average, by 22% when edges for exception flow are added to a graph. Additionally, about 8 out of 9 functions that may throw only do so by propagating a throw from another function. These results suggest that, in practice, the use of C++ EH can add complexity to the design of the system that developers must strive to be aware of.
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Naïve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered on a commit can decrease the chance
of performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs.
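For reference, the three reported measures (recall, accuracy, precision) follow from a confusion matrix as sketched below. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision and accuracy from confusion-matrix counts.

    tp: changes correctly flagged as performance bugs
    fp: clean changes wrongly flagged
    fn: performance bugs that were missed
    tn: clean changes correctly left unflagged
    """
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

High precision with lower recall, as in the reported results, means that flagged changes are almost always real performance bugs while some bugs still go unflagged.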