2,003 research outputs found
Software Metrics in Boa Large-Scale Software Mining Infrastructure: Challenges and Solutions
In this paper, we describe our experience implementing some of classic
software engineering metrics using Boa - a large-scale software repository
mining platform - and its dedicated language. We also aim to take an advantage
of the Boa infrastructure to propose new software metrics and to characterize
open source projects by software metrics to provide reference values of
software metrics based on large number of open source projects. Presented
software metrics, well known and proposed in this paper, can be used to build
large-scale software defect prediction models. Additionally, we present the
obstacles we met while developing metrics, and our analysis can be used to
improve Boa in its future releases. The implemented metrics can also be used as
a foundation for more complex explorations of open source projects and serve as
a guide how to implement software metrics using Boa as the source code of the
metrics is freely available to support reproducible research.Comment: Chapter 8 of the book "Software Engineering: Improving Practice
through Research" (B. Hnatkowska and M. \'Smia{\l}ek, eds.), pp. 131-146,
201
Darwinian Data Structure Selection
Data structure selection and tuning is laborious but can vastly improve an
application's performance and memory footprint. Some data structures share a
common interface and enjoy multiple implementations. We call them Darwinian
Data Structures (DDS), since we can subject their implementations to survival
of the fittest. We introduce ARTEMIS a multi-objective, cloud-based
search-based optimisation framework that automatically finds optimal, tuned DDS
modulo a test suite, then changes an application to use that DDS. ARTEMIS
achieves substantial performance improvements for \emph{every} project in
Java projects from DaCapo benchmark, popular projects and uniformly
sampled projects from GitHub. For execution time, CPU usage, and memory
consumption, ARTEMIS finds at least one solution that improves \emph{all}
measures for () of the projects. The median improvement across
the best solutions is , , for runtime, memory and CPU
usage.
These aggregate results understate ARTEMIS's potential impact. Some of the
benchmarks it improves are libraries or utility functions. Two examples are
gson, a ubiquitous Java serialization framework, and xalan, Apache's XML
transformation tool. ARTEMIS improves gson by \%, and for
memory, runtime, and CPU; ARTEMIS improves xalan's memory consumption by
\%. \emph{Every} client of these projects will benefit from these
performance improvements.Comment: 11 page
Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software
In this paper, we study how object-oriented classes are used across thousands
of software packages. We concentrate on "usage diversity'", defined as the
different statically observable combinations of methods called on the same
object. We present empirical evidence that there is a significant usage
diversity for many classes. For instance, we observe in our dataset that Java's
String is used in 2460 manners. We discuss the reasons of this observed
diversity and the consequences on software engineering knowledge and research
A heuristic-based approach to code-smell detection
Encapsulation and data hiding are central tenets of the object oriented paradigm. Deciding what data and behaviour to form into a class and where to draw the line between its public and private details can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late) and attempting to perform such analysis manually can also be tedious and error prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together – data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool which automatically detects data and god classes that has been developed as a plug-in for the Eclipse IDE. The technique has been evaluated in a controlled study on two large open source systems which compare the tool results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approache
- …