2,003 research outputs found

    Software Metrics in Boa Large-Scale Software Mining Infrastructure: Challenges and Solutions

    Get PDF
    In this paper, we describe our experience implementing some of classic software engineering metrics using Boa - a large-scale software repository mining platform - and its dedicated language. We also aim to take an advantage of the Boa infrastructure to propose new software metrics and to characterize open source projects by software metrics to provide reference values of software metrics based on large number of open source projects. Presented software metrics, well known and proposed in this paper, can be used to build large-scale software defect prediction models. Additionally, we present the obstacles we met while developing metrics, and our analysis can be used to improve Boa in its future releases. The implemented metrics can also be used as a foundation for more complex explorations of open source projects and serve as a guide how to implement software metrics using Boa as the source code of the metrics is freely available to support reproducible research.Comment: Chapter 8 of the book "Software Engineering: Improving Practice through Research" (B. Hnatkowska and M. \'Smia{\l}ek, eds.), pp. 131-146, 201

    Darwinian Data Structure Selection

    Get PDF
    Data structure selection and tuning is laborious but can vastly improve an application's performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS a multi-objective, cloud-based search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for \emph{every} project in 55 Java projects from DaCapo benchmark, 88 popular projects and 3030 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves \emph{all} measures for 86%86\% (37/4337/43) of the projects. The median improvement across the best solutions is 4.8%4.8\%, 10.1%10.1\%, 5.1%5.1\% for runtime, memory and CPU usage. These aggregate results understate ARTEMIS's potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache's XML transformation tool. ARTEMIS improves gson by 16.516.5\%, 1%1\% and 2.2%2.2\% for memory, runtime, and CPU; ARTEMIS improves xalan's memory consumption by 23.523.5\%. \emph{Every} client of these projects will benefit from these performance improvements.Comment: 11 page

    Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software

    Get PDF
    In this paper, we study how object-oriented classes are used across thousands of software packages. We concentrate on "usage diversity'", defined as the different statically observable combinations of methods called on the same object. We present empirical evidence that there is a significant usage diversity for many classes. For instance, we observe in our dataset that Java's String is used in 2460 manners. We discuss the reasons of this observed diversity and the consequences on software engineering knowledge and research

    A heuristic-based approach to code-smell detection

    Get PDF
    Encapsulation and data hiding are central tenets of the object oriented paradigm. Deciding what data and behaviour to form into a class and where to draw the line between its public and private details can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late) and attempting to perform such analysis manually can also be tedious and error prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together – data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool which automatically detects data and god classes that has been developed as a plug-in for the Eclipse IDE. The technique has been evaluated in a controlled study on two large open source systems which compare the tool results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approache
    • …
    corecore