434 research outputs found

    Extracting Build Changes with BUILDDIFF

    Full text link
    Build systems are an essential part of modern software engineering projects. As software projects change continuously, it is crucial to understand how the build system changes because neglecting its maintenance can lead to expensive build breakage. Recent studies have investigated the (co-)evolution of build configurations and reasons for build breakage, but they did this only on a coarse grained level. In this paper, we present BUILDDIFF, an approach to extract detailed build changes from MAVEN build files and classify them into 95 change types. In a manual evaluation of 400 build changing commits, we show that BUILDDIFF can extract and classify build changes with an average precision and recall of 0.96 and 0.98, respectively. We then present two studies using the build changes extracted from 30 open source Java projects to study the frequency and time of build changes. The results show that the top 10 most frequent change types account for 73% of the build changes. Among them, changes to version numbers and changes to dependencies of the projects occur most frequently. Furthermore, our results show that build changes occur frequently around releases. With these results, we provide the basis for further research, such as for analyzing the (co-)evolution of build files with other artifacts or improving effort estimation approaches. Furthermore, our detailed change information enables improvements of refactoring approaches for build configurations and improvements of models to identify error-prone build files.Comment: Accepted at the International Conference of Mining Software Repositories (MSR), 201

    An Introduction to Software Ecosystems

    Full text link
    This chapter defines and presents different kinds of software ecosystems. The focus is on the development, tooling and analytics aspects of software ecosystems, i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining. The technical and social dependencies between these developers and software components form a socio-technical dependency network, and the dynamics of this network change over time. We classify and provide several examples of such ecosystems. The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems, as well as the techniques and research methods that can be used to analyse different aspects of these ecosystems.Comment: Preprint of chapter "An Introduction to Software Ecosystems" by Tom Mens and Coen De Roover, published in the book "Software Ecosystems: Tooling and Analytics" (eds. T. Mens, C. De Roover, A. Cleve), 2023, ISBN 978-3-031-36059-6, reproduced with permission of Springer. The final authenticated version of the book and this chapter is available online at: https://doi.org/10.1007/978-3-031-36060-

    Vulnerable Open Source Dependencies: Counting Those That Matter

    Full text link
    BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we aim to present a precise methodology, that combines the code-based analysis of patches with information on build, test, update dates, and group extracted from the very code repository, and therefore, caters to the needs of industrial practice for correct allocation of development and audit resources. METHOD: To understand the industrial impact of the proposed methodology, we considered the 200 most popular OSS Java libraries used by SAP in its own software. Our analysis included 10905 distinct GAVs (group, artifact, version) when considering all the library versions. RESULTS: We found that about 20% of the dependencies affected by a known vulnerability are not deployed, and therefore, they do not represent a danger to the analyzed library because they cannot be exploited in practice. Developers of the analyzed libraries are able to fix (and actually responsible for) 82% of the deployed vulnerable dependencies. The vast majority (81%) of vulnerable dependencies may be fixed by simply updating to a new version, while 1% of the vulnerable dependencies in our sample are halted, and therefore, potentially require a costly mitigation strategy. CONCLUSIONS: Our case study shows that the correct counting allows software development companies to receive actionable information about their library dependencies, and therefore, correctly allocate costly development and audit resources, which is spent inefficiently in case of distorted measurements.Comment: This is a pre-print of the paper that appears, with the same title, in the proceedings of the 12th International Symposium on Empirical Software Engineering and Measurement, 201

    REI:An integrated measure for software reusability

    Get PDF
    To capitalize upon the benefits of software reuse, an efficient selection among candidate reusable assets should be performed in terms of functional fitness and adaptability. The reusability of assets is usually measured through reusability indices. However, these do not capture all facets of reusability, such as structural characteristics, external quality attributes, and documentation. In this paper, we propose a reusability index (REI) as a synthesis of various software metrics and evaluate its ability to quantify reuse, based on IEEE Standard on Software Metrics Validity. The proposed index is compared with existing ones through a case study on 80 reusable open-source assets. To illustrate the applicability of the proposed index, we performed a pilot study, where real-world reuse decisions have been compared with decisions imposed by the use of metrics (including REI). The results of the study suggest that the proposed index presents the highest predictive and discriminative power; it is the most consistent in ranking reusable assets and the most strongly correlated to their levels of reuse. The findings of the paper are discussed to understand the most important aspects in reusability assessment (interpretation of results), and interesting implications for research and practice are provided

    RefDiff: Detecting Refactorings in Version Histories

    Full text link
    Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is a valuable information to understand software evolution, adapt software components, merge code changes, and other applications. In this paper, we present RefDiff, an automated approach that identifies refactorings performed between two code revisions in a git repository. RefDiff employs a combination of heuristics based on static analysis and code similarity to detect 13 well-known refactoring types. In an evaluation using an oracle of 448 known refactoring operations, distributed across seven Java projects, our approach achieved precision of 100% and recall of 88%. Moreover, our evaluation suggests that RefDiff has superior precision and recall than existing state-of-the-art approaches.Comment: Paper accepted at 14th International Conference on Mining Software Repositories (MSR), pages 1-11, 201

    Predicting Changes to Source Code

    Get PDF
    Organizations typically use issue tracking systems (ITS) such as Jira to plan software releases and assign requirements to developers. Organizations typically also use source control management (SCM) repositories such as Git to track historical changes to a code-base. These ITS and SCM repositories contain valuable data that remains largely untapped. As developers churn through an organization, it becomes expensive for developers to spend time determining which software artifact must be modified to implement a requirement. In this work we created, developed, tested and evaluated a tool called Class Change Predictor, otherwise known as CCP, for predicting which class will implement a requirement. Understanding which class will implement a requirement supports several software engineering tasks such as refactoring and assigning requirements to developers. CCP is a data-mining tool operating on top of ITS and SCM repositories which gathers a unique combination of metrics. CCP leverages requirement text to compare current requirements to past requirements and requirements to source code files. CCP performs static analysis on the code-base of each major release of the software artifact. We evaluated CCP on different open source datasets (and the Digital Democracy dataset) by using several machine learning classifiers and pre-processing procedures. Our results show that we can achieve high precision on three out of four datasets. We conclude that accurate class change prediction is feasible, and we propose numerous solutions to increase future accuracy

    Empirically-Grounded Construction of Bug Prediction and Detection Tools

    Get PDF
    There is an increasing demand on high-quality software as software bugs have an economic impact not only on software projects, but also on national economies in general. Software quality is achieved via the main quality assurance activities of testing and code reviewing. However, these activities are expensive, thus they need to be carried out efficiently. Auxiliary software quality tools such as bug detection and bug prediction tools help developers focus their testing and reviewing activities on the parts of software that more likely contain bugs. However, these tools are far from adoption as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of projects and their high rate of false positives as the main obstacles of their adoption. We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that the null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that depends on the wisdom of the crowd. For each API method, we extract the nullability measure that expresses how often the return value of this method is checked against null in the ecosystem of the API. We use nullability to annotate API methods with nullness annotation and warn developers about missing and excessive null checks. For a bug predictor to be efficient, it needs to be optimized as both a machine learning model and a software quality tool. We empirically show how feature selection and hyperparameter optimizations improve prediction accuracy. Then we optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., dependent variables, machine learning model, and response variable. We show that using both source code and change metrics as dependent variables, applying feature selection on them, then using an optimized Random Forest to predict the number of bugs results in the most cost-effective bug predictor. Throughout this thesis, we show how empirically-grounded analysis helps us achieve efficient bug prediction and detection tools and adapt them to the characteristics of each software project
    corecore