Search CORE

13,061 research outputs found

A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem

Author: Baudry Benoit
Harrand Nicolas
Monperrus Martin
Soto-Valero César
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/01/2020
Field of study

Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application's code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challenges related to dependency management. In this paper, we propose an original study of one such challenge: the emergence of bloated dependencies. Bloated dependencies are libraries that the build tool packages with the application's compiled code but that are actually not necessary to build and run the application. This phenomenon artificially grows the size of the built binary and increases maintenance effort. We propose a tool, called DepClean, to analyze the presence of bloated dependencies in Maven artifacts. We analyze 9,639 Java artifacts hosted on Maven Central, which include a total of 723,444 dependency relationships. Our key result is that 75.1% of the analyzed dependency relationships are bloated. In other words, it is feasible to reduce the number of dependencies of Maven artifacts up to 1/4 of its current count. We also perform a qualitative study with 30 notable open-source projects. Our results indicate that developers pay attention to their dependencies and are willing to remove bloated dependencies: 18/21 answered pull requests were accepted and merged by developers, removing 131 dependencies in total.Comment: Manuscript submitted to Empirical Software Engineering (EMSE

arXiv.org e-Print Archive

Source File Set Search for Clone-and-Own Reuse Analysis

Author: Inoue Katsuro
Ishio Takashi
Ito Kaoru
Sakaguchi Yusuke
Publication venue
Publication date: 01/01/2017
Field of study

Clone-and-own approach is a natural way of source code reuse for software developers. To assess how known bugs and security vulnerabilities of a cloned component affect an application, developers and security analysts need to identify an original version of the component and understand how the cloned component is different from the original one. Although developers may record the original version information in a version control system and/or directory names, such information is often either unavailable or incomplete. In this research, we propose a code search method that takes as input a set of source files and extracts all the components including similar files from a software ecosystem (i.e., a collection of existing versions of software packages). Our method employs an efficient file similarity computation using b-bit minwise hashing technique. We use an aggregated file similarity for ranking components. To evaluate the effectiveness of this tool, we analyzed 75 cloned components in Firefox and Android source code. The tool took about two hours to report the original components from 10 million files in Debian GNU/Linux packages. Recall of the top-five components in the extracted lists is 0.907, while recall of a baseline using SHA-1 file hash is 0.773, according to the ground truth recorded in the source code repositories.Comment: 14th International Conference on Mining Software Repositorie

arXiv.org e-Print Archive

NAIST Academic Repository

Crossref

Recommended from our members

Similarities, challenges and opportunities of wikipedia content and open source projects

Author: Capiluppi A
Publication venue: 'Wiley'
Publication date: 04/09/2012
Field of study

Copyright @ 2012 John Wiley & Sons, Ltd.Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large amount of software projects that simply do not evolve, developed by relatively small communities, struggling to attract a sustained number of contributors. These portals have started to increasingly act as a storage for abandoned projects, and researchers and practitioners should try and point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the requested expertise, many projects reliant on content and contributions by users undergo a similar evolution, and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of “usefulness” and “modularity” we isolate valuable content in both Wikipedia pages and OSS projects

Brunel University Research Archive

Analysis of source code metrics from ns-2 and ns-3 network simulators

Author: Amaya Rodríguez Claudio Antonio
Domínguez Morales Manuel Jesús
Font Calvo Juan Luis
Sevillano Ramos José Luis
Íñigo Blasco Pablo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

Ns-2 and its successor ns-3 are discrete-event simulators which are closely related to each other as they share common background, concepts and similar aims. Ns-3 is still under development, but it offers some interesting characteristics for developers while ns-2 still has a large user base. While other studies have compared different network simulators, focusing on performance measurements, in this paper we adopted a different approach by focusing on technical characteristics and using software metrics to obtain useful conclusions. We chose ns-2 and ns-3 for our case study because of the popularity of the former in research and the increasing use of the latter. This reflects the current situation where ns-3 has emerged as a viable alternative to ns-2 due to its features and design. The paper assesses the current state of both projects and their respective evolution supported by the measurements obtained from a broad set of software metrics. By considering other qualitative characteristics we obtained a summary of technical features of both simulators including, architectural design, software dependencies or documentation policies.Ministerio de Ciencia e Innovación TEC2009-10639-C04-0

idUS. Depósito de Investigación Universidad de Sevilla

Selection of third party software in Off-The-Shelf-based software development: an interview study with industrial practitioners

Author: Ayala Martínez Claudia Patricia
Conradi Reidar
Franch Gutiérrez Javier
Hauge Oyvind
Li Jingyue
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

The success of software development using third party components highly depends on the ability to select a suitable component for the intended application. The evidence shows that there is limited knowledge about current industrial OTS selection practices. As a result, there is often a gap between theory and practice, and the proposed methods for supporting selection are rarely adopted in the industrial practice. This paper's goal is to investigate the actual industrial practice of component selection in order to provide an initial empirical basis that allows the reconciliation of research and industrial endeavors. The study consisted of semi-structured interviews with 23 employees from 20 different software-intensive companies that mostly develop web information system applications. It provides qualitative information that help to further understand these practices, and emphasize some aspects that have been overlooked by researchers. For instance, although the literature claims that component repositories are important for locating reusable components; these are hardly used in industrial practice. Instead, other resources that have not received considerable attention are used with this aim. Practices and potential market niches for software-intensive companies have been also identified. The results are valuable from both the research and the industrial perspectives as they provide a basis for formulating well-substantiated hypotheses and more effective improvement strategies.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Towards Understanding Third-party Library Dependency in C/C++ Ecosystem

Author: Li Yi
Liu Chengwei
Liu Yang
Luo Ping
Tang Wei
Wu Jiahui
Xu Zhengzi
Yang Shouguo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/09/2022
Field of study

Third-party libraries (TPLs) are frequently reused in software to reduce development cost and the time to market. However, external library dependencies may introduce vulnerabilities into host applications. The issue of library dependency has received considerable critical attention. Many package managers, such as Maven, Pip, and NPM, are proposed to manage TPLs. Moreover, a significant amount of effort has been put into studying dependencies in language ecosystems like Java, Python, and JavaScript except C/C++. Due to the lack of a unified package manager for C/C++, existing research has only few understanding of TPL dependencies in the C/C++ ecosystem, especially at large scale. Towards understanding TPL dependencies in the C/C++ecosystem, we collect existing TPL databases, package management tools, and dependency detection tools, summarize the dependency patterns of C/C++ projects, and construct a comprehensive and precise C/C++ dependency detector. Using our detector, we extract dependencies from a large-scale database containing 24K C/C++ repositories from GitHub. Based on the extracted dependencies, we provide the results and findings of an empirical study, which aims at understanding the characteristics of the TPL dependencies. We further discuss the implications to manage dependency for C/C++ and the future research directions for software engineering researchers and developers in fields of library development, software composition analysis, and C/C++package manager.Comment: ASE 202

arXiv.org e-Print Archive

Studying and Enabling Reuse in android Mobile Apps

Author: Holtzhauer andrew Steven
Publication venue: W&M ScholarWorks
Publication date: 01/01/2014
Field of study

College of William & Mary: W&M Publish