A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem
Build automation tools and package managers have a profound influence on
software development. They facilitate the reuse of third-party libraries,
support a clear separation between the application's code and its external
dependencies, and automate several software development tasks. However, the
wide adoption of these tools introduces new challenges related to dependency
management. In this paper, we propose an original study of one such challenge:
the emergence of bloated dependencies.
Bloated dependencies are libraries that the build tool packages with the
application's compiled code but that are actually not necessary to build and
run the application. This phenomenon artificially grows the size of the built
binary and increases maintenance effort. We propose a tool, called DepClean, to
analyze the presence of bloated dependencies in Maven artifacts. We analyze
9,639 Java artifacts hosted on Maven Central, which include a total of 723,444
dependency relationships. Our key result is that 75.1% of the analyzed
dependency relationships are bloated. In other words, it is feasible to reduce
the number of dependencies of Maven artifacts to as little as 1/4 of the current count.
We also perform a qualitative study with 30 notable open-source projects. Our
results indicate that developers pay attention to their dependencies and are
willing to remove bloated dependencies: 18/21 answered pull requests were
accepted and merged by developers, removing 131 dependencies in total.
Comment: Manuscript submitted to Empirical Software Engineering (EMSE)
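The definition of a bloated dependency above can be illustrated as a set difference between what a build declares and what the code actually references. This is only a minimal sketch: the function name `find_bloated` and the sample coordinates are hypothetical, and it does not reproduce how DepClean itself analyzes artifacts.

```python
def find_bloated(declared, used):
    """Return declared dependencies that are never referenced (illustrative only)."""
    return sorted(set(declared) - set(used))


# Hypothetical example data: four declared dependencies, one actually used.
declared = [
    "com.google.guava:guava",
    "org.slf4j:slf4j-api",
    "commons-io:commons-io",
    "junit:junit",
]
used = {"org.slf4j:slf4j-api"}

bloated = find_bloated(declared, used)
bloat_ratio = len(bloated) / len(declared)
print(bloated, bloat_ratio)
```

In this toy input, three of the four declared dependencies are unused, so only a quarter of the declarations would remain after removal.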
Source File Set Search for Clone-and-Own Reuse Analysis
The clone-and-own approach is a natural way of source code reuse for software
developers. To assess how known bugs and security vulnerabilities of a cloned
component affect an application, developers and security analysts need to
identify an original version of the component and understand how the cloned
component is different from the original one. Although developers may record
the original version information in a version control system and/or directory
names, such information is often either unavailable or incomplete. In this
research, we propose a code search method that takes as input a set of source
files and extracts all the components including similar files from a software
ecosystem (i.e., a collection of existing versions of software packages). Our
method employs an efficient file similarity computation using the b-bit minwise
hashing technique. We use an aggregated file similarity for ranking components.
To evaluate the effectiveness of this tool, we analyzed 75 cloned components in
Firefox and Android source code. The tool took about two hours to report the
original components from 10 million files in Debian GNU/Linux packages. Recall
of the top-five components in the extracted lists is 0.907, while recall of a
baseline using SHA-1 file hash is 0.773, according to the ground truth recorded
in the source code repositories.
Comment: 14th International Conference on Mining Software Repositories
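The b-bit minwise hashing idea mentioned above can be sketched as follows: keep only the lowest b bits of each min-hash value, then correct the observed match rate for chance collisions. This is a simplified illustration, not the authors' implementation. The helper names, the use of BLAKE2b as the hash family, and the estimator (which assumes the sets are small relative to the token universe) are all assumptions.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a token (BLAKE2b is an arbitrary choice here)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def bbit_signature(tokens, num_hashes=128, b=1):
    """Keep the lowest b bits of the minimum hash for each seed.

    `tokens` must be a non-empty collection of strings.
    """
    mask = (1 << b) - 1
    return [min(_hash(t, seed) for t in tokens) & mask for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b, b=1):
    """Estimate Jaccard similarity from two b-bit signatures.

    Simplified model: P(match) = J + (1 - J) / 2**b, solved for J.
    """
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    match_rate = matches / len(sig_a)
    chance = 1.0 / (1 << b)  # probability that b unrelated bits agree
    return max(0.0, (match_rate - chance) / (1.0 - chance))


# Hypothetical token sets standing in for two source files.
file_a = {"int", "main", "printf", "return", "argc"}
file_b = {"int", "main", "puts", "return", "argv"}
sig_a = bbit_signature(file_a)
sig_b = bbit_signature(file_b)
print(estimate_similarity(sig_a, sig_b))
```

Storing b bits instead of a full hash value shrinks each signature considerably, at the cost of needing more hash functions for the same estimation accuracy.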
Similarities, challenges and opportunities of wikipedia content and open source projects
Copyright © 2012 John Wiley & Sons, Ltd.
Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large number of software projects that simply do not evolve: they are developed by relatively small communities that struggle to attract a sustained number of contributors. These portals have increasingly started to act as storage for abandoned projects, and researchers and practitioners should try to point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the required expertise, many projects reliant on content and contributions by users undergo a similar evolution and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from being a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of “usefulness” and “modularity” we isolate valuable content in both Wikipedia pages and OSS projects.
Analysis of source code metrics from ns-2 and ns-3 network simulators
Ns-2 and its successor ns-3 are discrete-event simulators which are closely related to each
other as they share common background, concepts and similar aims. Ns-3 is still under
development, but it offers some interesting characteristics for developers while ns-2 still
has a large user base. While other studies have compared different network simulators,
focusing on performance measurements, in this paper we adopted a different approach
by focusing on technical characteristics and using software metrics to obtain useful conclusions.
We chose ns-2 and ns-3 for our case study because of the popularity of the former in
research and the increasing use of the latter. This reflects the current situation where ns-3
has emerged as a viable alternative to ns-2 due to its features and design. The paper
assesses the current state of both projects and their respective evolution supported by
the measurements obtained from a broad set of software metrics. By considering other
qualitative characteristics we obtained a summary of technical features of both simulators
including architectural design, software dependencies, and documentation policies.
Ministerio de Ciencia e Innovación TEC2009-10639-C04-0
Selection of third party software in Off-The-Shelf-based software development: an interview study with industrial practitioners
The success of software development using third party components highly depends on the ability to select a suitable component for the intended application. The evidence shows that there is limited knowledge about current industrial OTS selection practices. As a result, there is often a gap between theory and practice, and the proposed methods for supporting selection are rarely adopted in industrial practice. This paper's goal is to investigate the actual industrial practice of component selection in order to provide an initial empirical basis that allows the reconciliation of research and industrial endeavors. The study consisted of semi-structured interviews with 23 employees from 20 different software-intensive companies that mostly develop web information system applications. It provides qualitative information that helps to further understand these practices, and emphasizes some aspects that have been overlooked by researchers. For instance, although the literature claims that component repositories are important for locating reusable components, these are hardly used in industrial practice. Instead, other resources that have not received considerable attention are used for this purpose. Practices and potential market niches for software-intensive companies have also been identified. The results are valuable from both the research and the industrial perspectives as they provide a basis for formulating well-substantiated hypotheses and more effective improvement strategies.
Peer Reviewed. Postprint (author's final draft)
Towards Understanding Third-party Library Dependency in C/C++ Ecosystem
Third-party libraries (TPLs) are frequently reused in software to reduce
development cost and the time to market. However, external library dependencies
may introduce vulnerabilities into host applications. The issue of library
dependency has received considerable critical attention. Many package managers,
such as Maven, Pip, and NPM, have been proposed to manage TPLs. Moreover, a
significant amount of effort has been put into studying dependencies in
language ecosystems like Java, Python, and JavaScript, but not C/C++. Due to the
lack of a unified package manager for C/C++, existing research has only limited
understanding of TPL dependencies in the C/C++ ecosystem, especially at large
scale.
Towards understanding TPL dependencies in the C/C++ ecosystem, we collect
existing TPL databases, package management tools, and dependency detection
tools, summarize the dependency patterns of C/C++ projects, and construct a
comprehensive and precise C/C++ dependency detector. Using our detector, we
extract dependencies from a large-scale database containing 24K C/C++
repositories from GitHub. Based on the extracted dependencies, we provide the
results and findings of an empirical study, which aims at understanding the
characteristics of the TPL dependencies. We further discuss the implications for
managing dependencies in C/C++ and future research directions for software
engineering researchers and developers in the fields of library development,
software composition analysis, and C/C++ package management.
Comment: ASE 202