14,139 research outputs found
A Quantitative Study of Java Software Buildability
Researchers, students and practitioners often encounter a situation when the
build process of a third-party software system fails. In this paper, we aim to
confirm this observation present mainly as anecdotal evidence so far. Using a
virtual environment simulating a programmer's one, we try to fully
automatically build target archives from the source code of over 7,200 open
source Java projects. We found that more than 38% of builds ended in failure.
Build log analysis reveals the largest portion of errors are
dependency-related. We also conduct an association study of factors affecting
build success
Vulnerable Open Source Dependencies: Counting Those That Matter
BACKGROUND: Vulnerable dependencies are a known problem in today's
open-source software ecosystems because OSS libraries are highly interconnected
and developers do not always update their dependencies. AIMS: In this paper we
aim to present a precise methodology, that combines the code-based analysis of
patches with information on build, test, update dates, and group extracted from
the very code repository, and therefore, caters to the needs of industrial
practice for correct allocation of development and audit resources. METHOD: To
understand the industrial impact of the proposed methodology, we considered the
200 most popular OSS Java libraries used by SAP in its own software. Our
analysis included 10905 distinct GAVs (group, artifact, version) when
considering all the library versions. RESULTS: We found that about 20% of the
dependencies affected by a known vulnerability are not deployed, and therefore,
they do not represent a danger to the analyzed library because they cannot be
exploited in practice. Developers of the analyzed libraries are able to fix
(and actually responsible for) 82% of the deployed vulnerable dependencies. The
vast majority (81%) of vulnerable dependencies may be fixed by simply updating
to a new version, while 1% of the vulnerable dependencies in our sample are
halted, and therefore, potentially require a costly mitigation strategy.
CONCLUSIONS: Our case study shows that the correct counting allows software
development companies to receive actionable information about their library
dependencies, and therefore, correctly allocate costly development and audit
resources, which is spent inefficiently in case of distorted measurements.Comment: This is a pre-print of the paper that appears, with the same title,
in the proceedings of the 12th International Symposium on Empirical Software
Engineering and Measurement, 201
Proactive Empirical Assessment of New Language Feature Adoption via Automated Refactoring: The Case of Java 8 Default Methods
Programming languages and platforms improve over time, sometimes resulting in
new language features that offer many benefits. However, despite these
benefits, developers may not always be willing to adopt them in their projects
for various reasons. In this paper, we describe an empirical study where we
assess the adoption of a particular new language feature. Studying how
developers use (or do not use) new language features is important in
programming language research and engineering because it gives designers
insight into the usability of the language to create meaning programs in that
language. This knowledge, in turn, can drive future innovations in the area.
Here, we explore Java 8 default methods, which allow interfaces to contain
(instance) method implementations.
Default methods can ease interface evolution, make certain ubiquitous design
patterns redundant, and improve both modularity and maintainability. A focus of
this work is to discover, through a scientific approach and a novel technique,
situations where developers found these constructs useful and where they did
not, and the reasons for each. Although several studies center around assessing
new language features, to the best of our knowledge, this kind of construct has
not been previously considered.
Despite their benefits, we found that developers did not adopt default
methods in all situations. Our study consisted of submitting pull requests
introducing the language feature to 19 real-world, open source Java projects
without altering original program semantics. This novel assessment technique is
proactive in that the adoption was driven by an automatic refactoring approach
rather than waiting for developers to discover and integrate the feature
themselves. In this way, we set forth best practices and patterns of using the
language feature effectively earlier rather than later and are able to possibly
guide (near) future language evolution. We foresee this technique to be useful
in assessing other new language features, design patterns, and other
programming idioms
Simplifying Deep-Learning-Based Model for Code Search
To accelerate software development, developers frequently search and reuse
existing code snippets from a large-scale codebase, e.g., GitHub. Over the
years, researchers proposed many information retrieval (IR) based models for
code search, which match keywords in query with code text. But they fail to
connect the semantic gap between query and code. To conquer this challenge, Gu
et al. proposed a deep-learning-based model named DeepCS. It jointly embeds
method code and natural language description into a shared vector space, where
methods related to a natural language query are retrieved according to their
vector similarities. However, DeepCS' working process is complicated and
time-consuming. To overcome this issue, we proposed a simplified model
CodeMatcher that leverages the IR technique but maintains many features in
DeepCS. Generally, CodeMatcher combines query keywords with the original order,
performs a fuzzy search on name and body strings of methods, and returned the
best-matched methods with the longer sequence of used keywords. We verified its
effectiveness on a large-scale codebase with about 41k repositories.
Experimental results showed the simplified model CodeMatcher outperforms DeepCS
by 97% in terms of MRR (a widely used accuracy measure for code search), and it
is over 66 times faster than DeepCS. Besides, comparing with the
state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by
73%. We also observed that: fusing the advantages of IR-based and
deep-learning-based models is promising because they compensate with each other
by nature; improving the quality of method naming helps code search, since
method name plays an important role in connecting query and code
- …