4,789 research outputs found
Source File Set Search for Clone-and-Own Reuse Analysis
Clone-and-own approach is a natural way of source code reuse for software
developers. To assess how known bugs and security vulnerabilities of a cloned
component affect an application, developers and security analysts need to
identify an original version of the component and understand how the cloned
component is different from the original one. Although developers may record
the original version information in a version control system and/or directory
names, such information is often either unavailable or incomplete. In this
research, we propose a code search method that takes as input a set of source
files and extracts all the components including similar files from a software
ecosystem (i.e., a collection of existing versions of software packages). Our
method employs an efficient file similarity computation using b-bit minwise
hashing technique. We use an aggregated file similarity for ranking components.
To evaluate the effectiveness of this tool, we analyzed 75 cloned components in
Firefox and Android source code. The tool took about two hours to report the
original components from 10 million files in Debian GNU/Linux packages. Recall
of the top-five components in the extracted lists is 0.907, while recall of a
baseline using SHA-1 file hash is 0.773, according to the ground truth recorded
in the source code repositories.Comment: 14th International Conference on Mining Software Repositorie
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribute-ShareAlike 3.0 Unported license that developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posting to Stack overflow or code reuse from Stack overflow).
This paper aims to raise the awareness of the software engineering community
about potential unethical code reuse activities taking place on Q&A websites
like Stack Overflow.Comment: In proceedings of the 24th IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER
Product line architecture recovery with outlier filtering in software families: the Apo-Games case study
Software product line (SPL) approach has been widely adopted to achieve systematic reuse in families of software products. Despite its benefits, developing an SPL from scratch requires high up-front investment. Because of that, organizations commonly create product variants with opportunistic reuse approaches (e.g., copy-and-paste or clone-and-own). However, maintenance and evolution of a large number of product variants is a challenging task. In this context, a family of products developed opportunistically is a good starting point to adopt SPLs, known as extractive approach for SPL adoption. One of the initial phases of the extractive approach is the recovery and definition of a product line architecture (PLA) based on existing software variants, to support variant derivation and also to allow the customization according to customers’ needs. The problem of defining a PLA from existing system variants is that some variants can become highly unrelated to their predecessors, known as outlier variants. The inclusion of outlier variants in the PLA recovery leads to additional effort and noise in the common structure and complicates architectural decisions. In this work, we present an automatic approach to identify and filter outlier variants during the recovery and definition of PLAs. Our approach identifies the minimum subset of cross-product architectural information for an effective PLA recovery. To evaluate our approach, we focus on real-world variants of the Apo-Games family. We recover a PLA taking as input 34 Apo-Game variants developed by using opportunistic reuse. The results provided evidence that our automatic approach is able to identify and filter outlier variants, allowing to eliminate exclusive packages and classes without removing the whole variant. We consider that the recovered PLA can help domain experts to take informed decisions to support SPL adoption.This research was partially funded by INES 2.0; CNPq grants 465614/2014-0 and 408356/2018-9; and FAPESB grants JCB0060/2016 and BOL2443/201
- …