Source File Set Search for Clone-and-Own Reuse Analysis
Clone-and-own approach is a natural way of source code reuse for software
developers. To assess how known bugs and security vulnerabilities of a cloned
component affect an application, developers and security analysts need to
identify an original version of the component and understand how the cloned
component is different from the original one. Although developers may record
the original version information in a version control system and/or directory
names, such information is often either unavailable or incomplete. In this
research, we propose a code search method that takes as input a set of source
files and extracts all the components including similar files from a software
ecosystem (i.e., a collection of existing versions of software packages). Our
method employs an efficient file similarity computation using b-bit minwise
hashing technique. We use an aggregated file similarity for ranking components.
To evaluate the effectiveness of this tool, we analyzed 75 cloned components in
Firefox and Android source code. The tool took about two hours to report the
original components from 10 million files in Debian GNU/Linux packages. Recall
of the top-five components in the extracted lists is 0.907, while recall of a
baseline using SHA-1 file hash is 0.773, according to the ground truth recorded
in the source code repositories.
Comment: 14th International Conference on Mining Software Repositories
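The file-similarity step the abstract describes can be sketched compactly. The following is a minimal, illustrative Python sketch of b-bit minwise hashing, not the authors' implementation: each file is shingled into token n-grams, k min-hash values are computed and truncated to their lowest b bits, and Jaccard similarity is estimated from the fraction of matching truncated values, corrected for chance collisions. The hash choice and parameters here are assumptions for illustration.

```python
import hashlib

def shingles(text, n=5):
    """Token n-grams (shingles) of a file's contents."""
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}

def bbit_signature(items, k=128, b=1):
    """k min-hash values, each truncated to its lowest b bits."""
    mask = (1 << b) - 1
    sig = []
    for seed in range(k):
        lo = min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{it}".encode(), digest_size=8).digest(),
                "big",
            )
            for it in items
        )
        sig.append(lo & mask)
    return sig

def estimate_similarity(sig_a, sig_b, b=1):
    """Estimate Jaccard similarity from two b-bit signatures,
    correcting for the chance collision rate of b-bit values."""
    eq = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
    c = 1 / (1 << b)  # probability two unrelated b-bit values collide
    return max((eq - c) / (1 - c), 0.0)
```

Ranking a component would then aggregate these per-file similarities over all of its files, as the abstract states; smaller b shrinks the signatures at the cost of higher estimator variance, which is the trade-off the technique exploits at ecosystem scale.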
Analysing the Resolution of Security Bugs in Software Maintenance
Security bugs in software systems are often reported after incidents of malicious attacks. Developers often need to resolve these bugs quickly in order to maintain the security of such systems. Bug resolution includes two kinds of activities: triaging confirms that the bugs are indeed security problems, after which fixing involves making changes to the code.
It is reported in the literature that, statistically, security bugs are reopened more often compared to others, which poses two new research questions: (a) Are developers “rushing” to triage security bugs too soon under the pressure of deadlines? (b) Do developers need to spend more time fixing security bugs to avoid frequent reopening?
This thesis explores these questions to determine whether security bug fixing should take a higher priority than the fixing of other bugs, so that vulnerabilities are not left open for malicious attackers to exploit before the problems are fixed.
In this thesis a quantitative approach has been adopted by conducting statistical empirical studies to observe the behaviour of software developers engaged in dealing with security bugs.
Firstly, the concept of "rush" has been borrowed from the time management literature to refer to the behaviour of people delivering work under the pressure of deadlines. By observing how developers deliver bug resolutions before release deadlines, the degree of rush has been measured as the ratio between the actual time spent by developers during triaging and the theoretical time the developers would have had by delaying the fixes until the next regular release.
The results suggest that delaying bug assignment helps find the right developer and gives the developer more time to prepare for the same workload under more relaxed planning constraints. Secondly, to analyse the complexity of security bug fixes, the fan-in complexity of functions relevant to security bugs has been measured, rather than simply measuring the time spent by developers on fixing such bugs.
The first null hypothesis is tested using the Mann-Whitney test on five software case studies: Samba, Mozilla Firefox, Red Hat, FreeBSD and Mozilla. The second null hypothesis is tested by comparing the results of fixing security and non-security bugs from the Samba and Mozilla Firefox case studies.
Statistically significant results suggest that security bugs are triaged in a rush compared to non-security bugs for Red Hat, FreeBSD and Mozilla.
In terms of fan-in, the results of the Samba and Mozilla Firefox case studies suggest that security bugs are more complex to fix than non-security bugs.
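As an illustration of the statistical machinery involved, a comparison of "rush" ratios between security and non-security bugs could be run with a Mann-Whitney U test along these lines. This is a pure-Python sketch using the normal approximation (tie correction to the variance omitted for brevity), and the sample data are invented, not taken from the thesis.

```python
import math

def _ranks(values):
    """Average ranks (1-based), resolving ties by the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(xs), len(ys)
    ranks = _ranks(list(xs) + list(ys))
    r1 = sum(ranks[:n1])                     # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2              # U statistic for sample 1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u1, p

# Hypothetical rush ratios (fraction of available triage time actually used):
security = [0.9, 0.8, 0.95, 0.85, 0.7]
non_security = [0.3, 0.4, 0.2, 0.5, 0.35]
u, p = mann_whitney_u(security, non_security)
```

A small p-value here would, as in the thesis, reject the null hypothesis that the two groups of rush ratios are drawn from the same distribution.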
Does Code Review Speed Matter for Practitioners?
Increasing code velocity is a common goal for a variety of software projects. The efficiency of the code review process significantly impacts how fast the code gets merged into the final product and reaches the customers. We conducted a survey to study the code velocity-related beliefs and practices in place. We analyzed 75 completed surveys from 39 participants from the industry and 36 from the open-source community. Our critical findings are (a) the industry and open-source community hold a similar set of beliefs, (b) quick reaction time is of utmost importance and applies to the tooling infrastructure and the behavior of other engineers, (c) time-to-merge is the essential code review metric to improve, (d) engineers have differing opinions about the benefits of increased code velocity for their career growth, and (e) the controlled application of the commit-then-review model can increase code velocity. Our study supports the continued need to invest in and improve code velocity regardless of the underlying organizational ecosystem.
iLeak: A Lightweight System for Detecting Inadvertent Information Leaks
Data loss incidents, where data of sensitive nature are exposed to the public, have become too frequent and have caused damages of millions of dollars to companies and other organizations. Repeatedly, information leaks occur over the Internet, and half of the time they are accidental, caused by user negligence, misconfiguration of software, or inadequate understanding of an application's functionality. This paper presents iLeak, a lightweight, modular system for detecting inadvertent information leaks. Unlike previous solutions, iLeak builds on components already present in modern computers. In particular, we employ system tracing facilities and data indexing services, and combine them in a novel way to detect data leaks. Our design consists of three components: uaudits are responsible for capturing the information that exits the system, while Inspectors use the indexing service to identify if the transmitted data belong to files that contain potentially sensitive information. The Trail Gateway handles the communication and synchronization of uaudits and Inspectors. We implemented iLeak on Mac OS X using DTrace and the Spotlight indexing service. Finally, we show that iLeak is indeed lightweight, since it only incurs 4% overhead on protected applications.
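The capture-then-index-lookup dataflow just described can be sketched as a toy pipeline. This assumes nothing about the real implementation beyond the abstract: the actual system captures traffic with DTrace and queries Spotlight, whereas the index contents, paths, and token-window matching heuristic below are invented purely for illustration.

```python
# Stand-in for the indexing service: a map from sensitive files to
# their (already indexed) contents. Paths and contents are made up.
SENSITIVE_INDEX = {
    "/home/alice/taxes.txt": "ssn 123-00-4567 annual income",
    "/home/alice/notes.txt": "meeting at noon",
}

def inspector(payload, min_overlap=3):
    """Flag payloads that share a run of min_overlap tokens with an
    indexed sensitive file (a crude stand-in for an index query)."""
    words = payload.split()
    hits = []
    for path, contents in SENSITIVE_INDEX.items():
        indexed = contents.split()
        for i in range(len(words) - min_overlap + 1):
            window = words[i:i + min_overlap]
            if any(indexed[j:j + min_overlap] == window
                   for j in range(len(indexed) - min_overlap + 1)):
                hits.append(path)
                break
    return hits

def uaudit(outgoing_payloads):
    """Capture stand-in: forward each outgoing payload to the inspector,
    as the Trail Gateway would route uaudit events to Inspectors."""
    return {p: inspector(p) for p in outgoing_payloads}
```

The point of the sketch is the division of labour: the capture side never needs to know what is sensitive, and the inspection side never needs to see traffic it is not asked about.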
Public Sector Open Source Software Projects -- How is development organized?
Background: Open Source Software (OSS) started as an effort of communities of
volunteers, but its practices have been adopted far beyond these initial
scenarios. For instance, the strategic use of OSS in industry is constantly
growing nowadays in different verticals, including energy, automotive, and
health. For the public sector, however, the adoption has lagged behind even if
benefits particularly salient in the public sector context such as improved
interoperability, transparency, and digital sovereignty have been pointed out.
When Public Sector Organisations (PSOs) seek to engage with OSS, this
introduces challenges as they often lack the necessary technical capabilities,
while also being bound and influenced by regulations and practices for public
procurement. Aim: We aim to shed light on how public sector OSS projects, i.e.,
projects initiated, developed and governed by public sector organizations, are
developed and structured. We conjecture, based on the challenges of PSOs, that
the way development is organized in these types of projects largely
diverges from the commonly adopted bazaar model (popularized by Eric Raymond),
which implies that development is carried out collaboratively in a larger
community. Method: We plan to contrast public sector OSS projects with a set of
earlier reported case studies of bazaar OSS projects, including Mockus et al.'s
reporting of the Apache web server and Mozilla browser OSS projects, along with
the replications performed on the FreeBSD, JBossAS, JOnAS, and Apache Geronimo
OSS projects. To enable comparable results, we will replicate the methodology
used by Mockus et al. on a purposefully sampled subset of public sector OSS
projects. The subset will be identified and characterized quantitatively by
mining relevant software repositories, and qualitatively investigated through
interviews with individuals from involved organizations.
Comment: Registered Report accepted at MSR'2
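One concrete measure from Mockus et al.-style studies that such a replication would likely compute is developer concentration: how many developers account for the bulk of the commits (in the bazaar projects studied, a small core wrote most of the code). A minimal sketch, assuming commit authorship has already been mined from the repositories; the function name and threshold are illustrative, not taken from the report.

```python
from collections import Counter

def core_team_size(commit_authors, threshold=0.8):
    """Smallest number of developers whose commits cover `threshold`
    of all commits -- a simple developer-concentration measure."""
    counts = sorted(Counter(commit_authors).values(), reverse=True)
    total = sum(counts)
    covered, size = 0, 0
    for c in counts:
        covered += c
        size += 1
        if covered >= threshold * total:
            break
    return size
```

Comparing this number (and its share of the contributor population) between public sector projects and the bazaar baselines would give one quantitative handle on how far the two development models diverge.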