
    Source File Set Search for Clone-and-Own Reuse Analysis

    The clone-and-own approach is a natural way for software developers to reuse source code. To assess how known bugs and security vulnerabilities of a cloned component affect an application, developers and security analysts need to identify the original version of the component and understand how the clone differs from it. Although developers may record the original version information in a version control system and/or in directory names, such information is often unavailable or incomplete. In this research, we propose a code search method that takes a set of source files as input and extracts all components containing similar files from a software ecosystem (i.e., a collection of existing versions of software packages). Our method employs an efficient file similarity computation using the b-bit minwise hashing technique, and ranks components by aggregated file similarity. To evaluate the effectiveness of this tool, we analyzed 75 cloned components in the Firefox and Android source code. The tool took about two hours to report the original components from 10 million files in Debian GNU/Linux packages. According to the ground truth recorded in the source code repositories, recall of the top-five components in the extracted lists is 0.907, while recall of a baseline using SHA-1 file hashes is 0.773.
    Comment: 14th International Conference on Mining Software Repositories
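    The core similarity primitive the abstract mentions, b-bit minwise hashing, can be sketched as follows. This is a minimal illustration of the general technique (Li and König's estimator), not the paper's implementation; function names, the MD5-based hash family, and all parameters are illustrative assumptions.

    ```python
    import hashlib

    def bbit_minhash(tokens, num_hashes=64, b=2):
        """Compute a b-bit minwise signature for a set of tokens:
        for each of num_hashes salted hash functions, keep only the
        lowest b bits of the minimum hash value over the set."""
        sig = []
        for i in range(num_hashes):
            # Salting with the index i simulates independent hash functions.
            m = min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                    for t in tokens)
            sig.append(m & ((1 << b) - 1))  # retain the lowest b bits
        return sig

    def estimate_similarity(sig_a, sig_b, b=2):
        """Estimate Jaccard similarity from two b-bit signatures,
        correcting for accidental collisions of truncated values."""
        frac = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
        c = 1 / (2 ** b)  # collision probability of two random b-bit values
        return max(0.0, (frac - c) / (1 - c))
    ```

    Storing only b bits per hash is what makes comparing a query file against millions of indexed files cheap: the signature shrinks by a factor of 64/b relative to full 64-bit min-hashes, at a quantifiable cost in estimator variance.
    
    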

    Does Code Review Speed Matter for Practitioners?

    Increasing code velocity is a common goal for a variety of software projects. The efficiency of the code review process significantly impacts how fast code gets merged into the final product and reaches customers. We conducted a survey to study code velocity-related beliefs and practices in place. We analyzed 75 completed surveys from 39 participants from industry and 36 from the open-source community. Our critical findings are (a) industry and the open-source community hold a similar set of beliefs, (b) quick reaction time is of utmost importance and applies both to the tooling infrastructure and to the behavior of other engineers, (c) time-to-merge is the most important code review metric to improve, (d) engineers have differing opinions about the benefits of increased code velocity for their career growth, and (e) the controlled application of the commit-then-review model can increase code velocity. Our study supports the continued need to invest in and improve code velocity regardless of the underlying organizational ecosystem.

    iLeak: A Lightweight System for Detecting Inadvertent Information Leaks

    Data loss incidents, where data of a sensitive nature are exposed to the public, have become too frequent and have caused damages of millions of dollars to companies and other organizations. Information leaks repeatedly occur over the Internet, and half of the time they are accidental, caused by user negligence, misconfiguration of software, or inadequate understanding of an application's functionality. This paper presents iLeak, a lightweight, modular system for detecting inadvertent information leaks. Unlike previous solutions, iLeak builds on components already present in modern computers. In particular, we employ system tracing facilities and data indexing services, and combine them in a novel way to detect data leaks. Our design consists of three components: uaudits are responsible for capturing the information that exits the system, while Inspectors use the indexing service to identify whether the transmitted data belong to files that contain potentially sensitive information. The Trail Gateway handles the communication and synchronization of uaudits and Inspectors. We implemented iLeak on Mac OS X using DTrace and the Spotlight indexing service. Finally, we show that iLeak is indeed lightweight, since it only incurs 4% overhead on protected applications.
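    The uaudit/Inspector split described above can be illustrated with a toy sketch. This is not the paper's DTrace/Spotlight implementation; the index contents, function names, and matching logic are hypothetical stand-ins showing only the control flow: captured outbound data is checked against an index of sensitive file contents.

    ```python
    # Toy stand-in for an indexing service: maps indexed snippets of
    # sensitive file content to the files they came from.
    SENSITIVE_INDEX = {
        "4111 1111 1111 1111": "~/Documents/cards.txt",
        "ssn: 078-05-1120": "~/Documents/taxes.pdf",
    }

    def inspect(outbound_payload: str) -> list[str]:
        """Mimic an Inspector: return the indexed sensitive files whose
        content appears in data captured on its way out of the system
        (the role played by a uaudit in the real design)."""
        return [path for snippet, path in SENSITIVE_INDEX.items()
                if snippet in outbound_payload]

    # A uaudit would hand each outbound payload to inspect(); any
    # non-empty result is a potential inadvertent leak.
    leaks = inspect("POST /submit body: ssn: 078-05-1120")
    ```

    The appeal of the design is that both halves reuse existing OS services: the capture side piggybacks on system tracing, and the matching side on the desktop search index, rather than maintaining a separate content database.
    
    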

    Public Sector Open Source Software Projects -- How is development organized?

    Background: Open Source Software (OSS) started as an effort of communities of volunteers, but its practices have been adopted far beyond these initial scenarios. For instance, the strategic use of OSS in industry is constantly growing in different verticals, including energy, automotive, and health. In the public sector, however, adoption has lagged behind, even though benefits particularly salient in the public sector context, such as improved interoperability, transparency, and digital sovereignty, have been pointed out. When Public Sector Organisations (PSOs) seek to engage with OSS, this introduces challenges, as they often lack the necessary technical capabilities while also being bound and influenced by regulations and practices for public procurement. Aim: We aim to shed light on how public sector OSS projects, i.e., projects initiated, developed, and governed by public sector organizations, are developed and structured. We conjecture, based on the challenges of PSOs, that the way development is organized in these types of projects to a large extent misaligns with the commonly adopted bazaar model (popularized by Eric Raymond), which implies that development is carried out collaboratively in a larger community. Method: We plan to contrast public sector OSS projects with a set of earlier reported case studies of bazaar OSS projects, including Mockus et al.'s reporting of the Apache web server and Mozilla browser OSS projects, along with the replications performed on the FreeBSD, JBossAS, JOnAS, and Apache Geronimo OSS projects. To enable comparable results, we will replicate the methodology used by Mockus et al. on a purposefully sampled subset of public sector OSS projects. The subset will be identified and characterized quantitatively by mining relevant software repositories, and qualitatively investigated through interviews with individuals from involved organizations.
    Comment: Registered Report accepted at MSR'2