8 research outputs found

    A Quantitative Study of Java Software Buildability

    Full text link
    Researchers, students and practitioners often encounter a situation when the build process of a third-party software system fails. In this paper, we aim to confirm this observation present mainly as anecdotal evidence so far. Using a virtual environment simulating a programmer's one, we try to fully automatically build target archives from the source code of over 7,200 open source Java projects. We found that more than 38% of builds ended in failure. Build log analysis reveals the largest portion of errors are dependency-related. We also conduct an association study of factors affecting build success

    Extracting Build Changes with BUILDDIFF

    Full text link
    Build systems are an essential part of modern software engineering projects. As software projects change continuously, it is crucial to understand how the build system changes because neglecting its maintenance can lead to expensive build breakage. Recent studies have investigated the (co-)evolution of build configurations and reasons for build breakage, but they did this only on a coarse grained level. In this paper, we present BUILDDIFF, an approach to extract detailed build changes from MAVEN build files and classify them into 95 change types. In a manual evaluation of 400 build changing commits, we show that BUILDDIFF can extract and classify build changes with an average precision and recall of 0.96 and 0.98, respectively. We then present two studies using the build changes extracted from 30 open source Java projects to study the frequency and time of build changes. The results show that the top 10 most frequent change types account for 73% of the build changes. Among them, changes to version numbers and changes to dependencies of the projects occur most frequently. Furthermore, our results show that build changes occur frequently around releases. With these results, we provide the basis for further research, such as for analyzing the (co-)evolution of build files with other artifacts or improving effort estimation approaches. Furthermore, our detailed change information enables improvements of refactoring approaches for build configurations and improvements of models to identify error-prone build files.Comment: Accepted at the International Conference of Mining Software Repositories (MSR), 201

    Identifying Bugs in Make and JVM-Oriented Builds

    Full text link
    Incremental and parallel builds are crucial features of modern build systems. Parallelism enables fast builds by running independent tasks simultaneously, while incrementality saves time and computing resources by processing the build operations that were affected by a particular code change. Writing build definitions that lead to error-free incremental and parallel builds is a challenging task. This is mainly because developers are often unable to predict the effects of build operations on the file system and how different build operations interact with each other. Faulty build scripts may seriously degrade the reliability of automated builds, as they cause build failures, and non-deterministic and incorrect build results. To reason about arbitrary build executions, we present buildfs, a generally-applicable model that takes into account the specification (as declared in build scripts) and the actual behavior (low-level file system operation) of build operations. We then formally define different types of faults related to incremental and parallel builds in terms of the conditions under which a file system operation violates the specification of a build operation. Our testing approach, which relies on the proposed model, analyzes the execution of single full build, translates it into buildfs, and uncovers faults by checking for corresponding violations. We evaluate the effectiveness, efficiency, and applicability of our approach by examining hundreds of Make and Gradle projects. Notably, our method is the first to handle Java-oriented build systems. The results indicate that our approach is (1) able to uncover several important issues (245 issues found in 45 open-source projects have been confirmed and fixed by the upstream developers), and (2) orders of magnitude faster than a state-of-the-art tool for Make builds

    Characteristics of Useful Code Reviews: An Empirical Study at Microsoft

    Get PDF
    Abstract-Over the past decade, both open source and commercial software projects have adopted contemporary peer code review practices as a quality control mechanism. Prior research has shown that developers spend a large amount of time and effort performing code reviews. Therefore, identifying factors that lead to useful code reviews can benefit projects by increasing code review effectiveness and quality. In a three-stage mixed research study, we qualitatively investigated what aspects of code reviews make them useful to developers, used our findings to build and verify a classification model that can distinguish between useful and not useful code review feedback, and finally we used this classifier to classify review comments enabling us to empirically investigate factors that lead to more effective code review feedback. In total, we analyzed 1.5 millions review comments from five Microsoft projects and uncovered many factors that affect the usefulness of review feedback. For example, we found that the proportion of useful comments made by a reviewer increases dramatically in the first year that he or she is at Microsoft but tends to plateau afterwards. In contrast, we found that the more files that are in a change, the lower the proportion of comments in the code review that will be of value to the author of the change. Based on our findings, we provide recommendations for practitioners to improve effectiveness of code reviews

    A Systematic Mapping Study of Empirical Studies Performed with Collections of Software Projects

    Get PDF
    Contexto: los proyectos software son insumos comunes en los experimentos de la Ingeniería del Software, aunque estos muchas veces sean seleccionados sin seguir una estrategia específica, lo cual disminuye la representatividad y replicación de los resultados. Una opción es el uso de colecciones preservadas de proyectos software, pero estas deben ser vigentes y con reglas explícitas que aseguren su actualización a lo largo del tiempo. Objetivo: realizar un estudio secundario sistematizado sobre las estrategias de selección de los proyectos software en estudios empíricos para conocer las reglas tenidas en cuenta, el grado de uso de colecciones de proyectos, los metadatos extraídos y los análisis estadísticos posteriores realizados. Método: se utilizó un mapeo sistemático para identificar estudios publicados desde enero de 2013 a diciembre de 2020. Resultados: se identificaron 122 estudios de los cuales el 72% utilizó reglas propias para la selección de proyectos y un 27% usó colecciones de proyectos existentes. Asimismo, no se encontraron evidencias de un marco estandarizado para la selección de proyectos, ni la aplicación de métodos estadísticos que se relacionen con la estrategia de recolección de las muestras.Context: software projects are commonresources in Software Engineering experiments,although these are often selected without following a specific strategy, which reduces the representativeness and replication of the results. An option is the use of preserved collections of software projects, but these must be current, with explicit guidelines that guarante etheir updating over a long period of time. Goal: to carry out a systematic secondary study about the strategies to select software projects in empirical studies to discover the guidelines taken into account, the degree of use of project collections, the meta-data extracted and the subsequent statistical analysis conducted. Method: A systematic mapping study to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which the 72% used their own guidelines for project selection and the 27% used existent project collections. Likewise, there was no evidence of a standardized framework for the project selection process, nor the application of statistical methods that relates with the sample collection strategy.Fil: Carruthers, Juan Andrés. Universidad Nacional del Nordeste. Facultad de Cs.exactas Naturales y Agrimensura. Departamento de Informatica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; ArgentinaFil: Diaz Pace, Jorge Andres. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; ArgentinaFil: Irrazábal, Emanuel Agustín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; Argentina. Universidad Nacional del Nordeste. Facultad de Cs.exactas Naturales y Agrimensura. Departamento de Informatica; Argentin

    A Systematic Mapping Study of Empirical Studies Performed with Collections of Software Projects

    Get PDF
    Contexto: los proyectos software son insumos comunes en los experimentos de la Ingeniería del Software, aunque estos muchas veces sean seleccionados sin seguir una estrategia específica, lo cual disminuye la representatividad y replicación de los resultados. Una opción es el uso de colecciones preservadas de proyectos software, pero estas deben ser vigentes y con reglas explícitas que aseguren su actualización a lo largo del tiempo. Objetivo: realizar un estudio secundario sistematizado sobre las estrategias de selección de los proyectos software en estudios empíricos para conocer las reglas tenidas en cuenta, el grado de uso de colecciones de proyectos, los metadatos extraídos y los análisis estadísticos posteriores realizados. Método: se utilizó un mapeo sistemático para identificar estudios publicados desde enero de 2013 a diciembre de 2020. Resultados: se identificaron 122 estudios de los cuales el 72% utilizó reglas propias para la selección de proyectos y un 27% usó colecciones de proyectos existentes. Asimismo, no se encontraron evidencias de un marco estandarizado para la selección de proyectos, ni la aplicación de métodos estadísticos que se relacionen con la estrategia de recolección de las muestras.Context: software projects are commonresources in Software Engineering experiments,although these are often selected without following a specific strategy, which reduces the representativeness and replication of the results. An option is the use of preserved collections of software projects, but these must be current, with explicit guidelines that guarante etheir updating over a long period of time. Goal: to carry out a systematic secondary study about the strategies to select software projects in empirical studies to discover the guidelines taken into account, the degree of use of project collections, the meta-data extracted and the subsequent statistical analysis conducted. Method: A systematic mapping study to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which the 72% used their own guidelines for project selection and the 27% used existent project collections. Likewise, there was no evidence of a standardized framework for the project selection process, nor the application of statistical methods that relates with the sample collection strategy.Fil: Carruthers, Juan Andrés. Universidad Nacional del Nordeste. Facultad de Cs.exactas Naturales y Agrimensura. Departamento de Informatica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; ArgentinaFil: Diaz Pace, Jorge Andres. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; ArgentinaFil: Irrazábal, Emanuel Agustín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; Argentina. Universidad Nacional del Nordeste. Facultad de Cs.exactas Naturales y Agrimensura. Departamento de Informatica; Argentin

    Dependency Management 2.0 – A Semantic Web Enabled Approach

    Get PDF
    Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third-parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is the need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties for software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services forming the basis for a more flexibility API dependency analysis compared to traditional proprietary analysis approaches. (2) We conducted a user survey which involved 53 open source developers to allow us to gain insights on how actual developers manage API breaking changes. (3) We introduced a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which integrates builds system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM is an integration of our build, source code, vulnerability and license ontologies which supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that our knowledge modeling approach can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other existing heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis
    corecore