
    Empirically-Grounded Construction of Bug Prediction and Detection Tools

    There is an increasing demand for high-quality software, as software bugs have an economic impact not only on software projects but also on national economies in general. Software quality is achieved via the main quality assurance activities of testing and code reviewing. However, these activities are expensive, so they need to be carried out efficiently. Auxiliary software quality tools such as bug detection and bug prediction tools help developers focus their testing and reviewing activities on the parts of the software that most likely contain bugs. However, these tools are far from being adopted as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of projects and their high rate of false positives as the main obstacles to their adoption. We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that relies on the wisdom of the crowd. For each API method, we extract a nullability measure that expresses how often the return value of that method is checked against null in the ecosystem of the API. We use nullability to annotate API methods with nullness annotations and to warn developers about missing and excessive null checks. For a bug predictor to be efficient, it needs to be optimized both as a machine learning model and as a software quality tool. We empirically show how feature selection and hyperparameter optimization improve prediction accuracy. Then we optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., independent variables, machine learning model, and response variable. We show that using both source code and change metrics as independent variables, applying feature selection on them, and then using an optimized Random Forest to predict the number of bugs results in the most cost-effective bug predictor. Throughout this thesis, we show how empirically-grounded analysis helps us build efficient bug prediction and detection tools and adapt them to the characteristics of each software project.
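    To make the nullability measure concrete, here is a minimal sketch of the idea: count, across client code in an API's ecosystem, how often each method's return value is guarded by a null check. The function and variable names (`call_sites`, `compute_nullability`) are illustrative, not taken from the thesis.

```python
# Sketch of the "nullability" measure: the fraction of call sites in the
# ecosystem that check a method's return value against null.
from collections import defaultdict

def compute_nullability(call_sites):
    """call_sites: iterable of (api_method, was_null_checked) pairs
    harvested from client code in the ecosystem (hypothetical input)."""
    usages = defaultdict(int)
    checks = defaultdict(int)
    for method, was_null_checked in call_sites:
        usages[method] += 1
        if was_null_checked:
            checks[method] += 1
    # Nullability: fraction of call sites that guard the return value.
    return {m: checks[m] / usages[m] for m in usages}

sites = [
    ("Map.get", True), ("Map.get", True), ("Map.get", False),
    ("List.size", False), ("List.size", False),
]
for method, score in compute_nullability(sites).items():
    print(f"{method}: nullability = {score:.2f}")
# A high score (Map.get ~ 0.67 here) suggests annotating the method as
# nullable and flagging unchecked uses; a score near zero suggests an
# existing null check may be excessive.
```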

    Extracting Build Changes with BUILDDIFF

    Build systems are an essential part of modern software engineering projects. As software projects change continuously, it is crucial to understand how the build system changes, because neglecting its maintenance can lead to expensive build breakage. Recent studies have investigated the (co-)evolution of build configurations and reasons for build breakage, but only at a coarse-grained level. In this paper, we present BUILDDIFF, an approach to extract detailed build changes from MAVEN build files and classify them into 95 change types. In a manual evaluation of 400 build-changing commits, we show that BUILDDIFF can extract and classify build changes with an average precision and recall of 0.96 and 0.98, respectively. We then present two studies using the build changes extracted from 30 open source Java projects to study the frequency and timing of build changes. The results show that the 10 most frequent change types account for 73% of the build changes. Among them, changes to version numbers and changes to the dependencies of the projects occur most frequently. Furthermore, our results show that build changes occur frequently around releases. With these results, we provide the basis for further research, such as analyzing the (co-)evolution of build files with other artifacts or improving effort estimation approaches. Furthermore, our detailed change information enables improvements of refactoring approaches for build configurations and of models that identify error-prone build files. (Accepted at the International Conference on Mining Software Repositories (MSR), 2017.)
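    The following sketch illustrates, but does not reproduce, what extracting build changes from two Maven POM revisions can look like: it detects dependency insertions, deletions, and version updates. The change-type labels are placeholders, not BUILDDIFF's actual taxonomy of 95 types, and the file names in the usage comment are hypothetical.

```python
# Illustrative diff of two Maven POM revisions, detecting a few of the
# dependency-related change types mentioned above.
import xml.etree.ElementTree as ET

NS = {"m": "http://maven.apache.org/POM/4.0.0"}

def dependency_versions(pom_path):
    """Map (groupId, artifactId) -> version for all declared dependencies."""
    root = ET.parse(pom_path).getroot()
    deps = {}
    for dep in root.findall(".//m:dependencies/m:dependency", NS):
        gid = dep.findtext("m:groupId", default="", namespaces=NS)
        aid = dep.findtext("m:artifactId", default="", namespaces=NS)
        ver = dep.findtext("m:version", default=None, namespaces=NS)
        deps[(gid, aid)] = ver
    return deps

def diff_pom(old_pom, new_pom):
    old, new = dependency_versions(old_pom), dependency_versions(new_pom)
    changes = []
    for coord in old.keys() & new.keys():
        if old[coord] != new[coord]:
            changes.append(("DEPENDENCY_VERSION_UPDATE", coord, old[coord], new[coord]))
    for coord in new.keys() - old.keys():
        changes.append(("DEPENDENCY_INSERT", coord, None, new[coord]))
    for coord in old.keys() - new.keys():
        changes.append(("DEPENDENCY_DELETE", coord, old[coord], None))
    return changes

# Usage (hypothetical files): print(diff_pom("pom_before.xml", "pom_after.xml"))
```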

    Utilizing public repositories to improve the decision process for security defect resolution and information reuse in the development environment

    Software systems may contain security risks that could have been avoided if design choices had been analyzed against public information security data sources. Public security sources have been shown to contain more relevant and recent information on current technologies than any textbook or research article, and these sources are often used by developers for solving software-related problems. However, solutions copied from public discussion forums such as StackOverflow may carry security implications when pasted directly into the developer's environment. Several methods to identify security bugs have been implemented, and recent efforts look into identifying security bugs from communication artifacts produced during the software development lifecycle, as well as into using public security information sources to support secure design and development. The primary goal of this thesis is to investigate how to utilize public information sources to reduce security defects in software artifacts by improving the decision process for defect resolution and information reuse in the development environment. We build a data collection tool for gathering data from public information security sources and public discussion forums, construct machine learning models for classifying discussion forum posts and bug reports as security-related or not, and build word embedding models for finding matches between public security sources and public discussion forum posts or bug reports. The results of this thesis demonstrate that public information security sources can provide additional validation layers for defect classification models, as well as additional security context for public discussion forum posts. The contributions of this thesis are an understanding of how public information security sources can better provide context for bug reports and discussion forums, data collection APIs for building datasets from these sources, and classification and word embedding models for recommending related security sources for bug reports and public discussion forum posts. (Master's thesis in Software Development, in collaboration with HVL.)
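    As a minimal sketch of the classification step described above, the snippet below trains a TF-IDF plus logistic-regression model to label posts as security-related or not. The tiny inline dataset is purely illustrative; the thesis trains on data collected from public security sources and discussion forums.

```python
# Toy binary classifier: security-related (1) vs. not (0).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "SQL injection possible through unsanitized user input",
    "Buffer overflow in the parsing routine allows code execution",
    "Button alignment breaks on small screens",
    "Typo in the installation instructions",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(posts, labels)

# Expected to be labeled security-related, since it shares vocabulary
# ("injection", "user", "input") with the security examples above.
print(model.predict(["Possible injection through crafted user input"]))
```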

    The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes

    Automated testing is a widely adopted technique for improving software quality. Software developers add, remove, and update test methods and test classes as part of the software development process, as well as during the evolution phase that follows the initial release. In this work we conduct a large-scale study of 61 popular open source projects and report the relationships we have established between test maintenance, production code maintenance, and the fine-grained semantic changes (e.g., statement added, method removed) performed in developers' commits. We build predictive models and show that the number of tests in a software project can be well predicted from its code maintenance profile (i.e., how many commits were performed in each of the maintenance activities: corrective, perfective, adaptive). Our findings also reveal that, more often than not, developers perform code fixes without performing complementary test maintenance in the same commit (e.g., updating an existing test or adding a new one). When developers do perform test maintenance, it is likely to be affected by the semantic changes they perform as part of their commit. Our study covers over 240,000 commits comprising over 16,000,000 semantic change type instances, performed by over 4,000 software engineers. (Postprint, ICSME 2017.)
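    The sketch below illustrates the shape of such a predictive model: regressing a project's test count on its maintenance profile. The regressor choice (a Random Forest) is one plausible option, not necessarily the paper's, and all numbers are invented for illustration.

```python
# Toy regression: predict a project's number of tests from its
# code-maintenance profile (corrective, perfective, adaptive commit counts).
from sklearn.ensemble import RandomForestRegressor

# Each row: [corrective_commits, perfective_commits, adaptive_commits]
profiles = [
    [120, 300, 80],
    [40, 90, 30],
    [500, 800, 260],
    [15, 60, 10],
]
test_counts = [950, 310, 2900, 140]  # observed tests per project (made up)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(profiles, test_counts)
print(model.predict([[200, 400, 120]]))  # estimated test count for a new project
```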

    Dependency Management 2.0 – A Semantic Web Enabled Approach

    Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces (APIs) are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these APIs have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration, and introduces five major contributions: (1) We present a formal Software Build System Ontology, SBSON, which captures concepts and properties of software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis than traditional proprietary analysis approaches. (2) We conducted a user survey involving 53 open source developers to gain insights into how actual developers manage API breaking changes. (3) We introduce a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impact of breaking changes. (4) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM), an integration of our build, source code, vulnerability, and license ontologies that supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies are presented to illustrate the applicability and flexibility of our modeling approach, demonstrating that it can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.
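    To give a feel for the kind of analysis a Semantic Web representation enables, here is a small sketch using rdflib: dependencies are asserted as RDF triples and a SPARQL property path computes transitive dependencies, the sort of inference that is awkward with flat, proprietary build-file parsers. The namespace URI and property names (sbson:dependsOn, sbson:hasVersion) are invented for illustration and are not the actual SBSON vocabulary.

```python
# Sketch: build dependencies as RDF triples, queried with SPARQL.
from rdflib import Graph, Namespace, Literal, RDF

SBSON = Namespace("http://example.org/sbson#")  # placeholder namespace
g = Graph()
g.bind("sbson", SBSON)

# Assert a small dependency graph: app -> lib-a -> lib-b
for name in ("app", "lib-a", "lib-b"):
    g.add((SBSON[name], RDF.type, SBSON.Project))
g.add((SBSON["app"], SBSON.dependsOn, SBSON["lib-a"]))
g.add((SBSON["lib-a"], SBSON.dependsOn, SBSON["lib-b"]))
g.add((SBSON["lib-b"], SBSON.hasVersion, Literal("2.1.0")))

# SPARQL property path (dependsOn+) yields all transitive dependencies.
q = "SELECT ?dep WHERE { sbson:app sbson:dependsOn+ ?dep . }"
for row in g.query(q, initNs={"sbson": SBSON}):
    print(row.dep)  # prints lib-a, then lib-b
```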

    REI: An integrated measure for software reusability

    To capitalize upon the benefits of software reuse, an efficient selection among candidate reusable assets should be performed in terms of functional fitness and adaptability. The reusability of assets is usually measured through reusability indices. However, existing indices do not capture all facets of reusability, such as structural characteristics, external quality attributes, and documentation. In this paper, we propose a reusability index (REI) as a synthesis of various software metrics and evaluate its ability to quantify reuse, based on the IEEE Standard on Software Metrics Validity. The proposed index is compared with existing ones through a case study on 80 reusable open-source assets. To illustrate the applicability of the proposed index, we performed a pilot study in which real-world reuse decisions were compared with the decisions implied by the metrics (including REI). The results of the study suggest that the proposed index has the highest predictive and discriminative power; it is the most consistent in ranking reusable assets and the most strongly correlated with their levels of reuse. The findings of the paper are discussed to identify the most important aspects of reusability assessment (interpretation of results), and interesting implications for research and practice are provided.
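    As a minimal sketch of what "a synthesis of various software metrics" can mean operationally, the snippet below combines normalized metric values into a single weighted score. The metric names and weights are placeholders, not the actual REI formulation from the paper.

```python
# Toy composite reusability score: weighted average of normalized metrics.
def rei(metrics, weights):
    """metrics/weights: dicts keyed by metric name; metric values are
    normalized to [0, 1], with higher meaning more reusable."""
    assert set(metrics) == set(weights)
    total = sum(weights.values())
    return sum(weights[m] * metrics[m] for m in metrics) / total

asset = {
    "structural_quality": 0.8,   # e.g., inverse of coupling
    "documentation": 0.6,        # e.g., comment/API-doc coverage
    "external_quality": 0.7,     # e.g., inverse of reported defect density
}
weights = {"structural_quality": 0.4, "documentation": 0.3, "external_quality": 0.3}
print(f"REI = {rei(asset, weights):.2f}")  # prints REI = 0.71
```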

    Enhancing Trust – A Unified Meta-Model for Software Security Vulnerability Analysis

    Over the last decade, a globalization of the software industry has taken place, facilitating the sharing and reuse of code across existing project boundaries. At the same time, such global reuse introduces new challenges to the Software Engineering community: not only is code implementation shared across systems, but also any vulnerabilities it is exposed to. Hence, vulnerabilities found in APIs no longer affect only individual projects but might spread across projects and even global software ecosystem borders. Tracing such vulnerabilities on a global scale is an inherently difficult task, with many of the resources required for the analysis not only growing at unprecedented rates but also being spread across heterogeneous resources. Software developers struggle to identify and locate the data required to take full advantage of these resources. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation introduces four major contributions to address these challenges: (1) It provides a literature review of the use of software vulnerability databases (SVDBs) in the Software Engineering community. (2) Based on findings from this literature review, we present SEVONT, a Semantic Web based modeling approach that supports a formal and semi-automated process for unifying vulnerability information resources. SEVONT introduces a multi-layer knowledge model which not only provides a unified knowledge representation but also captures software vulnerability information at different abstraction levels, allowing for seamless integration, analysis, and reuse of the modeled knowledge. The modeling approach takes advantage of Formal Concept Analysis (FCA) to guide knowledge engineers in identifying reusable knowledge concepts and modeling them. (3) A Security Vulnerability Analysis Framework (SV-AF) is introduced as an instantiation of the SEVONT knowledge model to support evidence-based vulnerability detection. The framework integrates vulnerability ontologies (and data) with existing Software Engineering ontologies, allowing the use of Semantic Web reasoning services to trace and assess the impact of security vulnerabilities across project boundaries. Several case studies are presented to illustrate the applicability and flexibility of our modeling approach, demonstrating that the presented knowledge modeling approach can not only unify heterogeneous vulnerability data sources but also enable new types of vulnerability analysis.
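    The sketch below shows, in plain Python rather than ontologies and Semantic Web reasoning, the underlying join that cross-boundary vulnerability tracing performs: matching vulnerability records against a project dependency graph, including one level of transitive exposure. All artifact coordinates and project names are invented for illustration.

```python
# Toy cross-project vulnerability tracing via a dependency join.
vulnerable = {("org.apache.struts", "struts2-core", "2.3.15")}  # from an SVDB

# project -> set of (groupId, artifactId, version) it depends on
dependencies = {
    "shop-frontend": {("org.apache.struts", "struts2-core", "2.3.15")},
    "billing":       {("com.google.guava", "guava", "31.1-jre")},
    "gateway":       {("internal", "shop-frontend-client", "1.0")},
}
# which internal artifacts embed which projects (one level of transitivity)
embeds = {("internal", "shop-frontend-client", "1.0"): "shop-frontend"}

def exposed(project, seen=None):
    """True if the project depends, directly or transitively, on a
    vulnerable artifact."""
    seen = seen or set()
    if project in seen:
        return False
    seen.add(project)
    for dep in dependencies.get(project, ()):
        if dep in vulnerable:
            return True
        if dep in embeds and exposed(embeds[dep], seen):
            return True
    return False

for p in dependencies:
    print(p, "exposed" if exposed(p) else "clean")
# shop-frontend and gateway are flagged; billing is clean.
```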