
    Data quality: Some comments on the NASA software defect datasets

    Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and using the same rather than similar versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective: This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method: We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use. Results: We find important differences between the two versions of the datasets, implausible values in one dataset, and generally insufficient detail documented on dataset preprocessing. Conclusions: It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.
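
    A minimal sketch of the kind of pre-modelling sanity checks the authors recommend, assuming a NASA-style defect dataset stored as a CSV of module metrics; the file name and column names below are illustrative, not the actual dataset schema:

        import pandas as pd

        # Load a defect dataset (path and column names are assumed for illustration).
        df = pd.read_csv("kc1.csv")

        # Implausible values, e.g. modules with non-positive lines of code or negative metrics.
        numeric = df.select_dtypes("number")
        implausible = df[(df["loc"] <= 0) | (numeric < 0).any(axis=1)]

        # Exact duplicate modules, which can leak between training and test sets.
        duplicates = df[df.duplicated()]

        print(f"{len(implausible)} implausible and {len(duplicates)} duplicate rows out of {len(df)};")
        print("document how these rows are handled before training any learner.")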

    The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes

    Automatic testing is a widely adopted technique for improving software quality. Software developers add, remove and update test methods and test classes as part of the software development process as well as during the evolution phase, following the initial release. In this work we conduct a large-scale study of 61 popular open source projects and report the relationships we have established between test maintenance, production code maintenance, and semantic changes (e.g., statement added, method removed) performed in developers' commits. We build predictive models and show that the number of tests in a software project can be well predicted by employing code maintenance profiles (i.e., how many commits were performed in each of the maintenance activities: corrective, perfective, adaptive). Our findings also reveal that more often than not, developers perform code fixes without performing complementary test maintenance in the same commit (e.g., updating an existing test or adding a new one). When developers do perform test maintenance, it is likely to be affected by the semantic changes they perform as part of their commit. Our work is based on studying 61 popular open source projects, comprising over 240,000 commits with over 16,000,000 semantic change type instances, performed by over 4,000 software engineers. Comment: postprint, ICSME 201
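
    A hedged sketch of the prediction setup described above, using synthetic stand-in data: the number of tests in a project is regressed on its code-maintenance profile (commit counts per maintenance activity). The column names and the random values are illustrative only, not the study's data:

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor

        # Synthetic per-project maintenance profiles: commit counts per activity type.
        rng = np.random.default_rng(0)
        profiles = pd.DataFrame(rng.integers(10, 500, size=(61, 3)),
                                columns=["corrective", "perfective", "adaptive"])
        # Made-up target: number of tests per project, loosely tied to the profile.
        test_counts = profiles["corrective"] * 2 + profiles["perfective"] + rng.normal(0, 20, 61)

        # Predict the number of tests from the code-maintenance profile.
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(profiles, test_counts)
        print(dict(zip(profiles.columns, model.feature_importances_.round(2))))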

    Ways of Applying Artificial Intelligence in Software Engineering

    As Artificial Intelligence (AI) techniques have become more powerful and easier to use, they are increasingly deployed as key components of modern software systems. While this enables new functionality and often allows better adaptation to user needs, it also creates additional problems for software engineers and exposes companies to new risks. Some work has been done to better understand the interaction between Software Engineering and AI, but we lack methods to classify ways of applying AI in software systems and to analyse and understand the risks this poses. Only by doing so can we devise tools and solutions to help mitigate them. This paper presents the AI in SE Application Levels (AI-SEAL) taxonomy, which categorises applications according to their point of AI application, the type of AI technology used, and the automation level allowed. We show the usefulness of this taxonomy by classifying 15 papers from previous editions of the RAISE workshop. Results show that the taxonomy allows classification of distinct AI applications and provides insights concerning the risks associated with them. We argue that this will be important for companies in deciding how to apply AI in their software applications and to create strategies for its use.
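
    A small sketch of how a paper might be recorded along the three AI-SEAL dimensions; the level names and value ranges below are assumptions for illustration, not the taxonomy's official labels:

        from dataclasses import dataclass
        from enum import Enum

        # Illustrative encoding of the three AI-SEAL dimensions (labels are assumed).
        class ApplicationPoint(Enum):
            PROCESS = "applied to the development process"
            PRODUCT = "embedded in the software product"
            RUNTIME = "applied at runtime"

        @dataclass
        class AiSealClassification:
            application_point: ApplicationPoint
            ai_technology: str        # e.g. "supervised learning", "search-based"
            automation_level: int     # e.g. 1 (advisory) .. 10 (fully autonomous)

        # One classified paper from a workshop, with made-up values.
        paper = AiSealClassification(ApplicationPoint.PROCESS, "supervised learning", 4)
        print(paper)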

    Sketch-To-Solution: An Exploration of Viscous CFD with Automatic Grids

    Numerical simulation of the Reynolds-averaged Navier-Stokes (RANS) equations has become a critical tool for the design of aerospace vehicles. However, the issues that affect the grid convergence of three-dimensional RANS solutions are not completely understood, as documented in the AIAA Drag Prediction Workshop series. Grid adaptation methods have the potential to increase the automation and discretization error control of RANS solutions and thereby impact the aerospace design and certification process. The realization of the CFD Vision 2030 Study includes automated management of errors and uncertainties of physics-based, predictive modeling that can set the stage for ensuring a vehicle is in compliance with a regulation or specification by using analysis without demonstration in flight test (i.e., certification or qualification by analysis). For example, the Cart3D inviscid analysis package has automated Cartesian cut-cell gridding with output-based error control. Fueled by recent advances in the fields of anisotropic grid adaptation, error estimation, and geometry modeling, a similar workflow is explored for viscous CFD simulations, where a CFD application engineer provides geometry, boundary conditions, and flow parameters, and the sketch-to-solution process yields a CFD simulation through automatic, error-based grid adaptation.
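
    The sketch-to-solution process alternates solving, error estimation, and grid adaptation. The toy one-dimensional example below only illustrates that control flow (refine wherever a local error indicator is large); it is not a RANS solver or the actual adaptation machinery:

        import numpy as np

        def f(x):
            return np.tanh(50 * (x - 0.5))   # sharp feature standing in for a flow gradient

        def adapt(grid, tol=1e-2, cycles=5):
            for _ in range(cycles):
                u = f(grid)                   # "solve" step: sample the field on the grid
                # Local error indicator: second difference at each interior node.
                indicator = np.abs(u[2:] - 2 * u[1:-1] + u[:-2])
                # Insert midpoints on intervals next to nodes with a large indicator.
                new_points = [0.5 * (grid[i] + grid[i + 1])
                              for i in np.where(indicator > tol)[0] + 1]
                if not new_points:
                    break
                grid = np.sort(np.concatenate([grid, new_points]))
            return grid

        grid = np.linspace(0.0, 1.0, 11)
        print(f"adapted grid has {len(adapt(grid))} points, clustered near the sharp feature")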

    Snoring: A Noise in Defect Prediction Datasets

    Defect prediction aims at identifying software artifacts that are likely to exhibit a defect. The main purpose of defect prediction is to reduce the cost of testing and code review, by letting developers focus on specific artifacts. Several researchers have worked on improving the accuracy of defect estimation models using techniques such as tuning, re-balancing, or feature selection. Ultimately, the reliability of a prediction model depends on the quality of the dataset. Therefore, effort has been spent in identifying sources of noise in the datasets and how to deal with them, including defect misclassification and defect origin. A key component of defect prediction approaches is the attribution of a defect to a project's release. Although developers might be able to attribute a defect to a specific release, in most cases a defect is attributed to the release after which the defect has been discovered. However, in many circumstances, a defect is only discovered several releases after its introduction. This introduces a bias in the dataset, i.e., treating the intermediate releases as defect-free and the latter as defect-prone. We call this phenomenon a "sleeping defect". We call "snoring" the phenomenon in which classes are affected only by sleeping defects, i.e., they would be treated as defect-free until the defect is discovered. In this work, we analyze data from more than 4,000 bugs and 600 releases of 20 open source projects from the Apache ecosystem to investigate: 1) the magnitude of sleeping defects, 2) the magnitude of snoring classes, 3) whether snoring impacts the evaluation of classifiers, 4) whether snoring impacts classifier accuracy, and 5) whether removing the last releases of data is beneficial in reducing the negative impact of the snoring noise on classifier accuracy. Our results show that, on average across projects: 1) most of the defects in a project slept for more than 19% of the existing releases, 2) the missing rate is more than 50% unless we remove more than 20% of the releases, 3) the relative error in measuring classifier accuracy on a dataset with snoring is about 100% in all accuracy metrics other than AUC, 4) the presence of snoring decreases the accuracy of each of the 15 classifiers in each of the 6 accuracy metrics; for instance, Recall, F1, Kappa and Matthews decrease by about 80%, and 5) removing one release of data is better than removing no data in all accuracy metrics; for instance, Recall, F1, Kappa and Matthews increase by about 30%.
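
    A minimal sketch of the snoring idea on hypothetical data: a class whose defect is introduced in release 1 but only discovered after release 4 is labelled clean in the intermediate releases, and the mitigation studied in the paper drops the last k releases. Column names and values are made up for illustration:

        import pandas as pd

        # One row per (class, release) snapshot, with the label that the usual
        # "defect attributed to the release before discovery" scheme would assign.
        snapshots = pd.DataFrame({
            "class_name": ["A"] * 4,
            "release":    [1, 2, 3, 4],
            "labelled_defective": [False, False, False, True],  # discovered only after release 4
        })
        defect_introduced_in = 1   # ground truth, e.g. from issue-tracker analysis

        # Releases 1-3 are "sleeping": the class is really defect-prone but labelled clean.
        snapshots["sleeping"] = (
            (snapshots["release"] >= defect_introduced_in) & ~snapshots["labelled_defective"]
        )

        # Mitigation: drop the last k releases, where defects are most likely still
        # undiscovered (the paper's results suggest k = 1 already helps).
        k = 1
        truncated = snapshots[snapshots["release"] <= snapshots["release"].max() - k]
        print(snapshots, truncated, sep="\n\n")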