
    An Innovative Approach for Predicting Software Defects by Handling Class Imbalance Problem

    Over the last decade, imbalanced data has gained attention as a major challenge for enhancing software quality and reliability. Owing to the evolution of advanced software development tools and processes, today's software products are much larger and more complicated. The software business faces major issues in maintaining software performance and efficiency, as well as the cost of handling defects after a product is deployed. Imbalanced data hampers the effectiveness of defect prediction models in terms of data analysis, biased results, model accuracy and decision making. Predicting defects before they affect the software product is one way to cut the costs required to maintain software quality. In this study we propose a two-level approach to the class imbalance problem that enhances the accuracy of the prediction model. At the first level, the model balances the predicted classes at the data level by applying a sampling method. At the second level, we use the Random Forest machine learning approach to build a strong classifier for software defects. Handling the class imbalance issue at both the data and algorithm levels thus improves the accuracy of the software defect prediction model.
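    The two-level pipeline is concrete enough to illustrate. Below is a minimal sketch assuming scikit-learn and imbalanced-learn, with SMOTE standing in for the sampling method (the abstract does not name which one is used):

```python
# Sketch of the two-level idea: balance classes at the data level, then
# train a Random Forest. SMOTE is an illustrative choice of sampler only.
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def train_two_level(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    # Level 1: rebalance the training data only (never the test set).
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

    # Level 2: fit a Random Forest on the balanced data.
    clf = RandomForestClassifier(n_estimators=300, random_state=42)
    clf.fit(X_bal, y_bal)

    print(classification_report(y_test, clf.predict(X_test)))
    return clf
```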

    An Exploratory Framework for Intelligent Labelling of Fault Datasets

    Software fault prediction (SFP) has become a pivotal aspect of software quality. Nevertheless, the discipline suffers from a scarcity of fault datasets. Most research endeavours focus on the type of dataset, its granularity, the metrics used and the metric extractors; only sporadic attention has been paid to the development of fault datasets and their associated challenges. Very few publicly available datasets exist, limiting the possibilities for comprehensive experiments aimed at improving software quality. The current research addresses the challenges of collecting and developing a fault dataset when none is publicly available. It also considers dynamic identification of available resources such as public datasets, open-source software archives, metric parsers and intelligent labelling techniques. A framework for the dataset collection and development process is furnished, along with an evaluation procedure for the identified resources.
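    One ingredient such a framework could draw on is labelling modules as fault-prone from version-control history. A hedged sketch, using a naive keyword heuristic in the spirit of the SZZ approach (this is a stand-in, not the paper's own labelling algorithm):

```python
# Mark files as fault-prone when they are touched by bug-fix commits,
# identified here by a simple keyword match on the commit message.
import re

BUG_FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|defect|fault)\b", re.I)

def label_files(commits):
    """commits: iterable of dicts like {"message": str, "files": [str, ...]}."""
    faulty = set()
    for commit in commits:
        if BUG_FIX_PATTERN.search(commit["message"]):
            faulty.update(commit["files"])
    return faulty  # files labelled fault-prone
```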

    Using Machine Learning Techniques to Improve Static Code Analysis Tools Usefulness

    This dissertation proposes an approach that uses Machine Learning (ML) techniques to reduce the cost of manually inspecting the large number of false positive warnings reported by Static Code Analysis (SCA) tools. The proposed approach neither assumes a particular SCA tool nor depends on the programming language of the target source code or application. To reduce the number of false positive warnings, we first evaluated several SCA tools in terms of software engineering metrics using a well-known synthetic code base, the Juliet test suite. From this evaluation, we concluded that SCA tools report plenty of false positive warnings that require manual inspection. We then generated a number of datasets from source code that forced the SCA tool to produce true positive, false positive, or false negative warnings. These datasets were used to train four ML classifiers to classify the warnings collected from the synthetic source code. The experimental results showed that the classifier built using the Random Forests (RF) technique outperformed the rest. Lastly, using this classifier and an instance-based transfer learning technique, we ranked warnings aggregated from various open-source software projects. The experimental results show that the proposed approach outperformed a random ranking algorithm and was highly correlated with the ranked list generated by the optimal ranking algorithm.
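    The ranking step lends itself to a short illustration. A minimal sketch, assuming numeric warning features and a binary true/false-positive label; the dissertation's actual features and its transfer-learning step are not reproduced here:

```python
# Score new warnings by the predicted probability of being true positives,
# then inspect them in descending score order.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_warnings(X_train, y_train, X_new):
    """y_train: 1 = true positive warning, 0 = false positive."""
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_new)[:, 1]  # P(true positive)
    return np.argsort(-scores)               # warning indices, most likely first
```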

    A Novel Developed Supervised Machine Learning System For Classification And Prediction of Software Faults Using NASA Dataset

    The software systems of modern computers are extremely complex and versatile, so it is essential to regularly detect and correct software design faults. To devote resources effectively to the creation of trustworthy software, software companies increasingly engage in the practice of predicting fault-prone modules ahead of testing. These software fault prediction methods rely on how thoroughly the faults and related code of prior software versions have been retrieved. Time, energy and money are all saved as a result, and satisfying its clientele greatly increases a company's initial success and bottom line. Numerous academics have poured effort into this area over the years in an attempt to raise the bar for all software. The most commonly used approaches in this field today are based on machine learning (ML), which seeks to build software capable of evolving and adapting in response to fresh data. This paper introduces a new ML approach that brings together a number of different expert systems. To reach agreement on which aspects of a software system need to be tested, the proposed multi-classifier model pools the strengths of the most effective classifiers. Several top-performing classifiers for defect prediction are put through their paces in an experimental evaluation. We test our method on 16 publicly available datasets from the NASA Metrics Data Program (MDP) collection in the PROMISE repository. Confusion matrix parameters, recall, precision, recognition accuracy and related measures are evaluated and contrasted with existing schemes in an analysis performed with a Python simulation tool. The experimental outcomes demonstrate that a multi-classifier approach combining LGBM, XGBoost and Voting classifiers significantly improves software fault prediction performance, and the investigation shows that the suggested method leads to better practical outcomes in fault prediction.
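    Since the abstract names the ensemble members, the combination can be sketched directly. A minimal example of soft voting over LightGBM and XGBoost; the hyperparameters are illustrative assumptions, as the abstract does not give the paper's exact settings:

```python
# Soft-voting ensemble over the two gradient-boosting classifiers named
# in the abstract; predicted class probabilities are averaged.
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import VotingClassifier

def build_multi_classifier():
    return VotingClassifier(
        estimators=[
            ("lgbm", LGBMClassifier(n_estimators=200)),
            ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
        ],
        voting="soft",  # average probabilities rather than hard votes
    )

# Usage: clf = build_multi_classifier(); clf.fit(X_train, y_train)
```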

    On Privacy and Utility while Improving Software Quality

    Software development produces large amounts of data, both from the process and from the usage of the software product. Software engineering data science turns this data into actionable insights for improving software quality. However, processing this data can raise privacy concerns for organizations, which are obligated by law, regulations and policies to protect personal and business-sensitive data. Early data privacy studies in sub-disciplines of software engineering found that applying privacy algorithms often degraded the usefulness of the data; hence, there is a recognized need to find a balance between privacy and utility. A survey of data privacy solutions for software engineering data was conducted. Overall, researchers found that a combination of data minimization and data obfuscation produced results with high levels of privacy while allowing the data to remain useful.
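    The minimize-then-obfuscate recipe can be illustrated in a few lines. A toy sketch assuming a pandas data frame; the column selection and noise scale are invented for illustration and stand in for whatever concrete minimization and obfuscation scheme is chosen:

```python
# Step 1 (minimization): keep only the columns the analysis needs.
# Step 2 (obfuscation): perturb the remaining numeric columns with noise.
import numpy as np
import pandas as pd

def minimize_and_obfuscate(df, keep_cols, numeric_cols, noise_scale=0.05):
    out = df[keep_cols].copy()                 # data minimization
    for col in numeric_cols:                   # obfuscation: additive noise
        std = out[col].std()
        out[col] += np.random.normal(0, noise_scale * std, len(out))
    return out
```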

    Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation

    To conduct empirical research on industrial software development, it is necessary to obtain data from real software projects in industry. However, only a few such industry data sets are publicly available, and unfortunately most of them are very old. In addition, most of today's software companies cannot open their data, because software development involves many stakeholders and data confidentiality must therefore be strongly preserved. To address this, this study proposes a method for artificially generating a “mimic” software project data set whose characteristics (such as mean, standard deviation and correlation coefficients) are very similar to those of a given confidential data set. Instead of the original (confidential) data set, researchers are expected to use the mimic data set to produce results similar to those from the original. The proposed method uses the Box-Muller transform to generate normally distributed random numbers, plus an exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as a potential application domain for the mimic data. Estimation models are built from 8 reference data sets and their corresponding mimic data. Our experiments confirmed that models built from mimic data sets show effort estimation performance similar to models built from the original data sets, which indicates the capability of the proposed method to generate representative samples.
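    The generation pipeline is concrete enough to sketch: Box-Muller produces the normal variates, an exponential transform skews them (project data is typically heavy-tailed), and a rank-based reordering aligns each mimic column with its original so that correlations survive. The exact form of the reordering and rescaling below is an assumption; the abstract only says "number reordering":

```python
import numpy as np

def box_muller(n, rng):
    # 1 - random() keeps u1 in (0, 1], avoiding log(0).
    u1 = 1.0 - rng.random(n)
    u2 = rng.random(n)
    return np.sqrt(-2 * np.log(u1)) * np.cos(2 * np.pi * u2)

def mimic_column(orig, rng):
    z = box_muller(len(orig), rng)
    # Exponential transform, then rescale to the original mean / std.
    x = np.exp(z)
    x = (x - x.mean()) / x.std() * orig.std() + orig.mean()
    # Reorder so the mimic column follows the original's rank pattern,
    # which keeps the inter-column correlation structure similar.
    return np.sort(x)[np.argsort(np.argsort(orig))]

# Usage: rng = np.random.default_rng(0)
#        mimic = mimic_column(np.asarray(effort_values), rng)
```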

    The Importance of Accounting for Real-World Labelling When Predicting Software Vulnerabilities

    Previous work on vulnerability prediction assumes that predictive models are trained with perfect labelling information (i.e., labels that include future, as-yet-undiscovered vulnerabilities). In this paper we present results from a comprehensive empirical study of 1,898 real-world vulnerabilities reported in 74 releases of three security-critical open source systems (Linux Kernel, OpenSSL and Wireshark). Our study investigates the effectiveness of three previously proposed vulnerability prediction approaches in two settings: with and without the unrealistic labelling assumption. The results reveal that the unrealistic labelling assumption can profoundly mislead the scientific conclusions drawn; the highly effective and deployable prediction results it suggests vanish when we fully account for realistically available labelling in the experimental methodology. More precisely, mean MCC values of predictive effectiveness drop from 0.77, 0.65 and 0.43 to 0.08, 0.22 and 0.10 for Linux Kernel, OpenSSL and Wireshark, respectively. Similar results are obtained for precision, recall and other assessments of predictive efficacy. The community therefore needs to upgrade its experimental and empirical methodology for vulnerability prediction evaluation and development to ensure robust and actionable scientific findings.
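    The labelling distinction can be made concrete with a small sketch. The data layout below (per-component vulnerability report dates) is assumed for illustration; the point is that realistic training labels may only reflect vulnerabilities reported before the training-time cutoff, while effectiveness is judged against the eventually-known ground truth:

```python
# Realistic labelling: a component is "vulnerable" at training time only
# if at least one of its vulnerabilities was reported before the cutoff.
from sklearn.metrics import matthews_corrcoef

def realistic_labels(components, cutoff_date):
    """components: iterable of dicts like {"vuln_report_dates": [date, ...]}."""
    return [int(any(d <= cutoff_date for d in c["vuln_report_dates"]))
            for c in components]

def evaluate(y_true_full_knowledge, y_pred):
    # Score predictions against the eventually-known ground truth (MCC).
    return matthews_corrcoef(y_true_full_knowledge, y_pred)
```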

    Proceedings of the Twenty-Third Annual Software Engineering Workshop

    The Twenty-third Annual Software Engineering Workshop (SEW) provided 20 presentations designed to further the goals of the Software Engineering Laboratory (SEL) of NASA-GSFC. The presentations were selected for their creativity. The sessions, held on 2-3 December 1998, centered on the SEL, Experimentation, Inspections, Fault Prediction, Verification and Validation, and Embedded Systems and Safety-Critical Systems.

    Artificial intelligence for advanced manufacturing quality

    This Thesis addresses the challenge of AI-based image quality control systems applied to the manufacturing industry, aiming to improve this field through the use of advanced techniques for data acquisition and processing in order to obtain robust, reliable and optimal systems. The Thesis presents contributions on the use of complex data acquisition techniques, the application and design of specialised neural networks for defect detection, and the integration and validation of these systems in production processes. It was developed in the context of several applied research projects that provided practical feedback on the usefulness of the proposed computational advances, as well as real-life data for experimental validation.