25,709 research outputs found

    Optimized Deeplearning Algorithm for Software Defects Prediction

    Get PDF
    Accurate software defect prediction (SDP) helps to enhance the quality of software by identifying potential flaws early in the development process. However, existing approaches face challenges in achieving reliable predictions. To address this, a novel approach built on a two-tier deep learning framework is proposed. The proposed work includes four major phases: (a) pre-processing, (b) dimensionality reduction, (c) feature extraction, and (d) two-fold deep learning-based SDP. The collected raw data is initially pre-processed using a data cleaning approach (handling null values and missing data) and a decimal scaling normalisation approach. The dimensions of the pre-processed data are reduced using the newly developed Incremental Covariance Principal Component Analysis (ICPCA), which helps address the “curse of dimensionality” issue. Feature extraction is then performed on the dimensionally reduced data using statistical features (standard deviation, skewness, variance, and kurtosis), mutual information (MI), and conditional entropy (CE). From the extracted features, the relevant ones are selected using the new Euclidean Distance with Mean Absolute Deviation (ED-MAD). Finally, the SDP (decision making) is carried out using the optimized Two-Fold Deep Learning Framework (O-TFDLF), which encapsulates an RBFN and an optimized MLP. The weights of the MLP are fine-tuned using the new Levy Flight Cat Mouse Optimisation (LCMO) method to improve the model's prediction accuracy. The final outcome (forecasting the presence or absence of a defect) is obtained from the optimized MLP. The implementation was performed in MATLAB. Using performance metrics such as sensitivity, accuracy, precision, specificity, and MSE, the proposed model's performance is compared with that of existing models. The proposed model achieves an accuracy of 93.37%.
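
    The decimal scaling normalisation and statistical-feature steps named in this abstract can be illustrated with a short sketch. The snippet below is only a minimal illustration in Python/NumPy (the authors' implementation was in MATLAB); the toy matrix, function names, and the small epsilon guard are assumptions, not details from the paper.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def decimal_scaling(column):
    """Decimal scaling normalisation: divide by the smallest power of 10
    that brings every absolute value below (or at most equal to) 1."""
    max_abs = np.max(np.abs(column))
    if max_abs == 0:
        return column
    j = int(np.ceil(np.log10(max_abs + 1e-12)))  # epsilon guards the log at boundaries
    return column / (10 ** j)

def statistical_features(row):
    """Per-sample statistical descriptors named in the abstract."""
    return np.array([np.std(row), skew(row), np.var(row), kurtosis(row)])

# Toy metrics matrix (rows = software modules, columns = raw metrics).
X = np.array([[120.0, 3.0, 0.4], [80.0, 1.0, 0.9], [200.0, 7.0, 0.1]])
X_norm = np.apply_along_axis(decimal_scaling, 0, X)        # normalise each column
feats = np.apply_along_axis(statistical_features, 1, X_norm)
print(feats.shape)  # (3, 4): std, skewness, variance, kurtosis per module
```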

    Deep Incremental Learning of Imbalanced Data for Just-In-Time Software Defect Prediction

    Full text link
    This work stems from three observations on prior Just-In-Time Software Defect Prediction (JIT-SDP) models. First, prior studies treat the JIT-SDP problem solely as a classification problem. Second, prior JIT-SDP studies do not consider that class balancing may change the underlying characteristics of software changeset data. Third, only a single source of concept drift, class imbalance evolution, is addressed in prior JIT-SDP incremental learning models. We propose an incremental learning framework called CPI-JIT for JIT-SDP. First, in addition to a classification modeling component, the framework includes a time-series forecast modeling component in order to learn temporal interdependence among changesets. Second, the framework features a purposefully designed over-sampling technique based on SMOTE and principal curves, called SMOTE-PC, which preserves the underlying distribution of software changeset data. Within this framework, we propose an incremental deep neural network model called DeepICP. Via an evaluation using \numprojs software projects, we show that: 1) SMOTE-PC improves the model's predictive performance; 2) for some software projects it is beneficial to harness the temporal interdependence of software changesets for defect prediction; and 3) principal curves summarize the underlying distribution of changeset data and reveal a new source of concept drift, which the DeepICP model is designed to adapt to.
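
    The SMOTE-PC balancing step is not specified in detail here, but the plain SMOTE baseline it extends can be sketched with imbalanced-learn. The snippet below is a hedged illustration on synthetic changeset features: it shows ordinary SMOTE only, not the principal-curves variant, and the data and parameters are made up.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE  # standard SMOTE; SMOTE-PC builds on this idea

# Synthetic, imbalanced changeset feature matrix: 90 clean vs 10 defect-inducing changes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 5)), rng.normal(1.5, 1.0, (10, 5))])
y = np.array([0] * 90 + [1] * 10)

# Oversample the minority (defect-inducing) class before training a classifier.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # minority class is synthetically balanced
```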

    GraphEvo: Evaluating Software Evolution Using Machine Learning Based Call Graph Analytics And Network Portrait Divergence

    Get PDF
    Title from PDF of title page, viewed September 9, 2022. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 151-168). Dissertation (Ph.D.)--Department of Computer Science and Electrical Engineering, University of Missouri--Kansas City, 2022.
    Understanding software evolution is essential for software development tasks, including debugging, maintenance, and testing. Unfortunately, as software changes, it grows larger and more complicated, which makes it harder to understand. Software Defect Prediction (SDP) in the codebase is one of the most common ways artificial intelligence (AI) is used to improve the quality of agile products, but graph-based software metrics are seldom used in SDP. In this dissertation, we propose a graph-based software framework called GraphEvo based on deep learning modeling for graphs. We apply a recent advance in network comparison to software networks via the information-theoretic metric Network Portrait Divergence (NPD). NPD captures structural changes in call-graph-based software networks. The NPD-based method determines which software changes are significant, how many execution paths are affected, and how tests improve with respect to the code; all of these factors affect how reliable the software is. To ensure that the NPD-based tooling works well, version control history and Pull Requests (PRs) are used. GraphEvo's most significant contributions are: (i) finding and showing how software has changed over time using call graphs; (ii) using machine learning and deep learning techniques to understand the software and estimate how many defects are in each code entity (such as a class); (iii) using the NPD-based tooling to create a public bug dataset and machine learning to see how well it can predict software defects; and (iv) helping with the PR review process by understanding how code changes and their accompanying test changes work together. We evaluated the performance of GraphEvo (i) across 66 software releases from five popular Java open-source systems to show that it works, (ii) on 9 Java projects with deep learning to build an SDP model, (iii) on 19 Java projects of different sizes and types from GitHub, augmented with bug information from other sources, and (iv) on 627 PRs from 14 Java projects to see how vital tests are in PRs. These comprehensive experiments show that GraphEvo works well for debugging, maintaining, and testing software. We also received favorable responses from user studies, in which we asked software developers and testers what they thought of GraphEvo.
    Contents: Introduction -- Characterizing and understanding software evolution using call graphs -- Defect prediction using deep learning with NPD for software evolution -- NPD-based tooling, extendible defect dataset and its assessment -- Reviewing pull requests with path-based NPD and test
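
    The Network Portrait Divergence idea can be sketched roughly: build each graph's portrait (how many nodes see k other nodes at shortest-path distance l) and compare the two portraits with a Jensen-Shannon divergence. The snippet below is a simplified stand-in, not the exact published NPD (which additionally weights portrait rows by the path-length distribution), and it uses toy random graphs rather than real call graphs.

```python
from collections import Counter
import networkx as nx
import numpy as np
from scipy.spatial.distance import jensenshannon

def portrait(G):
    """B[l, k] = number of nodes that have exactly k other nodes at distance l."""
    n = G.number_of_nodes()
    B = np.zeros((n, n + 1))
    for _, dists in nx.all_pairs_shortest_path_length(G):
        per_distance = Counter(d for d in dists.values() if d > 0)
        for l, k in per_distance.items():
            B[l, k] += 1
    return B

def simplified_npd(G1, G2):
    """Jensen-Shannon divergence between flattened, padded portraits (simplified)."""
    B1, B2 = portrait(G1), portrait(G2)
    rows, cols = max(B1.shape[0], B2.shape[0]), max(B1.shape[1], B2.shape[1])
    P1 = np.zeros((rows, cols)); P1[:B1.shape[0], :B1.shape[1]] = B1
    P2 = np.zeros((rows, cols)); P2[:B2.shape[0], :B2.shape[1]] = B2
    p, q = P1.ravel() / P1.sum(), P2.ravel() / P2.sum()
    return jensenshannon(p, q, base=2) ** 2  # squared JS distance = JS divergence

# Two toy "call graphs" standing in for successive software releases.
old = nx.gnp_random_graph(40, 0.10, seed=1)
new = nx.gnp_random_graph(40, 0.15, seed=2)
print(round(simplified_npd(old, new), 4))  # larger value = larger structural change
```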

    Leveraging Machine Learning to Improve Software Reliability

    Get PDF
    Finding software faults is a critical task during the lifecycle of a software system. Traditional software quality control practices such as statistical defect prediction, static bug detection, regression testing, and code review are often inefficient and time-consuming, and cannot keep up with the increasing complexity of modern software systems. We argue that machine learning, with its capability in knowledge representation, learning, natural language processing, classification, etc., can be used to extract invaluable information from software artifacts that may be difficult to obtain with other research methodologies, and thereby improve existing software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review. This thesis presents a suite of novel machine learning based techniques to improve existing software reliability practices and help developers find software bugs more effectively and efficiently. First, it introduces a deep learning based defect prediction technique to improve existing statistical defect prediction models. To build accurate prediction models, previous studies focused on manually designing features that encode the statistical characteristics of programs. However, these features often fail to capture the semantic differences between programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this thesis leverages deep learning techniques to learn a semantic representation of programs automatically from source code and further builds and trains defect prediction models using these semantic features. We examine the effectiveness of the deep learning based prediction models on both open-source and commercial projects. Results show that the learned semantic features can significantly outperform existing defect prediction models. Second, it introduces an n-gram language model based static bug detection technique, Bugram, to detect new types of bugs with fewer false positives. Most existing static bug detection techniques are based on programming rules inferred from source code. It is known that if a pattern does not appear frequently enough, rules are not learned, and many bugs are therefore missed. To solve this issue, this thesis proposes Bugram, which leverages n-gram language models instead of rules to detect bugs. Specifically, Bugram models program tokens sequentially using an n-gram language model. Token sequences from the program are then assessed according to their probability in the learned model, and low-probability sequences are marked as potential bugs. The assumption is that low-probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual/special uses of code of which developers may want to be aware. We examine the effectiveness of our approach on the latest versions of 16 open-source projects. Results show that Bugram detected 25 new bugs, 23 of which could not be detected by existing rule-based bug detection approaches, which suggests that Bugram is complementary to existing bug detection approaches, detecting more bugs while generating fewer false positives. Third, it introduces a machine learning based regression test prioritization technique, QTEP, to find and run test cases that could reveal bugs earlier. Existing test case prioritization techniques mainly focus on maximizing coverage information between source code and test cases to schedule test cases for finding bugs earlier, but they often do not consider the likely distribution of faults in the source code. However, software faults are not evenly distributed in source code; e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priority, since they are more likely to find faults. To solve this issue, this thesis proposes QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by considering the weighted source code quality. Evaluation on seven open-source projects shows that QTEP can significantly outperform existing test case prioritization techniques in finding failed test cases early. Finally, it introduces a machine learning based approach to identifying risky code review requests. Code review has been widely adopted in the development process of both proprietary and open-source software; it helps improve the maintenance and quality of software before code changes are merged into the source code repository. Our observation of code review requests from four large-scale projects reveals that around 20% of changes cannot pass the first round of code review and require non-trivial revision effort (i.e., risky changes). In addition, resolving these risky changes requires 3X more time and 1.6X more reviewers on average than regular changes (i.e., changes that pass the first code review). This thesis presents the first study to characterize these risky changes and automatically identify them with machine learning classifiers. Evaluation on one proprietary project and three large-scale open-source projects (i.e., Qt, Android, and OpenStack) shows that our approach is effective in identifying risky code review requests. Taken together, the results of the four studies provide evidence that machine learning can help improve traditional software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review.
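
    The Bugram idea described above (score token sequences with an n-gram language model and report the lowest-probability ones as potential bugs) can be sketched with a toy bigram model. The snippet below is only an illustration, not the thesis implementation: the token corpus, add-one smoothing, and bigram order are assumptions.

```python
from collections import defaultdict

class BigramModel:
    """Toy bigram model over method-level token sequences, with add-one smoothing."""
    def __init__(self):
        self.bigrams = defaultdict(int)
        self.unigrams = defaultdict(int)
        self.vocab = set()

    def train(self, sequences):
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
                self.vocab.update((a, b))

    def sequence_probability(self, seq):
        p, V = 1.0, len(self.vocab)
        for a, b in zip(seq, seq[1:]):
            p *= (self.bigrams[(a, b)] + 1) / (self.unigrams[a] + V)
        return p

# Token sequences mined from a (hypothetical) code corpus.
corpus = [["lock", "read", "unlock"], ["lock", "write", "unlock"]] * 50
model = BigramModel()
model.train(corpus)

# Rank candidate sequences: the lowest-probability ones are reported as potential bugs.
candidates = [["lock", "read", "unlock"], ["lock", "read", "read"]]
for seq in sorted(candidates, key=model.sequence_probability):
    print(round(model.sequence_probability(seq), 6), seq)
```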

    Transfer Learning based Low Shot Classifier for Software Defect Prediction

    Get PDF
    Background: The rapid growth and increasing complexity of software applications are causing challenges in maintaining software quality within constraints of time and resources. This challenge led to the emergence of a new field of study known as Software Defect Prediction (SDP), which focuses on predicting defects in advance, thereby reducing costs and improving productivity in the software industry. Objective: This study aimed to address data distribution disparities when applying transfer learning in multi-project scenarios, and to mitigate performance issues resulting from data scarcity in SDP. Methods: The proposed approach, the Transfer Learning based Low Shot Classifier (TLLSC), combined transfer learning and low shot learning to create an SDP model. This model was designed for application both to new projects and to those with minimal historical defect data. Results: Experiments were conducted using standard datasets from projects in the National Aeronautics and Space Administration (NASA) and Software Research Laboratory (SOFTLAB) repositories. TLLSC showed an average increase in F1-Measure of 31.22%, 27.66%, and 27.54% for projects AR3, AR4, and AR5, respectively. These results surpassed those from Transfer Component Analysis (TCA+), Canonical Correlation Analysis (CCA+), and Kernel Canonical Correlation Analysis plus (KCCA+). Conclusion: The comparison between TLLSC and the state-of-the-art algorithms TCA+, CCA+, and KCCA+ from the existing literature consistently showed that TLLSC performed better in terms of F1-Measure. Keywords: Just-in-time, Defect Prediction, Deep Learning, Transfer Learning, Low Shot Learning
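
    TLLSC itself is not described in enough detail here to reproduce, but the general low-shot transfer pattern it builds on (pre-train on a data-rich source project, then adapt with a handful of labelled target-project samples) can be sketched as follows. Everything in the snippet, including the use of SGDClassifier and the synthetic data, is an assumption for illustration only.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Source project: plenty of labelled modules; target project: only a handful.
X_src = rng.normal(0.0, 1.0, (500, 10)); y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(0.5, 1.2, (12, 10));  y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] > 0.5).astype(int)

scaler = StandardScaler().fit(X_src)           # shared feature scaling across projects
clf = SGDClassifier(loss="log_loss", random_state=0)
clf.fit(scaler.transform(X_src), y_src)        # pre-train on the source project

# Low-shot adaptation: a few extra gradient steps on the scarce target labels.
for _ in range(20):
    clf.partial_fit(scaler.transform(X_tgt), y_tgt)

print(clf.predict(scaler.transform(X_tgt)))
```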

    Cloud-based bug tracking software defects analysis using deep learning

    Get PDF
    Cloud technology is not immune to bugs, and issue tracking requires a dedicated system that is less error prone and less cumbersome, and that supports a high degree of collaboration, operational flexibility, and smart decision making. One of the primary goals of software engineering is to provide high-quality software within a specified budget and period for cloud-based technology. However, defects found in Cloud-Based Bug Tracking software can result in quality reduction as well as delays in the delivery process. Therefore, software testing plays a vital role in ensuring the quality of software in the cloud, but software testing requires more time and cost as the complexity of user requirements increases. This issue is even more cumbersome in embedded software design. Early detection of defect-prone components in general-purpose and embedded software helps to recognize which components require more attention during testing and thereby allocate the available resources effectively and efficiently. This research was motivated by the demand to minimize the time and cost required for Cloud-Based Bug Tracking Software testing for both embedded and general-purpose software while ensuring the delivery of high-quality software products without any delays emanating from the cloud. Notwithstanding that several machine learning techniques have been widely applied to building software defect prediction models, achieving higher prediction accuracy is still a challenging task. Thus, the primary aim of this research is to investigate how deep learning methods can be used for Cloud-Based Bug Tracking Software defect detection with higher accuracy. The research conducted an experiment with four different configurations of a Multi-Layer Perceptron neural network using five publicly available software defect datasets. Results of the experiments show that the best network configuration for a software defect detection model using a Multi-Layer Perceptron is a prediction model with two hidden layers, having 25 neurons in the first hidden layer and 5 neurons in the second hidden layer.
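
    The reported best topology (two hidden layers with 25 and 5 neurons) can be sketched with scikit-learn's MLPClassifier. The snippet below is a hedged illustration on synthetic data; the paper's actual datasets, framework, and remaining hyperparameters are not given here and are assumed.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a software-defect dataset (rows = modules, y = defective or not).
rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, (400, 20))
y = (X[:, :3].sum(axis=1) + rng.normal(0, 0.5, 400) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

# Topology reported as best in the abstract: 25 neurons, then 5 neurons.
mlp = MLPClassifier(hidden_layer_sizes=(25, 5), max_iter=500, random_state=42)
mlp.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, mlp.predict(X_te)))
```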

    In-situ crack and keyhole pore detection in laser directed energy deposition through acoustic signal and deep learning

    Full text link
    Cracks and keyhole pores are detrimental defects in alloys produced by laser directed energy deposition (LDED). Laser-material interaction sound may hold information about underlying complex physical events such as crack propagation and pore formation. However, due to the noisy environment and intricate signal content, acoustic-based monitoring in LDED has received little attention. This paper proposes a novel acoustic-based in-situ defect detection strategy for LDED. The key contribution of this study is an in-situ acoustic signal denoising, feature extraction, and sound classification pipeline that incorporates convolutional neural networks (CNN) for online defect prediction. Microscope images are used to identify the locations of cracks and keyhole pores within a part. The defect locations are spatiotemporally registered with the acoustic signal. Various acoustic features corresponding to defect-free regions, cracks, and keyhole pores are extracted and analysed in time-domain, frequency-domain, and time-frequency representations. The CNN model is trained to predict defect occurrences using the Mel-Frequency Cepstral Coefficients (MFCCs) of the laser-material interaction sound. The CNN model is compared to various classic machine learning models trained on the denoised and raw acoustic datasets. The validation results show that the CNN model trained on the denoised dataset outperforms the others, with the highest overall accuracy (89%), keyhole pore prediction accuracy (93%), and AUC-ROC score (98%). Furthermore, the trained CNN model can be deployed in an in-house developed software platform for online quality monitoring. The proposed strategy is the first study to use acoustic signals with deep learning for in-situ defect detection in the LDED process.
    Comment: 36 pages, 16 figures, accepted at the journal Additive Manufacturing
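
    The MFCC-plus-CNN pipeline described above can be sketched as follows. The snippet is a hedged illustration only: the synthetic clip, the librosa MFCC settings, and the tiny PyTorch network are assumptions and do not reflect the paper's actual architecture or data.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# Synthetic one-second "process sound" clip standing in for the acoustic signal.
sr = 44100
signal = np.random.default_rng(0).normal(0, 0.1, sr).astype(np.float32)

# MFCC features, the CNN input representation named in the abstract.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20)        # shape (20, frames)
x = torch.tensor(mfcc, dtype=torch.float32)[None, None]        # (batch, channel, 20, frames)

# Tiny illustrative CNN over the MFCC "image": defect-free / crack / keyhole pore.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
    nn.Linear(8 * 4 * 4, 3),
)
print(model(x).shape)  # torch.Size([1, 3]) -> logits for the three classes
```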