    Class Imbalance Reduction and Centroid based Relevant Project Selection for Cross Project Defect Prediction

    Cross-Project Defect Prediction (CPDP) is the process of predicting defects in a target project using information from other projects. This can assist developers in prioritizing their testing efforts and finding flaws. Transfer Learning (TL) has been frequently used in CPDP to improve prediction performance by reducing the disparity in data distribution between the source and target projects. Software Defect Prediction (SDP) is a common study topic in software engineering that plays a critical role in software quality assurance. To address the cross-project class imbalance problem, Centroid-based PF-SMOTE for imbalanced data is used. In this paper, we use Centroid-based PF-SMOTE to balance the datasets and centroid-based relevant data selection for Cross-Project Defect Prediction. These methods compute the mean of all attributes in each dataset and then calculate the difference between the means of the datasets. For experimentation, the open-source software defect datasets AEEEM, Re-Link, and NASA are considered.
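
The centroid-based selection idea in the abstract above can be illustrated with a minimal sketch. The abstract only says that attribute means are computed per dataset and then compared; the use of Euclidean distance, the function names `centroid` and `rank_sources`, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Return the per-attribute mean of a dataset given as equal-length rows."""
    return [fmean(column) for column in zip(*rows)]


def rank_sources(target_rows, sources):
    """Rank candidate source projects by centroid distance to the target.

    Assumption: Euclidean distance between centroids stands in for the
    "difference between means" mentioned in the abstract.
    """
    target_centroid = centroid(target_rows)
    distances = {
        name: dist(target_centroid, centroid(rows))
        for name, rows in sources.items()
    }
    # Closest (most relevant) source project first.
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical toy data: two metrics per module.
target = [[1.0, 2.0], [3.0, 4.0]]
sources = {
    "project_a": [[2.0, 3.0], [2.0, 3.0]],
    "project_b": [[10.0, 10.0], [12.0, 14.0]],
}
ranking = rank_sources(target, sources)
```

In practice the attributes would likely need to be normalized first, since metrics on different scales would otherwise dominate the distance.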

    Transfer Learning based Low Shot Classifier for Software Defect Prediction

    Background: The rapid growth and increasing complexity of software applications are causing challenges in maintaining software quality within constraints of time and resources. This challenge led to the emergence of a new field of study known as Software Defect Prediction (SDP), which focuses on predicting future defects in advance, thereby reducing costs and improving productivity in the software industry. Objective: This study aimed to address data distribution disparities when applying transfer learning in multi-project scenarios, and to mitigate performance issues resulting from data scarcity in SDP. Methods: The proposed approach, namely Transfer Learning based Low Shot Classifier (TLLSC), combined transfer learning and low shot learning approaches to create an SDP model. This model was designed for application in both new projects and those with minimal historical defect data. Results: Experiments were conducted using standard datasets from projects within the National Aeronautics and Space Administration (NASA) and Software Research Laboratory (SOFTLAB) repository. TLLSC showed an average increase in F1-Measure of 31.22%, 27.66%, and 27.54% for project AR3, AR4, and AR5, respectively. These results surpassed those from Transfer Component Analysis (TCA+), Canonical Correlation Analysis (CCA+), and Kernel Canonical Correlation Analysis plus (KCCA+). Conclusion: The comparison between TLLSC and the state-of-the-art algorithms TCA+, CCA+, and KCCA+ from the existing literature consistently showed that TLLSC performed better in terms of F1-Measure. Keywords: Just-in-time, Defect Prediction, Deep Learning, Transfer Learning, Low Shot Learning

    Federated Transfer Learning with Multimodal Data

    Smart cars, smartphones and other devices in the Internet of Things (IoT), which usually have more than one sensor, produce multimodal data. Federated Learning supports collecting a wealth of multimodal data from different devices without sharing raw data. Transfer Learning methods help transfer knowledge from some devices to others. Federated Transfer Learning methods benefit from both Federated Learning and Transfer Learning. This newly proposed Federated Transfer Learning framework aims at connecting data islands with privacy protection. Our construction is based on Federated Learning and Transfer Learning. Compared with previous Federated Transfer Learning frameworks, where each user should have data with identical modalities (either all unimodal or all multimodal), our new framework is more generic: it allows a hybrid distribution of user data. The core strategy is to use two different but inherently connected training methods for our two types of users. Supervised Learning is adopted for users with only unimodal data (Type 1), while Self-Supervised Learning is applied to users with multimodal data (Type 2) to learn both the features of each modality and the connection between them. This connection knowledge of Type 2 will help Type 1 in later stages of training. Training in the new framework can be divided into three steps. In the first step, users who have data with identical modalities are grouped together. For example, users with only sound signals are in group one, those with only images are in group two, users with multimodal data are in group three, and so on. In the second step, Federated Learning is executed within the groups, where Supervised Learning and Self-Supervised Learning are used depending on the group's nature. Most of the Transfer Learning happens in the third step, where the related parts in the network obtained from the previous steps are aggregated (federated). Comment: 73 pages, 54 figures, master thesis
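
The first step described in the abstract above (grouping users by the modalities they hold) and the choice of training method per group can be sketched roughly as follows. The function names, the user identifiers, and the modality labels are hypothetical; the thesis itself may organize this differently.

```python
from collections import defaultdict


def group_by_modalities(user_modalities):
    """Step 1: put users holding the same set of modalities in one group."""
    groups = defaultdict(list)
    for user, modalities in user_modalities.items():
        groups[frozenset(modalities)].append(user)
    return dict(groups)


def training_mode(modalities):
    """Step 2: unimodal groups (Type 1) train supervised, multimodal
    groups (Type 2) train self-supervised, as the abstract describes."""
    return "supervised" if len(modalities) == 1 else "self-supervised"


# Hypothetical users and the sensor modalities each one holds.
users = {
    "u1": ["sound"],
    "u2": ["image"],
    "u3": ["sound", "image"],
    "u4": ["sound"],
}
groups = group_by_modalities(users)
modes = {modalities: training_mode(modalities) for modalities in groups}
```

The third step (aggregating the related network parts across groups) depends on the model architecture and is not sketched here.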

    How Far Does the Predictive Decision Impact the Software Project? The Cost, Service Time, and Failure Analysis from a Cross-Project Defect Prediction Model

    Context: Cross-project defect prediction (CPDP) models are being developed to optimize the testing resources. Objectives: To propose an ensemble classification framework for CPDP, since many existing models lack strong performance, and to analyse the main objectives of CPDP from the outcomes of the proposed classification framework. Method: For the classification task, we propose a bootstrap aggregation based hybrid-inducer ensemble learning (HIEL) technique that uses a probabilistic weighted majority voting (PWMV) strategy. To assess the impact of HIEL on the software project, we propose three project-specific performance measures computed from the predictions, namely percent of perfect cleans (PPC), percent of non-perfect cleans (PNPC), and false omission rate (FOR), to calculate the amount of saved cost, the remaining service time, and the percent of failures in the target project. Results: On many target projects from the PROMISE, NASA, and AEEEM repositories, the proposed model outperformed recent works such as TDS, TCA+, HYDRA, TPTL, and CODEP in terms of F-measure. In terms of AUC, the TCA+ and HYDRA models stand as strong competitors to the HIEL model. Conclusion: For better predictions, we recommend ensemble learning approaches for CPDP models. To estimate the benefits of CPDP models, we recommend the above project-specific performance measures.
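
Of the three measures named in the abstract above, the false omission rate has a standard definition, FN / (FN + TN): the share of modules predicted clean that are actually defective. A minimal sketch follows; the label encoding (1 = defective, 0 = clean), the function name, and the zero-division fallback are assumptions. PPC and PNPC are not defined in the abstract, so they are not sketched.

```python
def false_omission_rate(y_true, y_pred):
    """FOR = FN / (FN + TN), with 1 = defective and 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    false_negatives = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    true_negatives = sum(1 for truth, pred in pairs if truth == 0 and pred == 0)
    predicted_clean = false_negatives + true_negatives
    if predicted_clean == 0:
        # Assumption: with no "clean" predictions, report 0.0 rather than fail.
        return 0.0
    return false_negatives / predicted_clean


# Hypothetical labels for six modules in a target project.
y_true = [1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 0, 1, 0, 1]
rate = false_omission_rate(y_true, y_pred)
```

Here four modules are predicted clean and one of them is defective, so a quarter of the modules that would skip testing still contain a defect.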

    Deep learning in the wild

    Invited paper. Deep learning with neural networks is applied by an increasing number of people outside of classic research environments, due to the vast success of the methodology on a wide range of machine perception tasks. While this interest is fueled by beautiful success stories, practical work in deep learning on novel tasks without existing baselines remains challenging. This paper explores the specific challenges arising in the realm of real world tasks, based on case studies from research & development in conjunction with industry, and extracts lessons learned from them. It thus fills a gap between the publication of latest algorithmic and methodical developments, and the usually omitted nitty-gritty of how to make them work. Specifically, we give insight into deep learning projects on face matching, print media monitoring, industrial quality control, music scanning, strategy game playing, and automated machine learning, thereby providing best practices for deep learning in practice.

    AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)

    This book is a collection of the accepted papers presented at the Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD), held in conjunction with the 36th AAAI Conference on Artificial Intelligence 2022. During AIBSD 2022, the attendees addressed the existing issues of data bias and scarcity in Artificial Intelligence and discussed potential solutions in real-world scenarios. A set of papers presented at AIBSD 2022 was selected for further publication and included in this book.

    A survey on generative adversarial networks for imbalance problems in computer vision tasks

    Any computer vision application development starts by acquiring images and data, followed by preprocessing and pattern recognition steps to perform a task. When the acquired images are highly imbalanced and not adequate, the desired task may not be achievable. Unfortunately, the occurrence of imbalance problems in acquired image datasets is inevitable in certain complex real-world problems such as anomaly detection, emotion recognition, medical image analysis, fraud detection, metallic surface defect detection, and disaster prediction. The performance of computer vision algorithms can significantly deteriorate when the training dataset is imbalanced. In recent years, Generative Adversarial Neural Networks (GANs) have gained immense attention from researchers across a variety of application domains due to their capability to model complex real-world image data. Importantly, GANs can not only be used to generate synthetic images; their adversarial learning idea has also shown good potential for restoring balance in imbalanced datasets. In this paper, we examine the most recent developments in GAN-based techniques for addressing imbalance problems in image data. The real-world challenges and implementations of synthetic image generation based on GANs are extensively covered in this survey. Our survey first introduces various imbalance problems in computer vision tasks and their existing solutions, and then examines key concepts such as deep generative image models and GANs. After that, we propose a taxonomy that summarizes GAN-based techniques for addressing imbalance problems in computer vision tasks into three major categories: 1. image-level imbalances in classification, 2. object-level imbalances in object detection, and 3. pixel-level imbalances in segmentation tasks. We elaborate on the imbalance problems of each group and provide GAN-based solutions for each group. Readers will understand how GAN-based techniques can handle the problem of imbalances and boost the performance of computer vision algorithms.