311 research outputs found

    A Cross-project Defect Prediction Model Using Feature Transfer and Ensemble Learning

    Get PDF
    Cross-project defect prediction (CPDP) trains prediction models on existing data from other projects (the source projects) and uses the trained model to predict defects in the target projects. To address two major problems in CPDP, namely variability in data distribution and class imbalance, this paper proposes a CPDP model that combines feature transfer and ensemble learning in two stages: feature transfer and classification. The feature transfer method is based on the Pearson correlation coefficient; it reduces the dimension of the feature space and the difference in feature distribution between projects. Class imbalance is addressed by SMOTE and Voting at both the data and algorithm levels. Experimental results on 20 source-target project pairs show that our method yields significant improvement on CPDP.
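
    A minimal sketch of the two-stage pipeline described above, assuming scikit-learn and imbalanced-learn: since the abstract does not specify the exact construction, the Pearson-based transfer step is approximated here as a quantile (Q-Q) correlation filter, and the ensemble members (logistic regression, random forest, naive Bayes) and the 0.05 threshold are illustrative choices, not the paper's.

```python
# Hedged sketch: two-stage CPDP pipeline (feature transfer, then
# classification). The Q-Q correlation filter and the ensemble
# members are illustrative assumptions, not the paper's design.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def select_transferable_features(X_src, X_tgt, threshold=0.05, n_q=50):
    """Keep features whose source/target distributions line up,
    measured by the Pearson correlation of matched quantiles."""
    qs = np.linspace(0.0, 1.0, n_q)
    keep = []
    for j in range(X_src.shape[1]):
        r = np.corrcoef(np.quantile(X_src[:, j], qs),
                        np.quantile(X_tgt[:, j], qs))[0, 1]
        if np.isfinite(r) and r >= threshold:
            keep.append(j)
    return keep

def train_and_predict(X_src, y_src, X_tgt):
    cols = select_transferable_features(X_src, X_tgt)
    # Data-level treatment of class imbalance: SMOTE oversampling.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_src[:, cols], y_src)
    # Algorithm-level treatment: soft-voting ensemble.
    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=100)),
                    ("nb", GaussianNB())],
        voting="soft")
    ensemble.fit(X_bal, y_bal)
    return ensemble.predict(X_tgt[:, cols])
```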

    Transfer Learning based Low Shot Classifier for Software Defect Prediction

    Get PDF
    Background: The rapid growth and increasing complexity of software applications are causing challenges in maintaining software quality within constraints of time and resources. This challenge led to the emergence of a new field of study known as Software Defect Prediction (SDP), which focuses on predicting future defects in advance, thereby reducing costs and improving productivity in the software industry. Objective: This study aimed to address data distribution disparities when applying transfer learning in multi-project scenarios, and to mitigate performance issues resulting from data scarcity in SDP. Methods: The proposed approach, namely the Transfer Learning based Low Shot Classifier (TLLSC), combined transfer learning and low shot learning approaches to create an SDP model. This model was designed for application both in new projects and in those with minimal historical defect data. Results: Experiments were conducted using standard datasets from projects within the National Aeronautics and Space Administration (NASA) and Software Research Laboratory (SOFTLAB) repositories. TLLSC showed an average increase in F1-Measure of 31.22%, 27.66%, and 27.54% for projects AR3, AR4, and AR5, respectively. These results surpassed those from Transfer Component Analysis plus (TCA+), Canonical Correlation Analysis plus (CCA+), and Kernel Canonical Correlation Analysis plus (KCCA+). Conclusion: The comparison between TLLSC and the state-of-the-art algorithms TCA+, CCA+, and KCCA+ from the existing literature consistently showed that TLLSC performed better in terms of F1-Measure. Keywords: Just-in-time, Defect Prediction, Deep Learning, Transfer Learning, Low Shot Learning
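
    The abstract does not detail TLLSC's architecture, so the sketch below only illustrates the general recipe it names, under stated assumptions: an MLP trained on data-rich source projects serves as the transferred feature extractor, and a nearest-centroid rule fit on a handful of labeled target samples plays the low-shot classifier. The layer sizes and helper names (fit_source_encoder, low_shot_predict) are hypothetical.

```python
# Illustrative only: not a reproduction of TLLSC. Shows one common way
# to combine transfer learning (source-trained representation) with
# low-shot learning (nearest centroid on a few target samples).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

def fit_source_encoder(X_src, y_src):
    scaler = StandardScaler().fit(X_src)
    mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                        random_state=0)
    mlp.fit(scaler.transform(X_src), y_src)
    return scaler, mlp

def embed(scaler, mlp, X):
    # Reuse the trained hidden layers as a feature extractor
    # (MLPClassifier's default activation is ReLU).
    h = scaler.transform(X)
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h

def low_shot_predict(scaler, mlp, X_few, y_few, X_tgt):
    # Nearest-centroid classification in the transferred embedding space.
    y_few = np.asarray(y_few)
    Z_few, Z_tgt = embed(scaler, mlp, X_few), embed(scaler, mlp, X_tgt)
    classes = np.unique(y_few)
    centroids = np.stack([Z_few[y_few == c].mean(axis=0) for c in classes])
    dists = np.stack([np.linalg.norm(Z_tgt - c, axis=1) for c in centroids],
                     axis=1)
    return classes[dists.argmin(axis=1)]
```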

    Proactive Fault Tolerance Through Cloud Failure Prediction Using Machine Learning

    Get PDF
    One of the crucial aspects of cloud infrastructure is fault tolerance, whose primary responsibility is to address the situations that arise when different architectural parts fail. A sizeable cloud data center must deliver high service dependability and availability while minimizing failure incidence. However, modern large cloud data centers continue to have significant failure rates owing to a variety of factors, including hardware and software faults, which often lead to task and job failures. To reduce unexpected loss, it is critical to forecast task or job failures with high accuracy before they occur. This research examines the performance of four machine learning (ML) algorithms for forecasting failure in a real-time cloud environment to increase system availability, using data gathered from the Google Cluster Workload Traces 2019. We applied four distinct supervised machine learning algorithms: logistic regression, k-nearest neighbors (KNN), support vector machine (SVM), and decision tree classifiers. Confusion matrices as well as ROC curves were used to assess the reliability and robustness of each algorithm. This study will assist cloud service providers in developing a robust fault tolerance design by optimizing device selection, consequently boosting system availability and reducing unexpected system downtime.
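
    A minimal sketch of the evaluation loop described above, with scikit-learn stand-ins for the four classifiers; loading and preprocessing the Google Cluster Workload Traces 2019 is left to the caller, and the hyperparameters shown are assumptions rather than the study's settings.

```python
# Hedged sketch: train the four classifiers and report the confusion
# matrix and ROC AUC for each. (X, y) are placeholder arrays that the
# caller derives from the cluster traces.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_failure_predictors(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "svm": SVC(probability=True),       # probability=True enables ROC
        "decision_tree": DecisionTreeClassifier(max_depth=8),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_te)[:, 1]
        preds = model.predict(X_te)
        print(name)
        print("  confusion matrix:\n", confusion_matrix(y_te, preds))
        print("  ROC AUC: %.3f" % roc_auc_score(y_te, proba))
```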

    A Review Of Training Data Selection In Software Defect Prediction

    Get PDF
    The abundance of publicly available datasets poses a challenge in selecting suitable data to train a defect prediction model for predicting defects in other projects. Using a cross-project training dataset without careful selection will degrade defect prediction performance. Consequently, training data selection is an essential step in developing a defect prediction model. This paper aims to synthesize the state of the art in training data selection methods published from 2009 to 2019. The existing approaches addressing the training data selection issue fall into three groups: nearest neighbour, cluster-based, and evolutionary methods. According to the results in the literature, the cluster-based method tends to outperform the nearest neighbour method. On the other hand, research on evolutionary techniques gives promising results but is still scarce. Therefore, the review concludes that there are still open areas for further investigation in training data selection. We also present research directions within this area.
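
    As a concrete instance of the nearest-neighbour group, the sketch below follows the spirit of the well-known Burak filter: for every target instance, pull its k closest source instances into the training set. The value k=10 mirrors a common choice in the literature and is an assumption here, not a finding of the review.

```python
# Hedged sketch of nearest-neighbour training data selection
# (Burak-style filter). k=10 is a conventional, assumed value.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_filter(X_source, X_target, k=10):
    nn = NearestNeighbors(n_neighbors=k).fit(X_source)
    _, idx = nn.kneighbors(X_target)   # k nearest source rows per target row
    return np.unique(idx.ravel())      # de-duplicated indices into X_source
```

    The returned indices then define the cross-project training set for whatever defect prediction model is fitted downstream.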

    Analyzing Data Mining Statistical Models of Bio Medical

    Get PDF
    The main goal of this thesis is to investigate the performance of different data mining models on biomedical datasets (heart disease data). I execute different data mining models, namely neural networks, support vector machines, and logistic regression, on these datasets. Performance metrics such as accuracy, precision, and recall are calculated and recorded. I compare the data mining models using the recorded values of the performance metrics to find the best model for the datasets.
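
    A minimal sketch of the comparison procedure the abstract describes, assuming scikit-learn: cross-validated accuracy, precision, and recall for a neural network, an SVM, and logistic regression. The heart disease dataset itself must be loaded by the caller, and the model settings are illustrative.

```python
# Hedged sketch: 5-fold cross-validated metric comparison across the
# three model families named in the abstract.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def compare_models(X, y):
    models = {
        "neural_network": MLPClassifier(max_iter=1000, random_state=0),
        "svm": SVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    scoring = ["accuracy", "precision", "recall"]
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=5, scoring=scoring)
        summary = {m: scores["test_" + m].mean() for m in scoring}
        print(name, summary)
```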

    Application of a generative adversarial network for multi-featured fermentation data synthesis and artificial neural network (ANN) modeling of bitter gourd–grape beverage production.

    Get PDF
    Artificial neural networks (ANNs) have in recent times found increasing application in predictive modelling of various food processing operations, including fermentation, as they can learn nonlinear complex relationships in high-dimensional datasets that might otherwise be outside the scope of conventional regression models. Nonetheless, a major limiting factor of ANNs is that they require quite a large amount of training data for good performance. Obtaining such an amount of data from biological processes is usually difficult for many reasons. To resolve this problem, methods have been proposed to inflate existing data by artificially synthesizing additional valid data samples. In this paper, we present a generative adversarial network (GAN) able to synthesize an effectively unlimited amount of realistic multi-dimensional regression data from limited experimental data (n = 20). Rigorous testing showed that the synthesized data (n = 200) significantly conserved the variances and distribution patterns of the real data. Further, the synthetic data was used to generalize a deep neural network. The model trained on the artificial data showed a lower loss (2.029 ± 0.124) and converged to a solution faster than its counterpart trained on real data (2.1614 ± 0.117).
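
    An illustrative tabular GAN in the spirit of the approach above, assuming PyTorch: a small generator/discriminator pair is trained on a limited set of records (a random stand-in below) and then sampled to produce synthetic rows. Network sizes, the latent dimension, and the training schedule are assumptions, not the paper's configuration.

```python
# Hedged sketch: minimal GAN for multi-featured tabular data.
import torch
import torch.nn as nn

def make_gan(n_features, latent_dim=8):
    gen = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                        nn.Linear(32, n_features))
    disc = nn.Sequential(nn.Linear(n_features, 32), nn.LeakyReLU(0.2),
                         nn.Linear(32, 1))                 # raw logit out
    return gen, disc

def train_gan(real, epochs=2000, latent_dim=8):
    gen, disc = make_gan(real.shape[1], latent_dim)
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
    loss = nn.BCEWithLogitsLoss()
    ones = torch.ones(len(real), 1)
    zeros = torch.zeros(len(real), 1)
    for _ in range(epochs):
        # Discriminator step: separate real rows from generated rows.
        fake = gen(torch.randn(len(real), latent_dim)).detach()
        d_loss = loss(disc(real), ones) + loss(disc(fake), zeros)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator step: produce rows the discriminator calls real.
        g_loss = loss(disc(gen(torch.randn(len(real), latent_dim))), ones)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return gen

# Usage mirroring the abstract's scale: 20 real samples in,
# 200 synthetic samples out (random placeholder data).
real = torch.randn(20, 6)
gen = train_gan(real)
synthetic = gen(torch.randn(200, 8)).detach()
```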

    A Machine Vision Method for Correction of Eccentric Error: Based on Adaptive Enhancement Algorithm

    Full text link
    In the procedure of surface defect detection for large-aperture aspherical optical elements, it is of vital significance to accurately adjust the optical axis of the element to be coaxial with the mechanical spin axis. Therefore, a machine vision method for eccentric error correction is proposed in this paper. Focusing on the severe defocus blur of the reference crosshair image caused by the imaging characteristics of the aspherical optical element, which may lead to the failure of correction, an Adaptive Enhancement Algorithm (AEA) is proposed to strengthen the crosshair image. AEA consists of the existing Guided Filter Dark Channel Dehazing Algorithm (GFA) and the proposed lightweight Multi-scale Densely Connected Network (MDC-Net). The enhancement effect of GFA is excellent but time-consuming, while that of MDC-Net is slightly inferior but strongly real-time. As AEA is executed dozens of times during each correction procedure, its real-time performance is very important. Therefore, by setting an empirical threshold on the definition evaluation function SMD2, GFA and MDC-Net are applied to highly and slightly blurred crosshair images, respectively, so as to ensure the enhancement effect while saving as much time as possible. AEA is robust in runtime, taking an average of 0.2721 s to execute GFA and 0.0963 s to execute MDC-Net on ten 200×200 pixel Region of Interest (ROI) images with different degrees of blur. The eccentric error can be reduced to within 10 μm by our method.
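
    A sketch of the routing rule described above: compute the SMD2 sharpness score of the crosshair ROI and dispatch heavily blurred images to the slower GFA path and mildly blurred ones to the fast MDC-Net path. The SMD2 formula used here is the standard sum-of-modulus-of-grey-difference focus measure; the threshold and the two enhancement callables are placeholders, since the paper's empirical threshold is not given in the abstract.

```python
# Hedged sketch: SMD2 focus measure and the GFA / MDC-Net dispatch.
import numpy as np

def smd2(gray):
    """SMD2 focus measure: sum over pixels of the product of the
    horizontal and vertical grey-level differences (larger = sharper)."""
    g = gray.astype(np.float64)
    dx = np.abs(g[:-1, :-1] - g[1:, :-1])   # vertical neighbour difference
    dy = np.abs(g[:-1, :-1] - g[:-1, 1:])   # horizontal neighbour difference
    return float((dx * dy).sum())

def enhance(roi, threshold, gfa, mdc_net):
    # gfa and mdc_net are enhancement callables supplied by the caller;
    # threshold is an empirical value tuned on sample crosshair images.
    if smd2(roi) < threshold:   # severe defocus: favour quality (GFA)
        return gfa(roi)
    return mdc_net(roi)         # mild blur: favour speed (MDC-Net)
```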