21 research outputs found

    Heterogeneous Cross-Project Defect Prediction using Encoder and Transfer Learning

    Heterogeneous cross-project defect prediction (HCPDP) aims to predict defects in new software projects using defect data from previous software projects, where the source and target projects use partly different metrics. Most existing methods capture only linear relationships among the software defect features and datasets. Additionally, these methods use multiple defect datasets from different projects as source datasets. In this paper, we propose a novel method called heterogeneous cross-project defect prediction using encoder and transfer learning (ETL). ETL uses encoders to extract the important features from the source and target datasets. To minimize negative transfer during transfer learning, we use an augmented dataset that contains pseudo-labels and the source dataset. Additionally, we use very limited data to train the model. To evaluate the performance of the ETL approach, 16 datasets from four publicly available software defect projects were used. Furthermore, we compared the proposed method with four HCPDP methods, namely EGW, HDP_KS, CTKCCA and EMKCA, and one WPDP method from the existing literature. On average, the proposed method outperforms the baseline methods in terms of PD, PF, F1-score, G-mean and AUC.
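
    The abstract above names PD, PF, F1-score and G-mean without defining them. The sketch below computes them from confusion-matrix counts using definitions common in the defect prediction literature: PD as recall, PF as the false-alarm rate, and G-mean as sqrt(PD * (1 - PF)). These definitions and the function name `defect_metrics` are assumptions for illustration; the paper may define G-mean differently.

```python
import math


def defect_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common defect-prediction metrics from confusion-matrix counts.

    Assumed definitions (not taken from the paper):
      PD     = TP / (TP + FN)        probability of detection (recall)
      PF     = FP / (FP + TN)        probability of false alarm
      F1     = harmonic mean of precision and PD
      G-mean = sqrt(PD * (1 - PF))
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * pd / (precision + pd) if (precision + pd) else 0.0
    g_mean = math.sqrt(pd * (1.0 - pf))
    return {"PD": pd, "PF": pf, "F1": f1, "G-mean": g_mean}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(defect_metrics(tp=8, fp=2, tn=18, fn=2))
```

    Higher PD, F1 and G-mean are better, while a lower PF is better, so a method that "outperforms" on PF has the smaller value.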

    Lightning Prediction for Space Launch Using Machine Learning Based Off of Electric Field Mills and Lightning Detection and Ranging Data

    Kennedy Space Center (KSC) and Cape Canaveral Air Station, FL, where the Air Force conducts space launches, are in an area of frequent lightning strikes, which is a main obstacle to meeting launch goals. The 45th Weather Squadron (45th WS) ensures that all weather safety requirements are met during pre-launch and actual space launch. Using only the summer months from three years’ worth of lightning detection and ranging (LDAR) and electric field mill (EFM) data from KSC, several feedforward neural networks are constructed. Separate models are built for each EFM and trained by adjusting parameters to forecast lightning 30 minutes out in the area surrounding each field mill.

    Designing a streaming algorithm for outlier detection in data mining—an incremental approach

    Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, environment monitoring and so forth. Because real-time data may arrive as streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and feasible with limited memory. These requirements create big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses a sliding window and a kernel function to process the streaming data online, and report results demonstrating high throughput on real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the high complexity of LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF also employs a sliding window and a statistical summary to help make decisions based on the data in the current window. It also addresses all the streaming-data challenges addressed by C_KDE_WR. In addition, we report a comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement. We further report the testing results of C_LOF under different parameter settings and draw ROC and PR curves, with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the masquerading problem, which often exists in outlier detection on streaming data. We provide complexity analysis and report experimental results on the accuracy of both the C_KDE_WR and C_LOF algorithms in order to evaluate their effectiveness as well as their efficiency.
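
    The general idea of combining a sliding window with a kernel function, as described in the abstract above, can be illustrated with a minimal sketch. This is not C_KDE_WR: it is a univariate, CPU-only toy that scores each arriving point by its Gaussian kernel density estimate over the current window and flags it when the density falls below a fixed threshold. The class name `SlidingWindowKDE` and the parameters `window_size`, `bandwidth` and `threshold` are assumptions made for illustration.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """Toy streaming outlier detector: Gaussian KDE over a sliding window."""

    def __init__(self, window_size: int = 100, bandwidth: float = 1.0,
                 threshold: float = 0.01):
        # deque(maxlen=...) evicts the oldest point automatically,
        # so memory use stays bounded by window_size.
        self.window = deque(maxlen=window_size)
        self.bandwidth = bandwidth
        self.threshold = threshold

    def density(self, x: float) -> float:
        """Gaussian kernel density estimate of x over the current window."""
        if not self.window:
            return 0.0
        h = self.bandwidth
        norm = 1.0 / (len(self.window) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2)
                          for v in self.window)

    def update(self, x: float) -> bool:
        """Score x against the window, then add it. True means outlier."""
        # The very first point has nothing to compare against,
        # so it is never flagged.
        is_outlier = bool(self.window) and self.density(x) < self.threshold
        self.window.append(x)
        return is_outlier


if __name__ == "__main__":
    detector = SlidingWindowKDE(window_size=5, bandwidth=1.0, threshold=0.01)
    for value in [1.0, 1.1, 0.9, 1.05, 0.95, 10.0, 1.0]:
        print(value, detector.update(value))
```

    Each update costs time proportional to the window size, which is the kind of per-point work a GPU implementation could parallelize. Handling concept drift, multivariate data and adaptive thresholds is outside the scope of this sketch.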