3,405 research outputs found
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well known strategies to deal with binary
imbalanced data on 82 different real life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with RBF kernel,
random forests, and gradient boosting machines and we measured the quality of
the resulting classifier using 6 different metrics (Area under the curve,
Accuracy, F-measure, G-mean, Matthew's correlation coefficient and Balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier. For AUC and accuracy class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, underbagging
Predicting Pancreatic Cancer Using Support Vector Machine
This report presents an approach to predict pancreatic cancer using Support Vector Machine Classification algorithm. The research objective of this project it to predict pancreatic cancer on just genomic, just clinical and combination of genomic and clinical data. We have used real genomic data having 22,763 samples and 154 features per sample. We have also created Synthetic Clinical data having 400 samples and 7 features per sample in order to predict accuracy of just clinical data. To validate the hypothesis, we have combined synthetic clinical data with subset of features from real genomic data. In our results, we observed that prediction accuracy, precision, recall with just genomic data is 80.77%, 20%, 4%. Prediction accuracy, precision, recall with just synthetic clinical data is 93.33%, 95%, 30%. While prediction accuracy, precision, recall for combination of real genomic and synthetic clinical data is 90.83%, 10%, 5%. The combination of real genomic and synthetic clinical data decreased the accuracy since the genomic data is weakly correlated. Thus we conclude that the combination of genomic and clinical data does not improve pancreatic cancer prediction accuracy. A dataset with more significant genomic features might help to predict pancreatic cancer more accurately
Tree-based Intelligent Intrusion Detection System in Internet of Vehicles
The use of autonomous vehicles (AVs) is a promising technology in Intelligent
Transportation Systems (ITSs) to improve safety and driving efficiency.
Vehicle-to-everything (V2X) technology enables communication among vehicles and
other infrastructures. However, AVs and Internet of Vehicles (IoV) are
vulnerable to different types of cyber-attacks such as denial of service,
spoofing, and sniffing attacks. In this paper, an intelligent intrusion
detection system (IDS) is proposed based on tree-structure machine learning
models. The results from the implementation of the proposed intrusion detection
system on standard data sets indicate that the system has the ability to
identify various cyber-attacks in the AV networks. Furthermore, the proposed
ensemble learning and feature selection approaches enable the proposed system
to achieve high detection rate and low computational cost simultaneously.Comment: Accepted in IEEE Global Communications Conference (GLOBECOM) 201
- …