36,608 research outputs found
Performance Comparison of Ensemble Learning and Supervised Algorithms in Classifying Multi-label Network Traffic Flow
A research article was published by Engineering, Technology & Applied Science Research (ETASR) Volume: 12 | Issue: 3 |June 2022Network traffic classification is of significant importance. It helps identify network anomalies and assists in taking measures to avoid them. However, classifying network traffic correctly is a challenging task. This study aims to compare ensemble learning methods with normal supervised classification to come up with improved classification methods. Three types of network traffic were classified (Benign, Malicious, and Outliers). The data were collected experimentally by using Paessler Router Traffic Grapher software and online and were analyzed by R software. The datasets were used to train five supervised models (k-nearest neighbors, mixture discriminant analysis, Naïve Bayes, C5.0 classification model, and regularized discriminant analysis). The models were trained by 70% of the samples and the rest 30% were used for validation. The same samples were used separately in predicting individual accuracy. The results were compared to the ensemble learning models which were built with the use of the same datasets. Among the five supervised classifiers, k-nearest neighbors and C5.0 classification scored the highest accuracy of 0.868 and 0.761. The ensemble learning classifiers Bagging (Random Forest) and Boosting (eXtreme Gradient Boosting) had accuracy of 0.904 and 0.902 respectively. The results show that the ensemble learning method has higher accuracy compared to the normal supervised classifiers. Therefore, it can be used to detect malicious activities in network traffic as well as anomalies with improved accuracy
High-Resolution Road Vehicle Collision Prediction for the City of Montreal
Road accidents are an important issue of our modern societies, responsible
for millions of deaths and injuries every year in the world. In Quebec only, in
2018, road accidents are responsible for 359 deaths and 33 thousands of
injuries. In this paper, we show how one can leverage open datasets of a city
like Montreal, Canada, to create high-resolution accident prediction models,
using big data analytics. Compared to other studies in road accident
prediction, we have a much higher prediction resolution, i.e., our models
predict the occurrence of an accident within an hour, on road segments defined
by intersections. Such models could be used in the context of road accident
prevention, but also to identify key factors that can lead to a road accident,
and consequently, help elaborate new policies.
We tested various machine learning methods to deal with the severe class
imbalance inherent to accident prediction problems. In particular, we
implemented the Balanced Random Forest algorithm, a variant of the Random
Forest machine learning algorithm in Apache Spark. Interestingly, we found that
in our case, Balanced Random Forest does not perform significantly better than
Random Forest.
Experimental results show that 85% of road vehicle collisions are detected by
our model with a false positive rate of 13%. The examples identified as
positive are likely to correspond to high-risk situations. In addition, we
identify the most important predictors of vehicle collisions for the area of
Montreal: the count of accidents on the same road segment during previous
years, the temperature, the day of the year, the hour and the visibility
In-depth comparative evaluation of supervised machine learning approaches for detection of cybersecurity threats
This paper describes the process and results of analyzing CICIDS2017, a modern, labeled data set for testing intrusion detection systems. The data set is divided into several days, each pertaining to different attack classes (Dos, DDoS, infiltration, botnet, etc.). A pipeline has been created that includes nine supervised learning algorithms. The goal was binary classification of benign versus attack traffic. Cross-validated parameter optimization, using a voting mechanism that includes five classification metrics, was employed to select optimal parameters. These results were interpreted to discover whether certain parameter choices were dominant for most (or all) of the attack classes. Ultimately, every algorithm was retested with optimal parameters to obtain the final classification scores. During the review of these results, execution time, both on consumerand corporate-grade equipment, was taken into account as an additional requirement. The work detailed in this paper establishes a novel supervised machine learning performance baseline for CICIDS2017
Grand Challenge: Real-time Destination and ETA Prediction for Maritime Traffic
In this paper, we present our approach for solving the DEBS Grand Challenge
2018. The challenge asks to provide a prediction for (i) a destination and the
(ii) arrival time of ships in a streaming-fashion using Geo-spatial data in the
maritime context. Novel aspects of our approach include the use of ensemble
learning based on Random Forest, Gradient Boosting Decision Trees (GBDT),
XGBoost Trees and Extremely Randomized Trees (ERT) in order to provide a
prediction for a destination while for the arrival time, we propose the use of
Feed-forward Neural Networks. In our evaluation, we were able to achieve an
accuracy of 97% for the port destination classification problem and 90% (in
mins) for the ETA prediction
SQL Injection Detection Using Machine Learning Techniques and Multiple Data Sources
SQL Injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules which are mostly effective only against previously observed attacks but not unknown, or zero-day, attacks. Much current research involves the use of machine learning techniques, which are able to detect unknown attacks, but depending on the algorithm can be costly in terms of performance. In addition, most current intrusion detection strategies involve collection of traffic coming into the web application either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we are collecting traffic from two points: the web application host, and a Datiphy appliance node located between the webapp host and the associated MySQL database server. In our analysis of these two datasets, and another dataset that is correlated between the two, we have been able to demonstrate that accuracy obtained with the correlated dataset using algorithms such as rule-based and decision tree are nearly the same as those with a neural network algorithm, but with greatly improved performance
Time series classification based on fractal properties
The article considers classification task of fractal time series by the meta
algorithms based on decision trees. Binomial multiplicative stochastic cascades
are used as input time series. Comparative analysis of the classification
approaches based on different features is carried out. The results indicate the
advantage of the machine learning methods over the traditional estimating the
degree of self-similarity.Comment: 4 pages, 2 figures, 3 equations, 1 tabl
iTeleScope: Intelligent Video Telemetry and Classification in Real-Time using Software Defined Networking
Video continues to dominate network traffic, yet operators today have poor
visibility into the number, duration, and resolutions of the video streams
traversing their domain. Current approaches are inaccurate, expensive, or
unscalable, as they rely on statistical sampling, middle-box hardware, or
packet inspection software. We present {\em iTelescope}, the first intelligent,
inexpensive, and scalable SDN-based solution for identifying and classifying
video flows in real-time. Our solution is novel in combining dynamic flow rules
with telemetry and machine learning, and is built on commodity OpenFlow
switches and open-source software. We develop a fully functional system, train
it in the lab using multiple machine learning algorithms, and validate its
performance to show over 95\% accuracy in identifying and classifying video
streams from many providers including Youtube and Netflix. Lastly, we conduct
tests to demonstrate its scalability to tens of thousands of concurrent
streams, and deploy it live on a campus network serving several hundred real
users. Our system gives unprecedented fine-grained real-time visibility of
video streaming performance to operators of enterprise and carrier networks at
very low cost.Comment: 12 pages, 16 figure
Tree-based Intelligent Intrusion Detection System in Internet of Vehicles
The use of autonomous vehicles (AVs) is a promising technology in Intelligent
Transportation Systems (ITSs) to improve safety and driving efficiency.
Vehicle-to-everything (V2X) technology enables communication among vehicles and
other infrastructures. However, AVs and Internet of Vehicles (IoV) are
vulnerable to different types of cyber-attacks such as denial of service,
spoofing, and sniffing attacks. In this paper, an intelligent intrusion
detection system (IDS) is proposed based on tree-structure machine learning
models. The results from the implementation of the proposed intrusion detection
system on standard data sets indicate that the system has the ability to
identify various cyber-attacks in the AV networks. Furthermore, the proposed
ensemble learning and feature selection approaches enable the proposed system
to achieve high detection rate and low computational cost simultaneously.Comment: Accepted in IEEE Global Communications Conference (GLOBECOM) 201
- …