Malware distributions and graph structure of the Web
Knowledge about the graph structure of the Web is important for understanding
this complex socio-technical system and for devising proper policies supporting
its future development. Knowledge about the differences between clean and
malicious parts of the Web is important for understanding potential threats to
its users and for devising protection mechanisms. In this study, we apply
data science methods to a large crawl of surface and deep Web pages with the
aim to increase such knowledge. To accomplish this, we answer the following
questions. Which theoretical distributions explain important local
characteristics and network properties of websites? How are these
characteristics and properties different between clean and malicious
(malware-affected) websites? What predictive power do local characteristics
and network properties have for classifying malware websites? To the
best of our knowledge, this is the first large-scale study describing the
differences in global properties between malicious and clean parts of the Web.
In other words, our work is building on and bridging the gap between
\textit{Web science} that tackles large-scale graph representations and
\textit{Web cyber security} that is concerned with malicious activities on the
Web. The results presented herein can also help antivirus vendors in devising
approaches to improve their detection algorithms.
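The question of which theoretical distributions explain local characteristics of websites is typically approached by fitting candidate distributions such as a power law to, for example, degree data. As a rough illustration of the idea (not the paper's actual methodology, and using synthetic data), the power-law exponent can be estimated by maximum likelihood:

```python
import numpy as np

def powerlaw_alpha(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent
    (the Clauset-Shalizi-Newman estimator): alpha = 1 + n / sum(ln(x/x_min))."""
    x = np.asarray([v for v in values if v >= x_min], dtype=float)
    return 1.0 + len(x) / np.sum(np.log(x / x_min))

# Hypothetical stand-in for website degree data: draws from a Pareto
# distribution with true exponent alpha = 2.5 (shape parameter a = 1.5).
rng = np.random.default_rng(0)
sample = rng.pareto(1.5, 50_000) + 1.0

alpha_hat = powerlaw_alpha(sample, x_min=1.0)
print(round(alpha_hat, 2))  # close to the true exponent 2.5
```

In practice, x_min itself must also be estimated (e.g. by minimizing the Kolmogorov-Smirnov distance between data and fit), and alternative heavy-tailed distributions should be compared before declaring a power law.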
Early Identification of Abused Domains in TLD through Passive DNS Applying Machine Learning Techniques
DNS is vital for the proper functioning of the Internet. However, some users abuse the domain registration process, and the resulting domains serve as tools for carrying out the most varied attacks. Early detection of abused domains therefore prevents more people from falling victim to scams. In this work, an approach for identifying abused domains was developed using passive DNS data collected from an authoritative TLD DNS server, enriched with geolocation information to enable a global view of the domains. The system monitors a domain's first seven days of life after its first DNS query, performing two behavior checks: the first after three days and the second after seven days. The generated models apply the LightGBM machine learning algorithm and, because of the unbalanced data, a combination of the Cluster Centroids and K-Means SMOTE techniques was used. As a result, the system obtained an average AUC of 0.9673 for the three-day model and 0.9674 for the seven-day model. Finally, validation of the three- and seven-day models in a test environment reached TPRs of 0.8656 and 0.8682, respectively. The results indicate that the system performs satisfactorily for the early identification of abused domains and highlight the value of a TLD vantage point for identifying them.
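The combination described above, undersampling the majority class toward cluster centroids while oversampling the minority class SMOTE-style, can be sketched in plain numpy. This is only an illustration of the resampling idea with hypothetical toy data; the paper itself uses the imbalanced-learn implementations together with LightGBM:

```python
import numpy as np

rng = np.random.default_rng(42)

def cluster_centroids(X, k, iters=10):
    """Tiny Lloyd's k-means: replace a majority class by k centroids,
    the idea behind Cluster Centroids undersampling."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                C[j] = members.mean(axis=0)
    return C

def smote_like(X, n_new, k=5):
    """SMOTE-style oversampling: move each seed point a random fraction
    of the way toward one of its k nearest minority neighbours."""
    seeds = X[rng.choice(len(X), n_new)]
    synthetic = []
    for s in seeds:
        d = ((X - s) ** 2).sum(-1)
        neighbours = X[np.argsort(d)[1:k + 1]]
        target = neighbours[rng.integers(k)]
        synthetic.append(s + rng.random() * (target - s))
    return np.array(synthetic)

# Hypothetical feature vectors: 500 benign (majority) vs 40 abused domains.
benign = rng.normal(0, 1, (500, 8))
abused = rng.normal(3, 1, (40, 8))

n_per_class = 200
benign_down = cluster_centroids(benign, n_per_class)
abused_up = np.vstack([abused, smote_like(abused, n_per_class - len(abused))])
print(len(benign_down), len(abused_up))  # 200 200
```

The rebalanced classes would then be concatenated and fed to the classifier; the exact rebalancing ratio is a tuning choice, not something fixed by the method.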
Tree-based Intelligent Intrusion Detection System in Internet of Vehicles
The use of autonomous vehicles (AVs) is a promising technology in Intelligent
Transportation Systems (ITSs) to improve safety and driving efficiency.
Vehicle-to-everything (V2X) technology enables communication among vehicles and
other infrastructures. However, AVs and Internet of Vehicles (IoV) are
vulnerable to different types of cyber-attacks such as denial of service,
spoofing, and sniffing attacks. In this paper, an intelligent intrusion
detection system (IDS) is proposed based on tree-structure machine learning
models. The results from the implementation of the proposed intrusion detection
system on standard data sets indicate that the system has the ability to
identify various cyber-attacks in the AV networks. Furthermore, the proposed
ensemble learning and feature selection approaches enable the proposed system
to achieve a high detection rate and a low computational cost simultaneously. (Comment: Accepted at the IEEE Global Communications Conference (GLOBECOM) 2019.)
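Tree-based feature selection, one of the approaches the abstract credits for the low computational cost, ranks features by how much a tree split on them reduces impurity. A minimal sketch of that intuition, using single-split Gini gain on hypothetical toy flow data rather than the paper's actual models or datasets:

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector."""
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - (p ** 2).sum()

def stump_gain(x, y):
    """Best single-split Gini reduction for one feature: a crude proxy
    for tree-based feature importance."""
    best = 0.0
    for t in np.quantile(x, [0.25, 0.5, 0.75]):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        best = max(best, gini(y) - child)
    return best

# Hypothetical IDS data: 4 features, only feature 0 separates attack flows.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 2000)
X = rng.normal(0, 1, (2000, 4))
X[:, 0] += 2.0 * y  # attack traffic shifts feature 0

scores = [stump_gain(X[:, j], y) for j in range(X.shape[1])]
selected = int(np.argmax(scores))
print(selected)  # feature 0 ranks highest
```

Real tree ensembles aggregate such gains over many splits and trees; dropping low-scoring features before training is what trades a small amount of accuracy for a large reduction in inference cost.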
Robust Botnet Detection Techniques for Mobile and Network Environments
Cybercrime costs large amounts of money and resources every year, as it is carried out using many different methods and at many different scales. The use of botnets is one of the most common and successful cybercrime methods. A botnet is a group of network-connected devices that are used together to carry out malicious attacks. With the widespread usage of handheld devices such as smartphones and tablets, networked devices are no longer limited to personal computers and laptops, so networks (and therefore botnets) can be large. It is thus not surprising that malicious users target many types of devices and platforms as cyber-attack victims or use them to launch cyber-attacks. Robust automatic methods of botnet detection on different platforms are therefore required.
This thesis addresses this problem by introducing robust methods for botnet family detection on Android devices as well as by analysing network traffic more generally. For botnet detection on Android, this thesis proposes an approach to identify Android botnet apps by means of source code mining. The approach analyses the source code via reverse engineering and data mining techniques for several examples of malicious and non-malicious apps. Two methods are used to build datasets: in the first, text mining is performed on the source code and several datasets are constructed; in the second, one dataset is created by extracting source code metrics using an open-source tool.
Additionally, this thesis introduces a novel transfer learning approach for the detection of botnet families by means of network traffic analysis. This approach is a key contribution to knowledge because it adds insight into how similar instances can exist in datasets belonging to different botnet families and how these instances can be leveraged to enhance model quality (especially for botnet families with small datasets). This novel approach is denoted Similarity Based Instance Transfer, or SBIT. Furthermore, the thesis presents a proposed extended version designed to overcome a weakness in the original algorithm. The extended version is called CB-SBIT (Class Balanced Similarity Based Instance Transfer).
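The core idea of similarity-based instance transfer, borrowing training instances from a related source family when they resemble the small target family's data, can be sketched as follows. This is an illustration in the spirit of SBIT, not the thesis's exact algorithm, and all data and thresholds here are hypothetical:

```python
import numpy as np

def transfer_similar(X_src, X_tgt, threshold):
    """Keep source-family instances whose nearest target-family instance
    lies within a distance threshold (similarity-based instance transfer)."""
    d = np.linalg.norm(X_src[:, None, :] - X_tgt[None, :, :], axis=-1)
    keep = d.min(axis=1) <= threshold
    return X_src[keep]

rng = np.random.default_rng(7)
# Hypothetical flow features from a large source botnet family: one cluster
# overlapping the target family, one cluster far away from it.
X_src = np.vstack([rng.normal(0, 1, (300, 6)),
                   rng.normal(8, 1, (300, 6))])
# A small target family dataset.
X_tgt = rng.normal(0, 1, (50, 6))

augmented = np.vstack([X_tgt, transfer_similar(X_src, X_tgt, threshold=2.5)])
print(len(augmented))  # only the overlapping source instances are transferred
```

The threshold (or an equivalent similarity criterion) controls the transfer: too loose and dissimilar instances pollute the target model, too tight and the small dataset gains nothing. The class-balanced variant (CB-SBIT) would additionally constrain how many instances of each class are transferred.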
Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic
Machine learning (ML) is promising in accurately detecting malicious flows in
encrypted network traffic; however, it is challenging to collect a training
dataset that contains a sufficient amount of encrypted malicious data with
correct labels. When ML models are trained with low-quality training data, they
suffer degraded performance. In this paper, we aim to address a real-world
low-quality training dataset problem, namely, detecting encrypted malicious
traffic generated by continuously evolving malware. We develop RAPIER, which
fully exploits the different distributions of normal and malicious traffic in
the feature space, where normal data is tightly clustered in a certain area
while malicious data is scattered over the entire feature space, to augment
the training data for model training. RAPIER also includes two pre-processing
modules to convert traffic into feature vectors and to correct label noise. We evaluate our
system on two public datasets and one combined dataset. With 1000 samples and
45% label noise from each dataset, our system achieves F1 scores of 0.770,
0.776, and 0.855, respectively, average improvements of 352.6%, 284.3%, and
214.9% over existing methods. Furthermore, we evaluate RAPIER on a real-world
dataset obtained from a security enterprise. RAPIER effectively detects
encrypted malicious traffic with a best F1 score of 0.773 and improves the F1
score of existing methods by an average of 272.5%.
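The distributional assumption the abstract relies on, normal traffic tightly clustered, malicious traffic scattered, can be made concrete with a simple density score: samples far from the dense normal region are suspicious. This sketch illustrates only that intuition with synthetic data; it is not RAPIER's actual augmentation or detection pipeline:

```python
import numpy as np

def knn_density_score(X_train, X, k=10):
    """Average distance to the k nearest training points: high values mean
    a sample falls outside the tight 'normal' region of feature space."""
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

rng = np.random.default_rng(3)
normal_train = rng.normal(0, 0.5, (400, 5))    # tightly clustered normal flows
normal_test = rng.normal(0, 0.5, (100, 5))
malicious_test = rng.uniform(-4, 4, (100, 5))  # scattered over feature space

scores_n = knn_density_score(normal_train, normal_test)
scores_m = knn_density_score(normal_train, malicious_test)
print(scores_m.mean() > scores_n.mean())  # True
```

A system built on this idea can also use the same scores in reverse, treating low-density "normal" labels as likely noise and high-density regions as safe places from which to synthesize additional training data.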
Network Traffic Based Botnet Detection Using Machine Learning
The field of information and computer security is rapidly developing in today's world, as new security risks are discovered every day. The moment a new piece of software or a product is launched in the market, a new exploit or vulnerability is exposed and abused by attackers or malicious users with different motives. Many attacks are distributed in nature and carried out by botnets, which cause widespread disruption of network activity through DDoS (Distributed Denial of Service) attacks, email spamming, click fraud, information and identity theft, virtual deceit, and distributed resource usage for cryptocurrency mining. Botnet detection is still an active area of research, as no single technique is available that can detect the entire ecosystem of botnets such as Neris, Rbot, and Virut. They tend to have different configurations and are heavily armored by malware writers to evade detection systems through sophisticated evasion techniques. This report provides a detailed overview of a botnet and its characteristics and of the existing work in the domain of botnet detection. The study aims to evaluate preprocessing techniques such as variance thresholding and one-hot encoding to clean the botnet dataset, and feature selection techniques such as filter, wrapper, and embedded methods to boost machine learning model performance. This study addresses dataset imbalance issues through techniques such as undersampling, oversampling, ensemble learning, and gradient boosting, using random forest, decision tree, AdaBoost, and XGBoost. Lastly, the optimal model is trained and tested on datasets of different attacks to study its performance.
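The two preprocessing steps the study evaluates, variance thresholding (dropping near-constant features) and one-hot encoding (turning categorical flow fields into binary columns), are simple enough to sketch directly. The toy columns below are hypothetical, not taken from the report's dataset:

```python
import numpy as np

def variance_threshold(X, threshold=0.0):
    """Drop features whose variance does not exceed the threshold."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep

def one_hot(labels):
    """Encode a categorical column (e.g. protocol name) as 0/1 columns."""
    cats = sorted(set(labels))
    return np.array([[int(v == c) for c in cats] for v in labels]), cats

# Hypothetical flow features: column 0 is constant, column 1 is informative.
X = np.array([[1.0, 0.2],
              [1.0, 0.9],
              [1.0, 0.4]])
X_red, kept = variance_threshold(X)
proto, cats = one_hot(["tcp", "udp", "tcp"])
print(X_red.shape, kept.tolist(), cats)  # (3, 1) [False, True] ['tcp', 'udp']
```

Constant columns carry no discriminative signal and only add cost, while one-hot encoding is what lets tree and boosting models consume nominal fields such as protocol or connection state without imposing a spurious ordering.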