
    Malware distributions and graph structure of the Web

    Knowledge about the graph structure of the Web is important for understanding this complex socio-technical system and for devising proper policies to support its future development. Knowledge about the differences between clean and malicious parts of the Web is important for understanding potential threats to its users and for devising protection mechanisms. In this study, we apply data science methods to a large crawl of surface and deep Web pages with the aim of increasing such knowledge. To accomplish this, we answer the following questions. Which theoretical distributions explain important local characteristics and network properties of websites? How do these characteristics and properties differ between clean and malicious (malware-affected) websites? What is the predictive power of local characteristics and network properties for classifying malware websites? To the best of our knowledge, this is the first large-scale study describing the differences in global properties between malicious and clean parts of the Web. In other words, our work builds on and bridges the gap between Web science, which tackles large-scale graph representations, and Web cyber security, which is concerned with malicious activities on the Web. The results presented herein can also help antivirus vendors devise approaches to improve their detection algorithms.
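    The distribution-fitting step described here can be sketched minimally: estimating a power-law exponent for website degree data via the standard discrete MLE approximation, then comparing two populations. The `power_law_alpha` helper is a hypothetical illustration, and the Zipf draws are synthetic stand-ins for the crawl data, not the paper's results.

    ```python
    import numpy as np

    def power_law_alpha(degrees, x_min=1):
        """Approximate MLE for a discrete power-law exponent (sketch)."""
        x = np.asarray([d for d in degrees if d >= x_min], dtype=float)
        # Discrete approximation: shift x_min by 0.5 in the continuous MLE.
        return 1.0 + len(x) / np.sum(np.log(x / (x_min - 0.5)))

    rng = np.random.default_rng(0)
    # Synthetic heavy-tailed degree samples; exponents chosen for illustration.
    clean_degrees = rng.zipf(3.0, size=5000)
    malicious_degrees = rng.zipf(2.1, size=5000)

    alpha_clean = power_law_alpha(clean_degrees)
    alpha_malicious = power_law_alpha(malicious_degrees)
    print(f"alpha(clean)≈{alpha_clean:.2f}  alpha(malicious)≈{alpha_malicious:.2f}")
    ```

    The approximation is known to be biased for small exponents on discrete data; it serves only to show how a fitted exponent can become one comparative feature between clean and malicious site populations.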

    Early Identification of Abused Domains in TLD through Passive DNS Applying Machine Learning Techniques

    DNS is vital for the proper functioning of the Internet. However, malicious users also abuse the domain registration system, using registered domains as tools to carry out a wide variety of attacks. Early detection of abused domains therefore prevents more people from falling victim to scams. In this work, an approach for identifying abused domains was developed using passive DNS data collected from an authoritative TLD DNS server, enriched with geolocation information to enable a global view of the domains. The system monitors a domain's first seven days of life after its first DNS query, performing two behavior checks: the first after three days and the second after seven days. The generated models apply the LightGBM machine learning algorithm and, because of the unbalanced data, a combination of the Cluster Centroids and K-Means SMOTE techniques was used. As a result, the system obtained an average AUC of 0.9673 for the three-day model and 0.9674 for the seven-day model. Finally, validation of the three- and seven-day models in a test environment reached TPRs of 0.8656 and 0.8682, respectively. The system shows satisfactory performance for the early identification of abused domains and demonstrates the value of the TLD vantage point for identifying them.
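    The imbalance-then-classify workflow can be sketched with scikit-learn alone. The paper's actual stack is LightGBM with Cluster Centroids and K-Means SMOTE; below, `GradientBoostingClassifier` and simple random under/oversampling stand in for those, and the features are synthetic placeholders for passive-DNS attributes.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    # Synthetic stand-in for domain features (query counts, geo spread, ...);
    # abused domains are the rare positive class.
    X, y = make_classification(n_samples=4000, n_features=12,
                               weights=[0.95, 0.05], random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

    # Crude rebalancing: undersample the majority and duplicate-oversample the
    # minority, a simple stand-in for Cluster Centroids + K-Means SMOTE.
    maj = np.where(y_tr == 0)[0]
    mino = np.where(y_tr == 1)[0]
    keep_maj = rng.choice(maj, size=4 * len(mino), replace=False)
    up_min = rng.choice(mino, size=2 * len(mino), replace=True)
    idx = np.concatenate([keep_maj, up_min])

    clf = GradientBoostingClassifier(random_state=42).fit(X_tr[idx], y_tr[idx])
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"AUC≈{auc:.3f}")
    ```

    In practice the imbalanced-learn library provides the exact resamplers named in the abstract; the point here is only the shape of the pipeline: resample the training split, fit a boosted model, evaluate by AUC on untouched test data.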

    Tree-based Intelligent Intrusion Detection System in Internet of Vehicles

    The use of autonomous vehicles (AVs) is a promising technology in Intelligent Transportation Systems (ITSs) to improve safety and driving efficiency. Vehicle-to-everything (V2X) technology enables communication among vehicles and other infrastructure. However, AVs and the Internet of Vehicles (IoV) are vulnerable to different types of cyber-attacks, such as denial-of-service, spoofing, and sniffing attacks. In this paper, an intelligent intrusion detection system (IDS) is proposed based on tree-structured machine learning models. The results from implementing the proposed intrusion detection system on standard datasets indicate that it can identify various cyber-attacks in AV networks. Furthermore, the proposed ensemble learning and feature selection approaches enable the system to achieve a high detection rate and low computational cost simultaneously.
    Comment: Accepted in IEEE Global Communications Conference (GLOBECOM) 201
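    A tree-based IDS combining feature selection and ensembling, as the abstract describes, can be sketched like this. The dataset is synthetic and the four class labels are hypothetical stand-ins for attack categories; the exact tree models and selection method used in the paper are not specified here, so common scikit-learn choices fill in.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                                  VotingClassifier)
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic multi-class data standing in for labeled vehicular traffic,
    # e.g. normal / DoS / spoofing / sniffing (hypothetical classes).
    X, y = make_classification(n_samples=3000, n_features=30, n_informative=8,
                               n_classes=4, n_clusters_per_class=1,
                               random_state=7)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

    # Importance-based feature selection feeding a soft-voting tree ensemble:
    # fewer features lowers cost, voting raises the detection rate.
    model = make_pipeline(
        SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=7)),
        VotingClassifier([
            ("dt", DecisionTreeClassifier(random_state=7)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
            ("et", ExtraTreesClassifier(n_estimators=100, random_state=7)),
        ], voting="soft"),
    )
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"accuracy≈{acc:.3f}")
    ```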

    Robust Botnet Detection Techniques for Mobile and Network Environments

    Cybercrime costs large amounts of money and resources every year, because it is carried out using many different methods and at many different scales. The use of botnets is one of the most common and successful cybercrime methods. A botnet is a group of network-connected devices that are used together to carry out malicious attacks. With the widespread usage of handheld devices such as smartphones and tablets, networked devices are no longer limited to personal computers and laptops, so networks (and therefore botnets) can be large. It is thus not surprising that malicious users target different types of devices and platforms as cyber-attack victims or use them to launch cyber-attacks, and robust automatic methods of botnet detection on different platforms are required. This thesis addresses this problem by introducing robust methods for botnet family detection on Android devices as well as through general network traffic analysis. For botnet detection on Android, the thesis proposes an approach to identify Android botnet apps by means of source code mining. The approach analyses the source code, obtained via reverse engineering, of several examples of malicious and non-malicious apps using data mining techniques. Two methods are used to build datasets: in the first, text mining is performed on the source code to construct several datasets; in the second, one dataset is created by extracting source code metrics using an open-source tool. Additionally, the thesis introduces a novel transfer learning approach for the detection of botnet families by means of network traffic analysis. This approach is a key contribution to knowledge because it shows that similar instances can exist in datasets belonging to different botnet families and that these instances can be leveraged to enhance model quality (especially for botnet families with small datasets). This novel approach is denoted Similarity Based Instance Transfer, or SBIT. Furthermore, the thesis presents an extended version designed to overcome a weakness in the original algorithm, called CB-SBIT (Class Balanced Similarity Based Instance Transfer).
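    The SBIT/CB-SBIT idea can be sketched as two steps: transfer only source-family instances that are similar enough to the target-family data, then balance the classes among the transferred instances. The thesis's actual similarity measure and thresholds are not reproduced here; nearest-neighbor Euclidean distance on synthetic feature matrices stands in, and all values below are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(1)
    # Hypothetical flow-feature matrices: a small target-family training set
    # and a larger source family, part of which occupies a similar region.
    target_X = rng.normal(0.0, 1.0, size=(60, 5))
    source_X = np.vstack([
        rng.normal(0.0, 1.0, size=(300, 5)),   # class 0, similar to target
        rng.normal(0.5, 1.0, size=(100, 5)),   # class 1, similar to target
        rng.normal(8.0, 1.0, size=(100, 5)),   # class 1, dissimilar region
    ])
    source_y = np.array([0] * 300 + [1] * 200)

    # SBIT-style selection (sketch): transfer only source instances whose
    # distance to the nearest target instance falls under a threshold.
    dist, _ = NearestNeighbors(n_neighbors=1).fit(target_X).kneighbors(source_X)
    mask = dist[:, 0] < 2.0
    picked_X, picked_y = source_X[mask], source_y[mask]

    # CB-SBIT-style balancing (sketch): cap every class at the minority count.
    counts = np.bincount(picked_y, minlength=2)
    cap = counts.min()
    balanced = np.concatenate([np.where(picked_y == c)[0][:cap] for c in (0, 1)])
    print(len(picked_X), np.bincount(picked_y[balanced], minlength=2))
    ```

    The dissimilar source cluster is filtered out by the distance threshold, and the capping step leaves equal class counts, which is the weakness-fix the CB variant is described as adding.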

    Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic

    Machine learning (ML) is promising for accurately detecting malicious flows in encrypted network traffic; however, it is challenging to collect a training dataset that contains a sufficient amount of correctly labeled encrypted malicious data, and ML models trained on low-quality data suffer degraded performance. In this paper, we aim to address a real-world low-quality training dataset problem: detecting encrypted malicious traffic generated by continuously evolving malware. We develop RAPIER, which exploits the different distributions of normal and malicious traffic data in the feature space, where normal data is tightly clustered in a certain area while malicious data is scattered over the entire feature space, to augment the training data for model training. RAPIER includes two pre-processing modules to convert traffic into feature vectors and correct label noise. We evaluate our system on two public datasets and one combined dataset. With 1000 samples and 45% label noise from each dataset, our system achieves F1 scores of 0.770, 0.776, and 0.855, respectively, an average improvement of 352.6%, 284.3%, and 214.9% over existing methods. Furthermore, we evaluate RAPIER on a real-world dataset obtained from a security enterprise. RAPIER effectively detects encrypted malicious traffic with a best F1 score of 0.773, improving the F1 score of existing methods by an average of 272.5%.
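    The core observation, that normal traffic is tightly clustered while malicious traffic is scattered, can drive a simple density-based label correction. The sketch below is a simplified stand-in for RAPIER's pre-processing, not its actual modules: it relabels from density alone (mean distance to nearest neighbors), ignoring the noisy labels, on synthetic data built to match the stated distributional assumption.

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(3)
    # Synthetic flow features: normal traffic tightly clustered, malicious
    # traffic scattered over the feature space (the paper's observation).
    normal_X = rng.normal(0.0, 0.5, size=(300, 4))
    malicious_X = rng.uniform(-6.0, 6.0, size=(100, 4))
    X = np.vstack([normal_X, malicious_X])
    y_true = np.array([0] * 300 + [1] * 100)

    # Inject 30% label noise to mimic a low-quality training set.
    y_noisy = y_true.copy()
    flip = rng.choice(len(y_noisy), size=120, replace=False)
    y_noisy[flip] ^= 1

    # Density-based correction (sketch): points in dense regions -> normal,
    # sparse outlying points -> malicious.
    dist, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
    score = dist[:, 1:].mean(axis=1)        # mean distance to 5 neighbors
    y_corrected = (score > np.quantile(score, 0.75)).astype(int)

    print(f"noisy-label accuracy:     {(y_noisy == y_true).mean():.2f}")
    print(f"corrected-label accuracy: {(y_corrected == y_true).mean():.2f}")
    ```

    The 0.75 quantile threshold is an illustrative assumption (it matches the synthetic class ratio); the takeaway is that when the distributional assumption holds, density alone recovers labels far better than the noisy annotations.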

    Network Traffic Based Botnet Detection Using Machine Learning

    The field of information and computer security is developing rapidly, as new security risks are discovered every day. The moment a new software product is launched in the market, a new exploit or vulnerability is exposed and exploited by attackers or malicious users for different motives. Many attacks are distributed in nature and carried out by botnets, which cause widespread disruption of network activity through DDoS (Distributed Denial of Service) attacks, email spamming, click fraud, information and identity theft, virtual deceit, and distributed resource usage for cryptocurrency mining. Botnet detection is still an active area of research, as no single technique is available that can detect the entire ecosystem of a botnet such as Neris, Rbot, or Virut; these botnets tend to have different configurations and are heavily armored by malware writers to evade detection systems through sophisticated evasion techniques. This report provides a detailed overview of a botnet and its characteristics and of the existing work in the domain of botnet detection. The study aims to evaluate preprocessing techniques such as variance thresholding and one-hot encoding to clean the botnet dataset, and feature selection techniques such as filter, wrapper, and embedded methods to boost machine learning model performance. It addresses dataset imbalance through techniques such as undersampling, oversampling, ensemble learning, and gradient boosting, using random forest, decision tree, AdaBoost, and XGBoost. Lastly, the optimal model is trained and tested on datasets of different attacks to study its performance.
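    The preprocessing steps named here, one-hot encoding of categorical fields and variance thresholding before a tree model, fit naturally into a pipeline. The record layout below is a synthetic assumption (a made-up protocol column and a deliberately constant feature), not the report's dataset.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(5)
    n = 2000
    # Synthetic stand-in for flow records: four numeric features, one constant
    # (zero-variance) column, and a categorical protocol field.
    num = rng.normal(size=(n, 4))
    const = np.zeros((n, 1))
    proto = rng.choice(["tcp", "udp", "icmp"], size=(n, 1))
    y = (num[:, 0] + num[:, 1] > 0).astype(int)

    # One-hot encode the categorical column, then let variance thresholding
    # drop the constant feature before the model sees the data.
    proto_oh = OneHotEncoder().fit_transform(proto).toarray()
    X = np.hstack([num, const, proto_oh])       # 4 + 1 + 3 = 8 columns

    pipe = Pipeline([
        ("variance", VarianceThreshold(threshold=0.0)),
        ("model", RandomForestClassifier(n_estimators=100, random_state=5)),
    ])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)
    acc = pipe.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"accuracy≈{acc:.3f}")
    ```

    After fitting, the variance step retains 7 of the 8 columns; swapping the random forest for AdaBoost or XGBoost, as the study compares, changes only the final pipeline stage.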