    An evaluation of DGA classifiers

    Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware uses DGAs to create a set of domain names that, when resolved, provide the information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security, as it makes it possible to detect infections and, in some cases, to take countermeasures that disrupt the communication and identify infected machines. Detecting the specific DGA malware family gives the administrator valuable information about the kind of infection and the steps that need to be taken. In this paper, we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than those from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
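
    The abstract above describes selecting training and test data by observation time. A minimal sketch of such a split follows; the `Sample` record, its field names, the example domains, and the cutoff date are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    domain: str  # observed domain name
    label: str   # "benign" or a malware family name (hypothetical labels)
    seen: date   # day on which the domain was observed (hypothetical field)


def split_by_time(samples, cutoff):
    """Put samples seen before `cutoff` in the training set, the rest in the test set."""
    train = [s for s in samples if s.seen < cutoff]
    test = [s for s in samples if s.seen >= cutoff]
    return train, test


# Made-up records, for illustration only.
samples = [
    Sample("example.com", "benign", date(2018, 1, 5)),
    Sample("qwhdkzjx.net", "family_a", date(2018, 1, 9)),
    Sample("example.org", "benign", date(2018, 3, 2)),
    Sample("pzlrtmvb.net", "family_a", date(2018, 3, 4)),
]
train, test = split_by_time(samples, date(2018, 2, 1))
```

    Splitting on time rather than at random keeps every test domain later than every training domain, which is one way to probe how a classifier copes when a family's output changes over time.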

    Inline detection of DGA domains using side information

    Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish communication between an infected bot and the C&C server. In recent years, machine-learning-based systems have been widely used to detect DGAs. There are several well-known state-of-the-art classifiers in the literature that can detect DGA domain names in real-time applications with high predictive performance. However, these DGA classifiers are highly vulnerable to adversarial attacks in which adversaries purposely craft domain names to evade DGA detection classifiers. In our work, we focus on hardening DGA classifiers against adversarial attacks. To this end, we train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself. Additionally, the side information features are selected such that they are easily obtainable in practice to perform inline DGA detection. The performance and robustness of these models are assessed by exposing them to one day of real-traffic data as well as domains generated by adversarial attack algorithms. We found that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
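
    To illustrate the idea of combining the domain name with side information, the sketch below builds one feature vector from a few simple lexical features plus two DNS-response fields. The abstract does not list the features the authors used: the TTL and NXDOMAIN fields, the choice of lexical features, and all function names here are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain):
    """Length, digit ratio, and entropy of the first label (a simplification)."""
    name = domain.lower().rstrip(".").split(".")[0]
    length = len(name)
    digit_ratio = sum(ch.isdigit() for ch in name) / length if length else 0.0
    return [float(length), digit_ratio, shannon_entropy(name)]


def feature_vector(domain, ttl, nxdomain):
    """Lexical features followed by hypothetical side information.

    `ttl` (seconds) and `nxdomain` (did the lookup fail?) stand in for
    whatever DNS query/response fields a real system would use.
    """
    return lexical_features(domain) + [float(ttl), 1.0 if nxdomain else 0.0]


vector = feature_vector("ab12.example.com", ttl=300, nxdomain=True)
```

    A vector of this shape could be passed to a tree ensemble such as a random forest; a deep learning model would more likely consume the raw character sequence alongside the side-information values.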

    CharBot: A Simple and Effective Method for Evading DGA Classifiers

    Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names, which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real time. In this work, we present a novel DGA called CharBot, which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective, and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.

    CharBot : a simple and effective method for evading DGA classifiers

    Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names, which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real time. In this paper, we present a novel DGA called CharBot, which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective, and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.

    Analyzing the Real-World Applicability of DGA Classifiers

    Separating benign domains from domains generated by DGAs with the help of a binary classifier is a well-studied problem for which promising performance results have been published. The corresponding multiclass task of determining the exact DGA that generated a domain, which enables targeted remediation measures, is less well studied. Selecting the most promising classifier for these tasks in practice raises a number of questions that have not been addressed in prior work. These include which traffic to train on, in which network, and when, as well as how to assess robustness against adversarial attacks. Moreover, it is unclear which features lead a classifier to a decision and whether the classifiers are real-time capable. In this paper, we address these issues and thus contribute to bringing DGA detection classifiers closer to practical use. In this context, we propose one novel classifier based on residual neural networks for each of the two tasks and extensively evaluate them as well as previously proposed classifiers in a unified setting. We not only evaluate their classification performance but also compare them with respect to explainability, robustness, and training and classification speed. Finally, we show that our newly proposed binary classifier generalizes well to other networks, is time-robust, and is able to identify previously unknown DGAs. Comment: Accepted at the 15th International Conference on Availability, Reliability and Security (ARES 2020).

    Hardening Inline DGA Classifiers Against Adversarial Attacks

    Thesis (Master's)--University of Washington, 2019. Domain Generation Algorithms (DGAs) are widely used by cybercriminals to generate domain names on the fly for command-and-control (C&C) purposes, that is, to establish communication with bots and instruct them to perform malicious activities. It is therefore important to detect domains generated by DGAs in order to block the communication between the bot and the C&C server. In recent years, machine-learning-based DGA detection systems have been widely used to address this problem. However, classifiers that rely only on the domain name to detect DGAs have been found to be highly vulnerable to adversarial attacks. Adversarial attacks are intentionally devised by an attacker to fool a classifier and cause it to produce erroneous results. This is a serious concern, as it degrades the performance of DGA detection classifiers. In this thesis, we aim to defend DGA detection classifiers against adversarial attacks without compromising the performance of existing state-of-the-art classifiers in the literature. One such technique is to use side information features obtained from the DNS query/response that cannot be easily manipulated by the adversary. Although past research has used DNS features for retrospective analysis of DNS traffic, to the best of our knowledge, no studies leverage such data for inline detection of DGA domains. In our work, we train machine learning models based on tree ensembles and deep learning for DGA detection using side information (in addition to the domain name) that can be easily obtained in practice without relying on external data sources such as WHOIS. We also exclude methods that analyze past DNS data to extract side information features, which results in relatively lightweight computation for detecting DGA domains in real-time DNS applications. Finally, we perform an empirical evaluation by applying the best-performing classifiers trained using side information to one day of passive DNS traffic and comparing their performance against a well-known state-of-the-art classifier that relies only on the domain name for DGA detection. Results show that classifiers trained using a combination of lexical and side information features not only provide high performance but are also more robust to adversarial attacks than classifiers that rely only on the domain name for inline DGA detection.