Automatic Detection of Malware-Generated Domains with Recurrent Neural Models
Modern malware families often rely on domain-generation algorithms (DGAs) to
determine rendezvous points to their command-and-control server. Traditional
defence strategies (such as blacklisting domains or IP addresses) are
inadequate against such techniques due to the large and continuously changing
list of domains produced by these algorithms. This paper demonstrates that a
machine learning approach based on recurrent neural networks is able to detect
domain names generated by DGAs with high precision. The neural models are
trained on a large set of domains generated by various malware families.
Experimental results show that this data-driven approach can detect
malware-generated domain names with an F_1 score of 0.971. Put differently,
the model automatically detects 93% of malware-generated domain names at a
false positive rate of 1:100.
Comment: Submitted to NISK 201
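As a hedged illustration of the mechanism the abstract describes, the sketch below is a toy DGA that deterministically derives a daily list of rendezvous domains from a shared seed, the way malware and its C&C operator can independently compute the same candidate domains. The seed value, hashing scheme, and TLD are illustrative assumptions, not taken from any real malware family.

```python
# Toy DGA sketch: seed + date -> deterministic list of candidate domains.
# Illustrative only; real families use many different derivation schemes.
import hashlib
from datetime import date

def generate_domains(seed: str, day: date, count: int = 5) -> list:
    domains = []
    for i in range(count):
        material = "{}-{}-{}".format(seed, day.isoformat(), i).encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map hex digits to lowercase letters so the label looks domain-like.
        label = "".join(chr(ord('a') + int(c, 16) % 26) for c in digest[:12])
        domains.append(label + ".com")
    return domains

print(generate_domains("example-seed", date(2017, 1, 1)))
```

Because the derivation is deterministic, both the bot and the botmaster obtain the same candidate list for a given day, which is exactly why static blacklisting struggles: tomorrow's list is different.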
Weakly supervised deep learning for the detection of domain generation algorithms
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command-and-control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster within a short time window around their generation time and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground-truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice for improving the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
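To make the idea of heuristic labeling concrete, here is a minimal sketch of how weak labels might be assigned to observed domain names. The specific signals (allowlist membership, NXDOMAIN responses, a character-entropy threshold) and the threshold value are assumptions for illustration, not the paper's actual heuristics.

```python
# Weak-labeling sketch: assign "benign" / "dga" / "unlabeled" from
# cheap traffic signals instead of hand-curated ground truth.
# Signals and the 3.5-bit entropy threshold are illustrative assumptions.
import math

def shannon_entropy(s: str) -> float:
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def weak_label(domain: str, nxdomain: bool, on_allowlist: bool) -> str:
    label = domain.split(".")[0]  # second-level label only
    if on_allowlist:
        return "benign"
    if nxdomain and shannon_entropy(label) > 3.5:
        return "dga"
    return "unlabeled"
```

Labels produced this way are noisy, which is the point of the weak-supervision setting: a classifier trained on millions of such labels can still outperform one trained only on a small, clean set.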
An evaluation of DGA classifiers
Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware utilizes DGAs to create a set of domain names that, when resolved, provide the information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security, as it makes it possible to detect infection and, in some cases, take countermeasures to disrupt the communication and identify infected machines. Detection of the specific DGA malware family provides the administrator with valuable information about the kind of infection and the steps that need to be taken. In this paper we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for the test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
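The evaluation design above, splitting by observation time and by seed rather than at random, can be sketched as follows. The record layout and names are hypothetical; the point is that test samples come from later observations or from seeds never seen in training, so the measured accuracy reflects generalization rather than memorization.

```python
# Time- and seed-aware train/test split sketch.
# A sample is (domain, family, seed, observed_on); layout is hypothetical.
from datetime import date

def split_by_time_and_seed(samples, cutoff, train_seeds):
    train, test = [], []
    for domain, family, seed, observed in samples:
        if observed <= cutoff and seed in train_seeds:
            train.append((domain, family))
        else:
            # Later observations, or unseen seeds, go to the test set.
            test.append((domain, family))
    return train, test
```

A random split would leak near-duplicate domains from the same seed into both sets and overstate classifier robustness; this split avoids that.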
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to
create lists of domain names which can be used for command and control (C&C)
purposes. Approaches based on machine learning have recently been developed to
automatically detect generated domain names in real-time. In this work, we
present a novel DGA called CharBot which is capable of producing large numbers
of unregistered domain names that are not detected by state-of-the-art
classifiers for real-time detection of DGAs, including the recently published
methods FANCI (a random forest based on human-engineered features) and LSTM.MI
(a deep learning approach). CharBot is very simple, effective and requires no
knowledge of the targeted DGA classifiers. We show that retraining the
classifiers on CharBot samples is not a viable defense strategy. We believe
these findings show that DGA classifiers are inherently vulnerable to
adversarial attacks if they rely only on the domain name string to make a
decision. Designing a robust DGA classifier may, therefore, necessitate the use
of additional information besides the domain name alone. To the best of our
knowledge, CharBot is the simplest and most efficient black-box adversarial
attack against DGA classifiers proposed to date.
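To illustrate the kind of simplicity the abstract claims, the sketch below is a hypothetical character-perturbation attack in the same spirit: lightly mutate a known-benign domain so the string stays close to legitimate while no longer matching any registered name. This is an assumption about the general approach, not CharBot's published procedure.

```python
# Hypothetical CharBot-style perturbation: replace two characters of a
# benign second-level label. Not the paper's exact algorithm.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def perturb_domain(domain: str, rng: random.Random) -> str:
    label, _, tld = domain.partition(".")
    chars = list(label)
    # Pick two distinct positions and force each to a different character.
    for p in rng.sample(range(len(chars)), 2):
        chars[p] = rng.choice([c for c in ALPHABET if c != chars[p]])
    return "".join(chars) + "." + tld

rng = random.Random(0)
print(perturb_domain("wikipedia.org", rng))
```

Because such strings retain most of the character statistics of benign domains, a classifier that sees only the domain string has very little signal to work with, which is the vulnerability the paper argues is inherent.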
Practical Attacks Against Graph-based Clustering
Graph modeling allows numerous security problems to be tackled in a general
way; however, little work has been done to understand the ability of
graph-based models to withstand adversarial attacks. We design and evaluate two novel graph attacks
against a state-of-the-art network-level, graph-based detection system. Our
work highlights areas in adversarial machine learning that have not yet been
addressed, specifically graph-based clustering techniques and a global
feature space in which defenders must account for realistic attackers
without perfect knowledge. Even though less-informed attackers can evade
graph clustering at low cost, we show that some practical defenses are
possible.
Comment: ACM CCS 201
OnionBots: Subverting Privacy Infrastructure for Cyber Attacks
Over the last decade botnets survived by adopting a sequence of increasingly
sophisticated strategies to evade detection and takeovers, and to monetize
their infrastructure. At the same time, the success of privacy infrastructures
such as Tor opened the door to illegal activities, including botnets,
ransomware, and a marketplace for drugs and contraband. We contend that the
next waves of botnets will extensively subvert privacy infrastructure and
cryptographic mechanisms. In this work we propose to preemptively investigate
the design and mitigation of such botnets. We first introduce OnionBots,
which we believe will be the next generation of resilient, stealthy botnets.
OnionBots use privacy infrastructures for cyber attacks by completely
decoupling their operation from the infected host IP address and by carrying
traffic that does not leak information about its source, destination, and
nature. Such bots live symbiotically within the privacy infrastructures to
evade detection, measurement, scale estimation, observation, and, in
general, all current IP-based mitigation techniques. Furthermore, we show that with an
adequate self-healing network maintenance scheme, one that is simple to implement,
OnionBots achieve a low diameter and a low degree and are robust to
partitioning under node deletions. We developed a mitigation technique, called
SOAP, that neutralizes the nodes of the basic OnionBots. We also outline and
discuss a set of techniques that can enable subsequent waves of Super
OnionBots. In light of the potential of such botnets, we believe that the
research community should proactively develop detection and mitigation methods
to thwart OnionBots, potentially making adjustments to privacy infrastructure.
Comment: 12 pages, 8 figure
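The self-healing property claimed above, staying connected at low degree when nodes are deleted, can be illustrated with a toy overlay-maintenance sketch. This is only a demonstration of the graph property, under the assumption that a dead peer's former neighbors reconnect among themselves; it is not the paper's actual protocol.

```python
# Toy self-healing overlay: when a node dies, its former neighbors
# link up in a chain so the graph stays connected at low degree.
# Illustrative assumption, not the OnionBots maintenance scheme.
from collections import deque

def heal(graph: dict, dead: int) -> None:
    orphans = sorted(graph.pop(dead))
    for node in orphans:
        graph[node].discard(dead)
    for a, b in zip(orphans, orphans[1:]):
        graph[a].add(b)
        graph[b].add(a)

def is_connected(graph: dict) -> bool:
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(graph)

# Ring of 8 nodes; delete node 0 and repair.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
heal(ring, 0)
assert is_connected(ring)
```

In this toy version every node keeps degree at most 2 even after healing, which mirrors the low-degree, partition-resistant behavior the abstract attributes to OnionBots.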