Practical Attacks Against Graph-based Clustering
Graph modeling allows numerous security problems to be tackled in a general
way; however, little work has been done to understand the ability of such
models to withstand adversarial attacks. We design and evaluate two novel
graph attacks
against a state-of-the-art network-level, graph-based detection system. Our
work highlights areas in adversarial machine learning that have not yet been
addressed, specifically: graph-based clustering techniques, and a global
feature space where realistic attackers without perfect knowledge must be
accounted for (by the defenders) in order to be practical. Even though less
informed attackers can evade graph clustering with low cost, we show that some
practical defenses are possible. Comment: ACM CCS 201
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to
create lists of domain names which can be used for command and control (C&C)
purposes. Approaches based on machine learning have recently been developed to
automatically detect generated domain names in real-time. In this work, we
present a novel DGA called CharBot which is capable of producing large numbers
of unregistered domain names that are not detected by state-of-the-art
classifiers for real-time detection of DGAs, including the recently published
methods FANCI (a random forest based on human-engineered features) and LSTM.MI
(a deep learning approach). CharBot is very simple, effective and requires no
knowledge of the targeted DGA classifiers. We show that retraining the
classifiers on CharBot samples is not a viable defense strategy. We believe
these findings show that DGA classifiers are inherently vulnerable to
adversarial attacks if they rely only on the domain name string to make a
decision. Designing a robust DGA classifier may, therefore, necessitate the use
of additional information besides the domain name alone. To the best of our
knowledge, CharBot is the simplest and most efficient black-box adversarial
attack against DGA classifiers proposed to date.
Artificial intelligence in the cyber domain: Offense and defense
Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense. Web of Science123art. no. 41
Weakly supervised deep learning for the detection of domain generation algorithms
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings
Domain generation algorithms (DGAs) are frequently employed by malware to
generate domains used for connecting to command-and-control (C2) servers.
Recent work in DGA detection leveraged deep learning architectures like
convolutional neural networks (CNNs) and character-level long short-term memory
networks (LSTMs) to classify domains. However, these classifiers perform poorly
with wordlist-based DGA families, which generate domains by pseudorandomly
concatenating dictionary words. We propose a novel approach that combines
context-sensitive word embeddings with a simple fully-connected classifier to
perform classification of domains based on word-level information. The word
embeddings were pre-trained on a large unrelated corpus and left frozen during
the training on domain data. The resulting small number of trainable parameters
enabled extremely short training durations, while the transfer of language
knowledge stored in the representations allowed for high-performing models with
small training datasets. We show that this architecture reliably outperformed
existing techniques on wordlist-based DGA families with just 30 DGA training
examples and achieved state-of-the-art performance with around 100 DGA training
examples, all while requiring an order of magnitude less time to train compared
to current techniques. Of special note is the technique's performance on the
matsnu DGA: the classifier attained an 89.5% detection rate with a 1:1,000 false
positive rate (FPR) after training on only 30 examples of the DGA domains, and
a 91.2% detection rate with a 1:10,000 FPR after 90 examples. Considering that
some of these DGAs have wordlists of several hundred words, our results
demonstrate that this technique does not rely on the classifier learning the
DGA wordlists. Instead, the classifier is able to learn the semantic signatures
of the wordlist-based DGA families. Comment: 6 pages, 5 figures, 2 table
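
The abstract above reports detection rates at fixed false positive rates such as 1:1,000. As a minimal sketch of how such a figure can be computed from classifier scores (not the authors' evaluation code), the Python below picks a score threshold from benign scores so that at most one in `one_in` benign domains is flagged, then measures the fraction of DGA scores above that threshold. The function names, the `one_in` parameter, and the toy scores are all hypothetical and chosen only for illustration.

```python
def threshold_at_fpr(benign_scores, one_in):
    """Return a threshold such that at most 1 in `one_in` benign scores
    lies strictly above it (hypothetical helper, for illustration)."""
    if not benign_scores:
        raise ValueError("benign_scores must not be empty")
    ordered = sorted(benign_scores, reverse=True)
    # Number of benign samples we are allowed to flag as false positives.
    allowed = len(ordered) // one_in
    if allowed >= len(ordered):
        return float("-inf")
    # Flagging only scores strictly above this value yields at most
    # `allowed` benign false positives, even when scores are tied.
    return ordered[allowed]


def detection_rate(dga_scores, threshold):
    """Fraction of DGA scores strictly above the threshold."""
    if not dga_scores:
        raise ValueError("dga_scores must not be empty")
    flagged = sum(1 for score in dga_scores if score > threshold)
    return flagged / len(dga_scores)


# Toy data: 1,000 benign scores and 4 DGA scores (made up for this sketch).
benign = [i / 1000 for i in range(1000)]
dga = [0.9985, 0.9995, 0.5, 0.9999]

threshold = threshold_at_fpr(benign, one_in=1000)
rate = detection_rate(dga, threshold)
```

Fixing the threshold on benign data first, and only then scoring the DGA samples, keeps the two quantities independent: the false positive budget is set by the benign traffic alone, and the detection rate is whatever the classifier achieves under that budget.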