A Survey on Malicious Domains Detection through DNS Data Analysis
Malicious domains are one of the major resources required for adversaries to
run attacks over the Internet. Due to the important role of the Domain Name
System (DNS), extensive research has been conducted to identify malicious
domains based on their unique behavior reflected in different phases of the
life cycle of DNS queries and responses. Existing approaches differ
significantly in their intuitions, data analysis methods, and
evaluation methodologies. This warrants a thorough systematization of the
approaches and a careful review of the advantages and limitations of every
group.
In this paper, we perform such an analysis. In order to achieve this goal, we
present the necessary background knowledge on DNS and malicious activities
leveraging DNS. We describe a general framework of malicious domain detection
techniques using DNS data. Applying this framework, we categorize existing
approaches using several orthogonal viewpoints, namely (1) sources of DNS data
and their enrichment, (2) data analysis methods, and (3) evaluation strategies
and metrics. In each aspect, we discuss the important challenges that the
research community should address in order to fully realize the power of DNS
data analysis to fight against attacks leveraging malicious domains.
Comment: 35 pages, to appear in ACM CSU
Detection of Malicious and Low Throughput Data Exfiltration Over the DNS Protocol
In the presence of security countermeasures, malware designed for data
exfiltration must use a covert channel to achieve its goal. Among
existing covert channels stands the domain name system (DNS) protocol. Although
the detection of covert channels over the DNS has been thoroughly studied in
the last decade, previous research dealt with a specific subclass of covert
channels, namely DNS tunneling. While tunneling detection remains important,
an entire class of low-throughput DNS exfiltration malware has been
overlooked. The goal of this study is to propose a method for detecting both
tunneling and low-throughput data exfiltration over the DNS.
Towards this end, we propose a solution composed of a supervised feature
selection method and an interchangeable, adjustable anomaly detection model
trained on legitimate traffic. In the first step, a one-class classifier is
applied to detect domain-specific traffic that does not conform to normal
behavior. In the second step, to reduce the false positive rate introduced by
targeting low-throughput exfiltration, we apply a rule-based filter that
removes DNS data exchange by legitimate services. Our solution was evaluated
on logs from a medium-scale recursive DNS server, covering more than 75,000
legitimate uses and almost 2,000 attacks. The results show that while DNS
tunneling is detected with at least a 99% recall rate and less than a 0.01%
false positive rate, detecting low-throughput exfiltration is more difficult.
While not preventing it completely, our solution limits malware attempting to
avoid detection to at most 1 kb/h of payload under the limitations of DNS
syntax (equivalent to five credit card details or ten user credentials per
hour), which reduces the effectiveness of the attack.
Comment: 5 figs. 7 table
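To make the two-step design above concrete, here is a minimal sketch, assuming per-domain aggregate features (label length, label entropy, subdomain diversity) and a hypothetical allowlist; it illustrates the general idea only and is not the authors' implementation.

import math
from collections import Counter

import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical allowlist of services known to legitimately exchange data over DNS.
ALLOWLIST = {"sophosxl.net", "dnswl.org"}

def label_entropy(label: str) -> float:
    """Shannon entropy of a DNS label, in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def domain_features(subdomains: list) -> list:
    """Aggregate features for one registered domain: mean label length,
    mean label entropy, and ratio of unique subdomains."""
    lengths = [len(s) for s in subdomains]
    entropies = [label_entropy(s) for s in subdomains]
    unique_ratio = len(set(subdomains)) / len(subdomains)
    return [np.mean(lengths), np.mean(entropies), unique_ratio]

def fit_detector(benign_subdomain_sets):
    """Step 1: fit the one-class model on legitimate traffic only."""
    return IsolationForest(random_state=0).fit(
        [domain_features(s) for s in benign_subdomain_sets])

def is_exfiltration(model, registered_domain: str, subdomains: list) -> bool:
    anomalous = model.predict([domain_features(subdomains)])[0] == -1
    # Step 2: the rule-based filter suppresses alerts for allowlisted services.
    return anomalous and registered_domain not in ALLOWLIST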
Characterizing Certain DNS DDoS Attacks
This paper details data science research in the area of Cyber Threat
Intelligence applied to a specific type of Distributed Denial of Service (DDoS)
attack. We study a DDoS technique prevalent in the Domain Name System (DNS)
for which little malware has been recovered. Using data from a globally
distributed set of passive DNS collectors (pDNS), we create a statistical
classifier to identify these attacks and then use unsupervised learning to
investigate the attack events and the malware that generates them. In the
first known major study of this technique, we discovered that current attacks
bear little resemblance to published descriptions and identified several
previously unpublished features of the attacks. Through a combination of text
and time-series features, we are able to characterize the dominant malware and
demonstrate that the number of global-scale attack systems is relatively small.
Comment: 25 pages, 21 figure
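As a loose illustration of the unsupervised step, the sketch below clusters attack events using simple text features of the queried names together with time-series features of query arrivals; the feature set and cluster count are assumptions, not those of the study.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def event_features(qnames, timestamps):
    """One vector per attack event: text statistics over queried names
    plus inter-arrival-time statistics."""
    lengths = np.array([len(q) for q in qnames], dtype=float)
    times = np.sort(np.asarray(timestamps, dtype=float))
    iat = np.diff(times) if len(times) > 1 else np.zeros(1)
    return [lengths.mean(), lengths.std(), iat.mean(), iat.std()]

def cluster_events(events, n_clusters=5):
    """events: list of (qnames, timestamps) pairs. Returns one cluster id
    per event; events in the same cluster plausibly share a generator."""
    X = StandardScaler().fit_transform(
        [event_features(q, t) for q, t in events])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)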
CONDENSER: A Graph-Based Approach for Detecting Botnets
Botnets represent a global problem and are responsible for causing large
financial and operational damage to their victims. They are implemented with
evasion in mind, and aim at hiding their architecture and authors, making them
difficult to detect in general. These kinds of networks are mainly used for
identity theft, virtual extortion, spam campaigns and malware dissemination.
Botnets have great potential in warfare and terrorist activities, making it
of utmost importance to take action against them. We present CONDENSER, a
method for identifying data generated by botnet activity. We start by
selecting appropriate features from several data feeds, namely DNS
non-existent domain responses and live communication packets directed to
command and control servers that we previously sinkholed. Machine learning
algorithms and a graph-based representation of the data then allow one to
identify botnet activity, flag anomalous traffic, quickly detect new botnets,
and improve the tracking of known botnets. Our main
contributions are threefold: first, the use of a machine learning classifier
to identify domain names generated by domain generation algorithms (DGAs);
second, a clustering algorithm over the set of selected features that groups
network communications with similar patterns; third, a graph-based knowledge
representation framework in which we store processed data, allowing us to
perform queries.
Comment: BotConf 201
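A rough sketch of the third contribution, the graph representation: hosts and queried domains become nodes, observed lookups become edges, and components dominated by DGA-flagged domains are surfaced. Field names and thresholds are illustrative assumptions, not CONDENSER's internals.

import networkx as nx

def build_graph(events):
    """events: iterable of (src_host, domain, dga_score) tuples, where
    dga_score comes from a separate DGA classifier (contribution one)."""
    g = nx.Graph()
    for host, domain, dga_score in events:
        g.add_node(host, kind="host")
        g.add_node(domain, kind="domain", dga=dga_score)
        g.add_edge(host, domain)
    return g

def suspicious_components(g, dga_threshold=0.8):
    """Yield connected components whose domains are mostly DGA-like."""
    for component in nx.connected_components(g):
        domains = [n for n in component if g.nodes[n]["kind"] == "domain"]
        if not domains:
            continue
        flagged = sum(g.nodes[d]["dga"] > dga_threshold for d in domains)
        if flagged / len(domains) > 0.5:
            yield component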
A Study of Newly Observed Hostnames and DNS Tunneling in the Wild
The domain name system (DNS) is a crucial backbone of the Internet and
millions of new domains are created on a daily basis. While the vast majority
of these domains are legitimate, adversaries also register new hostnames for
nefarious purposes, such as scams, phishing, or other types of attacks.
attacks. In this paper, we present insights on the global utilization of DNS
through a measurement study examining exclusively newly observed hostnames via
passive DNS data analysis. We analyzed more than two billion such hostnames
collected over a period of two months. Surprisingly, we find that only three
second-level domains are responsible for more than half of all newly observed
hostnames every day. More specifically, we found that Google's Accelerated
Mobile Pages (AMP) project, the music streaming service Spotify, and a DNS
tunnel provider generate the majority of new domains on the Internet. DNS
tunneling is a covert channel technique to transfer arbitrary information over
DNS via DNS queries and answers. This technique is often (ab)used by attackers
to transfer data in a stealthy way, bypassing traditional network security
systems. We find that potential DNS tunnels cause a significant fraction of the
global DNS requests for new hostnames: our analysis reveals that nearly all
resource record type NULL requests and more than a third of all TXT requests
can be attributed to DNS tunnels.
Motivated by these empirical measurement results, we propose and implement a
method to identify DNS tunnels via a step-wise filtering approach that relies
on general characteristics of such tunnels (e.g., number of subdomains or
resource record type). Using our approach on empirical data, we successfully
identified 273 suspicious domains related to DNS tunnels, including two known
APT campaigns (Wekby and APT32).
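A minimal sketch of such a step-wise filter, using the two characteristics named above (number of subdomains and resource record type); the thresholds are illustrative assumptions, not the paper's calibrated values.

from collections import defaultdict

TUNNEL_RRTYPES = {"NULL", "TXT"}    # record types the measurement attributes to tunnels
MIN_DISTINCT_SUBDOMAINS = 1000      # hypothetical per-domain cut-off

def suspect_tunnel_domains(queries):
    """queries: iterable of (second_level_domain, subdomain, rrtype)
    tuples. Returns second-level domains passing both filter steps."""
    subdomains = defaultdict(set)
    tunnel_typed = defaultdict(int)
    totals = defaultdict(int)
    for sld, sub, rrtype in queries:
        subdomains[sld].add(sub)
        totals[sld] += 1
        if rrtype in TUNNEL_RRTYPES:
            tunnel_typed[sld] += 1
    return [
        sld for sld in totals
        # Step 1: an unusually large number of distinct subdomains.
        if len(subdomains[sld]) >= MIN_DISTINCT_SUBDOMAINS
        # Step 2: traffic dominated by tunnel-typical record types.
        and tunnel_typed[sld] / totals[sld] > 0.5
    ]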
Fast Flux Service Network Detection via Data Mining on Passive DNS Traffic
In the last decade, the use of the fast flux technique has become established
as a common practice for organising botnets into Fast Flux Service Networks
(FFSNs), platforms able to sustain illegal online services with very high
availability. In this paper, we report on an effective fast flux detection
algorithm based on the passive analysis of the Domain Name System (DNS) traffic
of a corporate network. The proposed method is based on the near-real-time
identification of different metrics that measure a wide range of fast flux key
features; the metrics are combined via a simple but effective mathematical and
data mining approach. The proposed solution has been evaluated in a one-month
experiment over an enterprise network, with the injection of pcaps associated
with different malware campaigns that leverage FFSNs and cover a wide variety
of attack scenarios. An in-depth analysis of a list of fast flux domains
confirmed the reliability of the metrics used in the proposed algorithm and
allowed for the identification of many IPs that turned out to be part of two
notorious FFSNs, namely Dark Cloud and SandiFlux, to the description of which
we therefore contribute. All the fast flux domains were detected with a very
low false positive rate; a comparison of performance indicators with previous
works shows a remarkable improvement.
Comment: This is a pre-print of an article published in the proceedings of the
21st International Conference, ISC 2018, Guildford, UK, September 9-12, 2018.
The final authenticated version is available online at:
https://doi.org/10.1007/978-3-319-99136-8_2
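As an illustration of combining simple per-domain metrics into a fast flux indicator, consider the sketch below; the three metrics (resolved-IP diversity, ASN diversity, short TTLs) are standard fast flux features, but the normalisation, weights, and threshold are assumptions rather than the paper's calibrated values.

def fast_flux_score(resolutions):
    """resolutions: list of (ip, asn, ttl) tuples observed for one domain
    over the monitoring window. Returns a score in [0, 1]."""
    ips = {ip for ip, _, _ in resolutions}
    asns = {asn for _, asn, _ in resolutions}
    mean_ttl = sum(ttl for _, _, ttl in resolutions) / len(resolutions)
    ip_diversity = min(len(ips) / 10.0, 1.0)      # many distinct resolved IPs
    asn_diversity = min(len(asns) / 5.0, 1.0)     # IPs spread across many ASNs
    ttl_penalty = 1.0 if mean_ttl < 300 else 0.0  # short TTLs enable rapid flux
    return (ip_diversity + asn_diversity + ttl_penalty) / 3.0

def is_fast_flux(resolutions, threshold=0.6):
    """Flag a domain whose combined score exceeds the (assumed) threshold."""
    return fast_flux_score(resolutions) >= threshold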
Automatic Investigation Framework for Android Malware Cyber-Infrastructures
The popularity of the Android system, not only in handset devices but also in
IoT devices, makes it a very attractive target for malware. Indeed, malware
targeting such devices, which in most cases rely on the Internet to work
properly, is expanding at a similar rate. State-of-the-art malware mitigation
solutions mainly focus on detecting the actual malicious Android apps, using
dynamic and static analysis features to distinguish malicious apps from benign
ones. However, there is little coverage of the Internet/network dimension of
malicious Android apps. In this paper, we
present ToGather, an automatic investigation framework that takes Android
malware samples as input and produces situational awareness of the malicious
cyber infrastructure of the samples' families. ToGather leverages
state-of-the-art graph theory techniques to generate actionable and granular
intelligence to mitigate the threat posed by the malicious Internet activity
of Android malware apps. We evaluate ToGather on real malware samples from
various Android families, and the obtained results are interesting and very
promising.
Comment: 12 Page
Entropy-based Prediction of Network Protocols in the Forensic Analysis of DNS Tunnels
DNS tunneling techniques are often used for malicious purposes, but network
security mechanisms have struggled to detect them. Network forensic analysis
has thus been used, but it has proved slow and effort-intensive, as Network
Forensic Analysis Tools struggle to deal with undocumented or new network
tunneling techniques. In this paper we present a method to aid forensic
analysis by automating the inference of the protocols carried within DNS
tunnels. We analyze the internal packet structure of DNS tunneling
tunneling techniques. We analyze the internal packet structure of DNS tunneling
techniques and characterize the information entropy of different network
protocols and their DNS tunneled equivalents. From this, we present our
protocol prediction method, which uses entropy distribution averaging. Finally,
we apply our method to a dataset to measure its performance and show that it
achieves a prediction accuracy of 75%. Our method also preserves privacy, as it
does not parse the actual tunneled content; rather, it only calculates its
information entropy.
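A minimal sketch of entropy distribution averaging as described above: compute the information entropy of each tunneled payload chunk, average per flow, and match the result against per-protocol entropy profiles learned from labelled flows. The profile values shown are hypothetical.

import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Information entropy of one payload chunk, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mean_entropy(payloads) -> float:
    """Average entropy over a flow's DNS-tunneled payload chunks; the
    content itself is never parsed, which preserves privacy."""
    return sum(shannon_entropy(p) for p in payloads) / len(payloads)

def predict_protocol(payloads, profiles) -> str:
    """profiles: protocol name -> mean entropy learned from labelled flows,
    e.g. {"ftp": 4.2, "http": 5.1, "ssh": 7.6} (hypothetical values)."""
    e = mean_entropy(payloads)
    return min(profiles, key=lambda proto: abs(profiles[proto] - e))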
Detecting DGA domains with recurrent neural networks and side information
Modern malware typically makes use of a domain generation algorithm (DGA) to
avoid command and control domains or IPs being seized or sinkholed. This means
that an infected system may attempt to access many domains in an attempt to
contact the command and control server. Therefore, the automatic detection of
DGA domains is an important task, both for the sake of blocking malicious
domains and identifying compromised hosts. However, many DGAs use English
wordlists to generate plausibly clean-looking domain names; this makes
automatic detection difficult. In this work, we devise a notion of difficulty
for DGA families called the smashword score; this measures how much a DGA
family looks like English words. We find that this measure accurately reflects
how much a DGA family's domains look like they are made from natural English
words. We then describe our new modeling approach, which is a combination of a
novel recurrent neural network architecture with domain registration side
information. Our experiments show the model is capable of effectively
identifying domains generated by difficult DGA families. Our experiments also
show that our model outperforms existing approaches, and is able to reliably
detect difficult DGA families such as matsnu, suppobox, rovnix, and others. The
model's performance compared to the state of the art is best for DGA families
that resemble English words. We believe that this model could either be used in
a standalone DGA domain detector---such as an endpoint security
application---or alternatively as part of a larger malware detection system.
Comment: Accepted to ARES 201
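In the spirit of the approach described above, here is a minimal sketch of a character-level recurrent classifier that fuses a domain's character sequence with a registration side-information vector; the architecture and dimensions are assumptions, not the authors' exact model.

import torch
import torch.nn as nn

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-."
IDX = {c: i + 1 for i, c in enumerate(CHARS)}  # index 0 is padding

def encode(domain: str, max_len: int = 63) -> torch.Tensor:
    """Map a domain name to a fixed-length tensor of character indices."""
    ids = [IDX.get(c, 0) for c in domain.lower()[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

class DGAClassifier(nn.Module):
    def __init__(self, side_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(CHARS) + 1, 32, padding_idx=0)
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        # side_dim: registration side information, e.g. domain age or
        # registrar features (hypothetical choice of inputs).
        self.head = nn.Linear(hidden + side_dim, 1)

    def forward(self, domains: torch.Tensor, side: torch.Tensor) -> torch.Tensor:
        x = self.embed(domains)               # (batch, max_len, 32)
        _, (h, _) = self.lstm(x)              # h: (1, batch, hidden)
        z = torch.cat([h[-1], side], dim=1)   # fuse text and side information
        return torch.sigmoid(self.head(z)).squeeze(1)  # P(domain is DGA-generated)

# Example forward pass with a zero side-information vector.
model = DGAClassifier()
p = model(encode("xjkw3qpt.com").unsqueeze(0), torch.zeros(1, 4))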
Detection under Privileged Information
For well over a quarter century, detection systems have been driven by models
learned from input features collected from real or simulated environments. An
artifact (e.g., network event, potential malware sample, suspicious email) is
deemed malicious or non-malicious based on its similarity to the learned model
at runtime. However, the training of the models has been historically limited
to only those features available at runtime. In this paper, we consider an
alternate learning approach that trains models using "privileged"
information--features available at training time but not at runtime--to improve
the accuracy and resilience of detection systems. In particular, we adapt and
extend recent advances in knowledge transfer, model influence, and distillation
to enable the use of forensic or other data unavailable at runtime in a range
of security domains. An empirical evaluation shows that privileged information
increases precision and recall over a system with no privileged information: we
observe up to 7.7% relative decrease in detection error for fast-flux bot
detection, 8.6% for malware traffic detection, 7.3% for malware classification,
and 16.9% for face recognition. We explore the limitations and applications of
different privileged information techniques in detection systems. Such
techniques provide a new means for detection systems to learn from data that
would otherwise not be available at runtime.
Comment: A short version of this paper is accepted to ASIACCS 201
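A minimal sketch of the distillation flavour of learning under privileged information: a teacher is trained with both runtime and privileged (e.g., forensic) features, and a student restricted to runtime features is fit to a blend of the hard labels and the teacher's soft outputs. The models and blending weight are assumptions, not the paper's exact construction.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

def train_with_privileged(X_runtime, X_privileged, y, alpha=0.5):
    """X_runtime: features available at detection time; X_privileged:
    features available only during training; y: 0/1 labels."""
    X_full = np.hstack([X_runtime, X_privileged])
    teacher = LogisticRegression(max_iter=1000).fit(X_full, y)
    soft = teacher.predict_proba(X_full)[:, 1]  # teacher's soft labels
    # The student regresses onto a blend of hard and soft labels, but only
    # ever sees the runtime features, so it can be deployed as-is.
    target = alpha * np.asarray(y, dtype=float) + (1 - alpha) * soft
    student = GradientBoostingRegressor(random_state=0).fit(X_runtime, target)
    return student  # student.predict(X_runtime) yields a maliciousness score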