1,781 research outputs found

    A Survey on Malicious Domains Detection through DNS Data Analysis

    Malicious domains are one of the major resources required for adversaries to run attacks over the Internet. Due to the important role of the Domain Name System (DNS), extensive research has been conducted to identify malicious domains based on their unique behavior, reflected in the different phases of the life cycle of DNS queries and responses. Existing approaches differ significantly in their intuitions, data analysis methods, and evaluation methodologies. This warrants a thorough systematization of the approaches and a careful review of the advantages and limitations of each group. In this paper, we perform such an analysis. To achieve this goal, we present the necessary background knowledge on DNS and on malicious activities leveraging DNS. We describe a general framework of malicious domain detection techniques that use DNS data. Applying this framework, we categorize existing approaches along several orthogonal viewpoints, namely (1) sources of DNS data and their enrichment, (2) data analysis methods, and (3) evaluation strategies and metrics. For each aspect, we discuss the important challenges that the research community should address in order to fully realize the power of DNS data analysis in fighting attacks that leverage malicious domains.
    Comment: 35 pages, to appear in ACM CSUR

    Detection of Malicious and Low Throughput Data Exfiltration Over the DNS Protocol

    In the presence of security countermeasures, malware designed for data exfiltration must use a covert channel to achieve its goal. Among the existing covert channels is the domain name system (DNS) protocol. Although the detection of covert channels over DNS has been studied thoroughly in the last decade, previous research dealt with a specific subclass of covert channels, namely DNS tunneling. While the importance of tunneling detection is not in question, an entire class of low-throughput DNS exfiltration malware has remained overlooked. The goal of this study is to propose a method for detecting both tunneling and low-throughput data exfiltration over DNS. Toward this end, we propose a solution composed of a supervised feature selection method and an interchangeable, adjustable anomaly detection model trained on legitimate traffic. In the first step, a one-class classifier is applied to detect domain-specific traffic that does not conform to normal behavior. In the second step, to reduce the false positive rate arising from the attempt to detect low-throughput data exfiltration, we apply a rule-based filter that removes data exchange over DNS used by legitimate services. Our solution was evaluated on the logs of a medium-scale recursive DNS server, covering more than 75,000 legitimate users and almost 2,000 attacks. The evaluation shows that while DNS tunneling is detected with at least a 99% recall rate and less than a 0.01% false positive rate, detecting low-throughput exfiltration is more difficult. While not preventing it completely, our solution limits malware attempting to avoid detection to at most 1 kB/h of payload under the constraints of the DNS syntax (equivalent to the details of five credit cards, or ten user credentials, per hour), which reduces the effectiveness of the attack.
    Comment: 5 figures, 7 tables
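
    The abstract above describes a two-step design: a one-class anomaly model trained on legitimate traffic, followed by a rule-based filter for legitimate DNS-based services. A minimal sketch of that flow, assuming hypothetical per-domain features and an illustrative allow-list; the paper's exact features, model, and thresholds are not reproduced here:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hypothetical per-domain features aggregated from DNS logs:
    # [mean query-name entropy, unique-subdomain rate, mean query length]
    rng = np.random.RandomState(0)
    legit = rng.normal([2.5, 0.1, 20.0], [0.5, 0.05, 5.0], size=(1000, 3))

    # Step 1: one-class anomaly model fit on legitimate traffic only.
    model = IsolationForest(contamination=0.01, random_state=0).fit(legit)

    # Step 2: rule-based filter for legitimate services that exchange data
    # over DNS (domains here are placeholders, not from the paper).
    ALLOWLIST = {"example-antivirus.com", "example-telemetry.net"}

    def is_suspicious(domain: str, features: np.ndarray) -> bool:
        if domain in ALLOWLIST:
            return False
        return model.predict(features.reshape(1, -1))[0] == -1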

    Characterizing Certain DNS DDoS Attacks

    This paper details data science research in the area of Cyber Threat Intelligence applied to a specific type of Distributed Denial of Service (DDoS) attack. We study a DDoS technique prevalent in the Domain Name System (DNS) for which little malware has been recovered. Using data from a globally distributed set of passive DNS collectors (pDNS), we create a statistical classifier to identify these attacks and then use unsupervised learning to investigate the attack events and the malware that generates them. In the first known major study of this technique, we discover that current attacks bear little resemblance to published descriptions, and we identify several previously unpublished features of the attacks. Through a combination of text and time-series features, we are able to characterize the dominant malware and demonstrate that the number of global-scale attack systems is relatively small.
    Comment: 25 pages, 21 figures
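
    The classifier above combines text and time-series features. A small sketch of what such features might look like, using only the standard library; the concrete feature set is an assumption, not the paper's:

    import math
    from collections import Counter
    from statistics import mean, pstdev

    def name_entropy(qname: str) -> float:
        """Shannon entropy of the characters in a query name (text feature)."""
        counts = Counter(qname)
        n = len(qname)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def interarrival_stats(timestamps: list[float]) -> tuple[float, float]:
        """Mean and std of inter-arrival times for one source's queries
        (time-series feature)."""
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        return (mean(gaps), pstdev(gaps)) if gaps else (0.0, 0.0)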

    CONDENSER: A Graph-Based Approach for Detecting Botnets

    Botnets represent a global problem and are responsible for causing large financial and operational damage to their victims. They are implemented with evasion in mind and aim to hide their architecture and authors, which makes them difficult to detect in general. These networks are mainly used for identity theft, virtual extortion, spam campaigns, and malware dissemination. Botnets also have great potential in warfare and terrorist activities, making it of the utmost importance to take action against them. We present CONDENSER, a method for identifying data generated by botnet activity. We start by selecting appropriate features from several data feeds, namely DNS non-existent domain (NXDOMAIN) responses and live communication packets directed to command and control servers that we previously sinkholed. Using machine learning algorithms and a graph-based representation of the data, CONDENSER identifies botnet activity, helps flag anomalous traffic, quickly detects new botnets, and improves the tracking of known botnets. Our main contributions are threefold: first, a machine learning classifier for classifying domain names as being generated by domain generation algorithms (DGAs); second, a clustering algorithm that uses the selected features to group network communications with similar patterns; third, a graph-based knowledge representation framework in which we store processed data, allowing us to perform queries.
    Comment: BotConf 201
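
    The first contribution above is a classifier that labels domain names as DGA-generated or not. A minimal sketch of such a classifier, assuming character n-gram features and a linear model (the abstract does not specify the actual features or algorithm); the training data below is purely illustrative:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    benign = ["google.com", "wikipedia.org", "spotify.com"]
    dga = ["xjkqpzvb.net", "qwmzrtyu.info", "zzkqhvbn.biz"]  # toy examples

    # Character n-grams capture the unusual letter transitions that many
    # DGA families produce.
    clf = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    ).fit(benign + dga, [0] * len(benign) + [1] * len(dga))

    clf.predict(["kqzxvbnm.com"])  # expected to lean DGA given real training data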

    A Study of Newly Observed Hostnames and DNS Tunneling in the Wild

    The domain name system (DNS) is a crucial backbone of the Internet, and millions of new domains are created on a daily basis. While the vast majority of these domains are legitimate, adversaries also register new hostnames for nefarious purposes, such as scams, phishing, or other types of attacks. In this paper, we present insights into the global utilization of DNS through a measurement study that examines exclusively newly observed hostnames via passive DNS data analysis. We analyzed more than two billion such hostnames collected over a period of two months. Surprisingly, we find that only three second-level domains are responsible for more than half of all newly observed hostnames every day. More specifically, Google's Accelerated Mobile Pages (AMP) project, the music streaming service Spotify, and a DNS tunnel provider generate the majority of new domains on the Internet. DNS tunneling is a covert channel technique that transfers arbitrary information over DNS via DNS queries and answers. This technique is often (ab)used by attackers to transfer data in a stealthy way, bypassing traditional network security systems. We find that potential DNS tunnels cause a significant fraction of the global DNS requests for new hostnames: our analysis reveals that nearly all resource record type NULL requests, and more than a third of all TXT requests, can be attributed to DNS tunnels. Motivated by these empirical measurement results, we propose and implement a method to identify DNS tunnels via a step-wise filtering approach that relies on general characteristics of such tunnels (e.g., the number of subdomains or the resource record type). Using our approach on empirical data, we successfully identified 273 suspicious domains related to DNS tunnels, including two known APT campaigns (Wekby and APT32).
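
    The step-wise filter mentioned above relies on general tunnel traits such as the resource record type and the number of subdomains. A sketch of that idea with illustrative thresholds; the paper's actual steps and cut-offs are not reproduced:

    def looks_like_tunnel(records: list[dict]) -> bool:
        """records: passive-DNS entries for one registered domain,
        each with 'qname' and 'qtype' keys."""
        qtypes = {r["qtype"] for r in records}
        labels = [r["qname"].split(".")[0] for r in records]
        avg_label_len = sum(len(l) for l in labels) / len(labels)

        # Step 1: tunnels favor record types that carry free-form payloads.
        if not qtypes & {"NULL", "TXT", "CNAME"}:
            return False
        # Step 2: many unique, long subdomain labels suggest encoded payloads.
        return len(set(labels)) > 1000 and avg_label_len > 30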

    Fast Flux Service Network Detection via Data Mining on Passive DNS Traffic

    In the last decade, the use of the fast flux technique has become established as a common practice for organising botnets into Fast Flux Service Networks (FFSNs), platforms able to sustain illegal online services with very high availability. In this paper, we report on an effective fast flux detection algorithm based on passive analysis of the Domain Name System (DNS) traffic of a corporate network. The proposed method relies on the near-real-time identification of several metrics that measure a wide range of key fast flux features; the metrics are combined via a simple but effective mathematical and data mining approach. The proposed solution was evaluated in a one-month experiment over an enterprise network, with the injection of pcaps associated with different malware campaigns that leverage FFSNs and cover a wide variety of attack scenarios. An in-depth analysis of a list of fast flux domains confirmed the reliability of the metrics used in the proposed algorithm and allowed for the identification of many IPs that turned out to be part of two notorious FFSNs, namely Dark Cloud and SandiFlux, to whose description we therefore contribute. All fast flux domains were detected with a very low false positive rate, and a comparison of performance indicators with previous works shows a remarkable improvement.
    Comment: This is a pre-print of an article published in the proceedings of the 21st International Conference, ISC 2018, Guildford, UK, September 9-12, 2018. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-99136-8_2
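
    The abstract combines several fast flux metrics into one decision. A toy illustration of such a combination follows; the metrics, weights, and thresholds below are assumptions standing in for the paper's, which a real deployment would calibrate on labeled data:

    def flux_score(resolutions: list[dict]) -> float:
        """resolutions: DNS answers for one domain, each with 'ip', 'asn',
        and 'ttl' keys; higher scores are more flux-like."""
        ips = {r["ip"] for r in resolutions}
        asns = {r["asn"] for r in resolutions}
        avg_ttl = sum(r["ttl"] for r in resolutions) / len(resolutions)
        # Many IPs spread over many ASNs with short TTLs is the classic
        # FFSN profile.
        return (0.5 * min(len(ips) / 30.0, 1.0)
                + 0.3 * min(len(asns) / 10.0, 1.0)
                + 0.2 * (1.0 if avg_ttl < 300 else 0.0))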

    Automatic Investigation Framework for Android Malware Cyber-Infrastructures

    The popularity of the Android system, not only in handset devices but also in IoT devices, makes it a very attractive target for malware. Indeed, malware targeting such devices, which in most cases rely on the Internet to work properly, is expanding at a similar rate. State-of-the-art malware mitigation solutions mainly focus on detecting actual malicious Android apps, using dynamic and static analysis features to distinguish malicious apps from benign ones. However, the Internet/network dimension of malicious Android apps receives little coverage. In this paper, we present ToGather, an automatic investigation framework that takes Android malware samples as input and produces situational awareness about the malicious cyber-infrastructure of these samples' families. ToGather leverages state-of-the-art graph theory techniques to generate actionable and granular intelligence to mitigate the threat posed by the malicious Internet activity of Android malware apps. We evaluate ToGather on real malware samples from various Android families, and the obtained results are interesting and very promising.
    Comment: 12 pages
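
    ToGather's graph-theoretic view can be pictured as linking malware samples to the network indicators they contact and reading off connected components as candidate shared infrastructures. A minimal sketch with networkx and hypothetical node names; the actual framework's graph construction is not reproduced here:

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("sample_a", "evil-c2.example"),     # sample -> contacted domain
        ("sample_b", "evil-c2.example"),
        ("evil-c2.example", "203.0.113.7"),  # domain -> resolved IP
        ("sample_c", "203.0.113.7"),
    ])

    # Each connected component groups samples that share infrastructure.
    for component in nx.connected_components(G):
        print("shared infrastructure:", sorted(component))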

    Entropy-based Prediction of Network Protocols in the Forensic Analysis of DNS Tunnels

    DNS tunneling techniques are often used for malicious purposes, but network security mechanisms have struggled to detect them. Network forensic analysis has been used instead, but it has proved slow and effort-intensive, as Network Forensic Analysis Tools struggle to deal with undocumented or new network tunneling techniques. In this paper, we present a method to aid forensic analysis by automating the inference of the protocols tunneled within DNS tunneling techniques. We analyze the internal packet structure of DNS tunneling techniques and characterize the information entropy of different network protocols and their DNS-tunneled equivalents. From this, we present our protocol prediction method, which uses entropy distribution averaging. Finally, we apply our method to a dataset to measure its performance and show that it achieves a prediction accuracy of 75%. Our method also preserves privacy, as it does not parse the actual tunneled content but only calculates its information entropy.
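
    The core of the method above is characterizing protocols by their entropy and predicting by entropy distribution averaging. A compact sketch of that idea; the profile construction and nearest-average rule here are simplifications of the paper's method:

    import math
    from collections import Counter

    def shannon_entropy(payload: bytes) -> float:
        """Shannon entropy, in bits per byte, of one packet payload."""
        counts = Counter(payload)
        n = len(payload)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def avg_entropy(payloads: list[bytes]) -> float:
        return sum(shannon_entropy(p) for p in payloads) / len(payloads)

    def predict_protocol(unknown: list[bytes], profiles: dict[str, float]) -> str:
        """profiles: protocol name -> average entropy learned from known
        tunneled samples; predict by the closest profile."""
        e = avg_entropy(unknown)
        return min(profiles, key=lambda proto: abs(profiles[proto] - e))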

    Detecting DGA domains with recurrent neural networks and side information

    Modern malware typically makes use of a domain generation algorithm (DGA) to avoid its command and control domains or IPs being seized or sinkholed. This means that an infected system may attempt to access many domains in an attempt to contact the command and control server. The automatic detection of DGA domains is therefore an important task, both for blocking malicious domains and for identifying compromised hosts. However, many DGAs use English wordlists to generate plausibly clean-looking domain names, which makes automatic detection difficult. In this work, we devise a notion of difficulty for DGA families, called the smashword score, which measures how closely a DGA family's domains resemble natural English words. We then describe our new modeling approach, a combination of a novel recurrent neural network architecture with domain registration side information. Our experiments show that the model effectively identifies domains generated by difficult DGA families, that it outperforms existing approaches, and that it reliably detects difficult DGA families such as matsnu, suppobox, rovnix, and others. The model's advantage over the state of the art is greatest for DGA families that resemble English words. We believe this model could be used either in a standalone DGA domain detector, such as an endpoint security application, or as part of a larger malware detection system.
    Comment: Accepted to ARES 2019
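
    The smashword score measures resemblance to English words; its exact definition is not given above, but a stand-in wordlist-coverage measure conveys the intuition. In this sketch, the word list and scoring rule are illustrative assumptions, not the paper's definition:

    # Fraction of a domain's main label covered by dictionary words of
    # length >= 3; higher values mean more word-like, harder-to-detect names.
    ENGLISH = {"secure", "login", "credit", "card"}  # tiny demo word list

    def wordlist_coverage(domain: str) -> float:
        label = domain.split(".")[0]
        covered = [False] * len(label)
        for i in range(len(label)):
            for j in range(i + 3, len(label) + 1):
                if label[i:j] in ENGLISH:
                    for k in range(i, j):
                        covered[k] = True
        return sum(covered) / len(covered)

    print(wordlist_coverage("securelogin.com"))  # 1.0: fully word-based
    print(wordlist_coverage("xqzkvbnp.com"))     # 0.0: random-looking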

    Detection under Privileged Information

    For well over a quarter century, detection systems have been driven by models learned from input features collected from real or simulated environments. An artifact (e.g., a network event, a potential malware sample, a suspicious email) is deemed malicious or non-malicious based on its similarity to the learned model at runtime. However, the training of these models has historically been limited to the features available at runtime. In this paper, we consider an alternate learning approach that trains models using "privileged" information, i.e., features available at training time but not at runtime, to improve the accuracy and resilience of detection systems. In particular, we adapt and extend recent advances in knowledge transfer, model influence, and distillation to enable the use of forensic or other data unavailable at runtime in a range of security domains. An empirical evaluation shows that privileged information increases precision and recall over a system with no privileged information: we observe up to a 7.7% relative decrease in detection error for fast-flux bot detection, 8.6% for malware traffic detection, 7.3% for malware classification, and 16.9% for face recognition. We explore the limitations and applications of different privileged information techniques in detection systems. Such techniques provide a new means for detection systems to learn from data that would otherwise not be available at runtime.
    Comment: A short version of this paper is accepted to ASIACCS 2018
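
    One concrete privileged-information technique is generalized distillation: a teacher model sees the privileged features at training time, and a student that only sees runtime features learns from the teacher's soft outputs. A schematic sketch on synthetic data; this is a generic illustration of the technique, not the paper's exact pipeline:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X_run = rng.normal(size=(500, 5))                      # runtime features
    X_priv = X_run[:, :1] + rng.normal(0, 0.1, (500, 1))   # privileged (e.g., forensic)
    y = (X_run[:, 0] + X_priv[:, 0] > 0).astype(int)

    # Teacher: trained with runtime + privileged features.
    teacher = LogisticRegression().fit(np.hstack([X_run, X_priv]), y)
    soft = teacher.predict_proba(np.hstack([X_run, X_priv]))[:, 1]

    # Student: runtime features only, imitating the teacher's soft labels
    # (thresholded here because sklearn expects class targets).
    student = LogisticRegression().fit(X_run, (soft > 0.5).astype(int))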