    CharBot: A Simple and Effective Method for Evading DGA Classifiers

    Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names that can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective, and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.
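
    As a concrete illustration of the attack class described above, the sketch below perturbs benign domain names by randomly substituting a couple of characters, which is the kind of simple character-level manipulation the abstract attributes to CharBot. It is a minimal approximation, not the paper's implementation: the seed list, the two-substitution default, and the character set are assumptions.

        import random
        import string

        # Simplified DNS label charset; real labels cannot start or end with a hyphen.
        CHARSET = string.ascii_lowercase + string.digits + "-"

        def charbot_like(benign_domain: str, n_subs: int = 2, rng=random) -> str:
            """Perturb a benign second-level domain by replacing n_subs characters.

            A sketch of the character-substitution idea; the real CharBot's
            sampling rules may differ.
            """
            sld, _, tld = benign_domain.partition(".")
            chars = list(sld)
            for pos in rng.sample(range(len(chars)), k=min(n_subs, len(chars))):
                # Replace with a different character so the output always changes.
                chars[pos] = rng.choice([c for c in CHARSET if c != chars[pos]])
            return "".join(chars) + "." + tld

        # Example: derive candidate C&C domains from a small benign seed list.
        seeds = ["example.com", "wikipedia.org", "github.com"]
        print([charbot_like(s) for s in seeds])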

    Malware distributions and graph structure of the Web

    Knowledge about the graph structure of the Web is important for understanding this complex socio-technical system and for devising proper policies supporting its future development. Knowledge about the differences between clean and malicious parts of the Web is important for understanding potential threats to its users and for devising protection mechanisms. In this study, we apply data science methods to a large crawl of surface and deep Web pages with the aim of increasing such knowledge. To accomplish this, we answer the following questions. Which theoretical distributions explain important local characteristics and network properties of websites? How do these characteristics and properties differ between clean and malicious (malware-affected) websites? What is the predictive power of local characteristics and network properties for classifying malware websites? To the best of our knowledge, this is the first large-scale study describing the differences in global properties between malicious and clean parts of the Web. In other words, our work builds on and bridges the gap between Web science, which tackles large-scale graph representations, and Web cyber security, which is concerned with malicious activities on the Web. The results presented herein can also help antivirus vendors in devising approaches to improve their detection algorithms.
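
    To make the first research question concrete: a standard way to test which theoretical distribution explains a network property is to fit the in-degree distribution against a power law. The sketch below does this on a synthetic graph (the study's crawl data is not available here) using the common Clauset-style maximum-likelihood approximation for the exponent; the graph generator and threshold are illustrative assumptions.

        import math
        import networkx as nx

        # Synthetic directed host graph standing in for a Web crawl.
        g = nx.scale_free_graph(2000, seed=42)
        degrees = [d for _, d in g.in_degree() if d >= 1]

        def power_law_alpha(samples, xmin=1):
            """MLE approximation for a discrete power-law exponent:
            alpha = 1 + n / sum(ln(x_i / (xmin - 0.5))).
            """
            xs = [x for x in samples if x >= xmin]
            return 1.0 + len(xs) / sum(math.log(x / (xmin - 0.5)) for x in xs)

        print(f"estimated power-law exponent: {power_law_alpha(degrees):.2f}")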

    An evaluation of DGA classifiers

    Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware utilizes DGAs to create a set of domain names that, when resolved, provide the information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security, as it makes it possible to detect infections and, in some cases, take countermeasures to disrupt the communication and identify infected machines. Detection of the specific DGA malware family provides the administrator with valuable information about the kind of infection and the steps that need to be taken. In this paper we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
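
    The pipeline the abstract evaluates can be sketched compactly: extract lexical features from each domain name and train a tree ensemble on labelled examples. The features and toy data below are illustrative assumptions, not FANCI's actual feature set, and a faithful evaluation would split train and test sets by observation time and seed, as the abstract describes.

        import math
        from collections import Counter
        from sklearn.ensemble import RandomForestClassifier

        def lexical_features(domain: str):
            """A few illustrative human-engineered features."""
            sld = domain.split(".")[0]
            probs = [c / len(sld) for c in Counter(sld).values()]
            entropy = -sum(p * math.log2(p) for p in probs)
            return [
                len(sld),                                    # label length
                entropy,                                     # character entropy
                sum(ch.isdigit() for ch in sld) / len(sld),  # digit ratio
                sum(ch in "aeiou" for ch in sld) / len(sld), # vowel ratio
            ]

        # Hypothetical labelled data: 0 = benign, 1 = DGA-generated.
        train = [("google.com", 0), ("wikipedia.org", 0),
                 ("xjwqpkzraf.net", 1), ("q7b2nd0xy1.info", 1)]
        X = [lexical_features(d) for d, _ in train]
        y = [label for _, label in train]

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        print(clf.predict([lexical_features("kq9zvw2hfa.com")]))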

    Detection of algorithmically generated malicious domain names using masked N-grams

    Malware detection is a challenge that has increased in complexity in the last few years. A widely adopted strategy is to detect malware by analyzing network traffic, capturing the communications with its command and control (C&C) servers. However, some malware families have shifted to a stealthier communication strategy, since anti-malware companies maintain blacklists of known malicious locations. Instead of using static IP addresses or domain names, they algorithmically generate the domain names that may host their C&C servers. Blacklist approaches thus become ineffective, since the number of domain names to block is large and varies from time to time. In this paper, we introduce a machine learning approach using Random Forest that relies on purely lexical features of the domain names to detect algorithmically generated domains. In particular, we propose using masked N-grams, together with other statistics obtained from the domain name. Furthermore, we provide a dataset built for experimentation that contains regular and algorithmically generated domain names coming from different malware families. We also classify these families according to their type of domain generation algorithm. Our findings show that masked N-grams provide detection accuracy comparable to that of other existing techniques, but with much better performance.
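
    One plausible reading of masked N-grams is to first map every character of the domain name to a class symbol (vowel, consonant, digit, symbol) and then count N-grams over the masked string, so that pronounceable names and random-looking names produce very different profiles. The mapping below is an assumption for illustration; the paper's exact masking alphabet may differ.

        from collections import Counter

        def mask(label: str) -> str:
            """Map characters to class symbols: v=vowel, c=consonant, n=digit, s=symbol."""
            def cls(ch):
                if ch in "aeiou":
                    return "v"
                if ch.isalpha():
                    return "c"
                if ch.isdigit():
                    return "n"
                return "s"
            return "".join(cls(ch) for ch in label.lower())

        def masked_ngrams(domain: str, n: int = 3) -> Counter:
            m = mask(domain.split(".")[0])
            return Counter(m[i:i + n] for i in range(len(m) - n + 1))

        # A pronounceable name repeats cv patterns; a random-looking one does not.
        print(masked_ngrams("facebook.com"))    # mask: cvcvcvvc
        print(masked_ngrams("q7b2nd0xy1.net"))  # mask: cncnccnccn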

    Scalable Techniques for Anomaly Detection

    Computer networks are constantly being attacked by malicious entities for various reasons. Network-based attacks include, but are not limited to, Distributed Denial of Service (DDoS), DNS-based attacks, and Cross-site Scripting (XSS). Such attacks exploit either network protocol or end-host software vulnerabilities. Current network traffic analysis techniques employed for the detection and/or prevention of these anomalies suffer from significant delay or have only limited scalability because of their huge resource requirements. This dissertation proposes more scalable techniques for network anomaly detection. We propose using DNS analysis for detecting a wide variety of network anomalies. The use of DNS is motivated by the fact that DNS traffic comprises only 2-3% of total network traffic, reducing the burden on anomaly detection resources. Our motivation additionally follows from the observation that almost any Internet activity (legitimate or otherwise) is marked by the use of DNS. We propose several techniques for DNS traffic analysis to distinguish anomalous DNS traffic patterns, which in turn identify different categories of network attacks. First, we present MiND, a system to detect misdirected DNS packets arising from poisoned name server records or from local infections such as those caused by worms like DNSChanger. MiND validates misdirected DNS packets using an externally collected database of authoritative name servers for second- or third-level domains. We deploy this tool at the edge of a university campus network for evaluation. Second, we focus on domain-fluxing botnet detection by exploiting the high entropy inherent in the set of domains used for locating the Command and Control (C&C) server. We apply three metrics, namely the Kullback-Leibler divergence, the Jaccard index, and the edit distance, to different groups of domain names present in Tier-1 ISP DNS traces obtained from South Asia and South America. Our evaluation successfully detects existing domain-fluxing botnets such as Conficker and also recognizes new botnets. We extend this approach by utilizing DNS failures to improve the latency of detection. Alternatively, we propose a system which uses temporal and entropy-based correlation between successful and failed DNS queries for fluxing botnet detection. We also present an approach which computes the reputation of domains in a bipartite graph of the hosts within a network and the domains accessed by them. The inference technique utilizes belief propagation, an approximation algorithm for marginal probability estimation. The computation of reputation scores is seeded through a small fraction of domains found in black and white lists. An application of this technique on an HTTP-proxy dataset from a large enterprise shows a high detection rate with low false positive rates.
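
    Of the three group-level metrics named above, the Kullback-Leibler divergence is the simplest to show: compare the character distribution of a suspect group of domain names against a benign baseline, where a large divergence suggests algorithmically generated names. The sketch below uses add-one smoothing and hypothetical domain groups; the grouping strategy and any alerting threshold are assumptions.

        import math
        from collections import Counter

        ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

        def char_distribution(labels):
            """Smoothed unigram character distribution over a group of domain labels."""
            counts = Counter()
            for label in labels:
                counts.update(ch for ch in label.lower() if ch in ALPHABET)
            total = sum(counts.values()) + len(ALPHABET)  # add-one smoothing
            return {ch: (counts[ch] + 1) / total for ch in ALPHABET}

        def kl_divergence(p, q):
            return sum(p[ch] * math.log2(p[ch] / q[ch]) for ch in ALPHABET)

        benign = char_distribution(["google", "facebook", "wikipedia", "amazon"])
        fluxing = char_distribution(["xjwqpkzraf", "q7b2nd0xy1", "kq9zvw2hfa"])
        print(f"KL(fluxing || benign) = {kl_divergence(fluxing, benign):.3f}")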