
    Using Markov Models and Statistics to Learn, Extract, Fuse, and Detect Patterns in Raw Data

    Many systems are partially stochastic in nature. We have derived data-driven approaches for extracting stochastic state machines (Markov models) directly from observed data. This chapter provides an overview of our approach with numerous practical applications. We have used this approach for inferring shipping patterns, exploiting computer system side-channel information, and detecting botnet activities. For contrast, we include a related data-driven statistical inferencing approach that detects and localizes radiation sources. Comment: Accepted by 2017 International Symposium on Sensor Networks, Systems and Security.
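    To make the inference step concrete, here is a minimal sketch of how a first-order Markov model could be estimated from observed data by counting symbol transitions and normalizing the counts into per-state distributions. The function name and the toy observation stream are illustrative, not taken from the chapter.

        from collections import Counter, defaultdict

        def infer_markov_model(sequence):
            """Estimate first-order Markov transition probabilities from an
            observed symbol sequence by counting adjacent-pair frequencies."""
            transition_counts = defaultdict(Counter)
            for current, nxt in zip(sequence, sequence[1:]):
                transition_counts[current][nxt] += 1
            # Normalize counts into per-state probability distributions.
            model = {}
            for state, counts in transition_counts.items():
                total = sum(counts.values())
                model[state] = {nxt: c / total for nxt, c in counts.items()}
            return model

        # Example: infer transitions from a toy observation stream.
        observations = list("AABABBBAABAB")
        for state, dist in infer_markov_model(observations).items():
            print(state, dist)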

    Network Traffic Analysis Using Stochastic Grammars

    Network traffic analysis is widely used to infer information from Internet traffic, even when that traffic is encrypted. Previous work uses traffic characteristics, such as port numbers, packet sizes, and frequency, without looking for more subtle patterns in the network traffic. In this work, we use stochastic grammars, hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs), as pattern recognition tools for traffic analysis. HMMs are widely used for pattern recognition and detection. We use an HMM inference approach. With inferred HMMs, we use confidence intervals (CIs) to detect whether a data sequence matches the HMM. To compare HMMs, we define a normalized Markov metric. A statistical test is used to determine model equivalence. Our metric systematically removes the least likely events from both HMMs until the remaining models are statistically equivalent; this defines the distance between models. We extend the use of HMMs to PCFGs, which have more expressive power. We estimate PCFG production probabilities from data. A statistical test is used for detection. We present three applications of HMM and PCFG detection to network traffic analysis. First, we infer the presence of protocol tunneling through the Tor (The Onion Router) anonymization network. The Markov metric quantifies the similarity of network traffic HMMs in Tor to identify the protocol. It also measures communication noise in the Tor network. Second, we use HMMs to detect centralized botnet traffic. We infer HMMs from botnet traffic data and detect botnet infections. Experimental results show that HMMs can accurately detect Zeus botnet traffic. To hide their locations better, newer botnets have P2P control structures. Hierarchical P2P botnets contain recursive and hierarchical patterns. Third, we use PCFGs to detect P2P botnet traffic. Experimentation on real-world traffic data shows that PCFGs can accurately differentiate between P2P botnet traffic and normal Internet traffic.
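    As an illustration of the confidence-interval detection step described above, the sketch below scores an observation sequence under an HMM with the scaled forward algorithm and flags sequences whose per-symbol log-likelihood falls outside an interval built from training scores. The two-state model parameters and the random training data are assumptions for demonstration, not the paper's inferred models.

        import numpy as np

        def forward_loglik(obs, pi, A, B):
            """Log-likelihood of an observation sequence under an HMM via the
            scaled forward algorithm. pi: initial probs, A: transition matrix,
            B: emission probs (states x symbols)."""
            alpha = pi * B[:, obs[0]]
            loglik = np.log(alpha.sum())
            alpha /= alpha.sum()
            for o in obs[1:]:
                alpha = (alpha @ A) * B[:, o]
                s = alpha.sum()
                loglik += np.log(s)
                alpha /= s
            return loglik

        # Toy 2-state HMM (parameters are illustrative, not from the paper).
        pi = np.array([0.6, 0.4])
        A  = np.array([[0.7, 0.3], [0.2, 0.8]])
        B  = np.array([[0.9, 0.1], [0.3, 0.7]])

        # Per-symbol log-likelihoods of training sequences define a confidence
        # interval; a test sequence scoring outside it fails to match the model.
        rng = np.random.default_rng(0)
        train = [rng.integers(0, 2, 50) for _ in range(100)]
        scores = np.array([forward_loglik(s, pi, A, B) / len(s) for s in train])
        lo, hi = np.percentile(scores, [2.5, 97.5])
        test = rng.integers(0, 2, 50)
        score = forward_loglik(test, pi, A, B) / len(test)
        print("match" if lo <= score <= hi else "no match", score)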

    Master of Puppets: Analyzing And Attacking A Botnet For Fun And Profit

    A botnet is a network of compromised machines (bots) under the control of an attacker. Many of these machines are infected without their owners' knowledge, and botnets are the driving force behind several misuses and criminal activities on the Internet (for example, spam emails). Depending on its topology, a botnet can have zero or more command and control (C&C) servers, which are centralized machines controlled by the cybercriminal that issue commands and receive reports back from the co-opted bots. In this paper, we present a comprehensive analysis of the command and control infrastructure of one of the world's largest proprietary spamming botnets between 2007 and 2012: Cutwail/Pushdo. We identify the key functionalities needed by a spamming botnet to operate effectively. We then develop a number of attacks against the command and control logic of Cutwail that target those functionalities and make the spamming operations of the botnet less effective. This analysis was made possible by having access to the source code of the C&C software, by setting up our own Cutwail C&C server, and by implementing a clone of the Cutwail bot. With the help of this tool, we were able to enumerate the number of bots currently registered with the C&C server, impersonate an existing bot to report false information to the C&C server, and manipulate the spamming statistics of an arbitrary bot stored in the C&C database. Furthermore, we were able to make the control server inaccessible by conducting a distributed denial of service (DDoS) attack. Our results may be used by law enforcement and practitioners to develop better techniques to mitigate and cripple other botnets, since many of our findings are generic and stem from the workflow of C&C communication in general.

    Command & Control: Understanding, Denying and Detecting - A review of malware C2 techniques, detection and defences

    In this survey, we first briefly review the current state of cyber attacks, highlighting significant recent changes in how and why such attacks are performed. We then investigate the mechanics of malware command and control (C2) establishment: we provide a comprehensive review of the techniques used by attackers to set up such a channel and to hide its presence from the attacked parties and the security tools they use. We then switch to the defensive side of the problem and review approaches that have been proposed for the detection and disruption of C2 channels. We also map such techniques to widely adopted security controls, emphasizing gaps or limitations (and success stories) in current best practices. Comment: Work commissioned by CPNI, available at c2report.org. 38 pages. Listing abstract compressed from the version appearing in the report.

    Characterization and modeling of top spam botnets

    The increasing impact of the Internet in the global economy has transformed botnets into one of the most relevant security threats for citizens, organizations, and governments. Despite the significant efforts made over the last years to understand this phenomenon and develop detection techniques and countermeasures, this continues to be a field with big challenges to address. Several approaches can be taken to study botnets: analyze the source code, which can be a hard task because it is usually unavailable; study the control mechanism, particularly the activity of its command and control server(s); or study its behavior, by measuring real traffic and collecting relevant statistics. In this work, we installed some of the most popular spam botnets, captured the traffic they originated, and characterized it in order to identify the main trends/patterns of their activity. From the extensive statistics that were collected, it was possible to conclude that there are distinct features between botnets that can be exploited to build efficient detection methodologies. Based on this study, the second part of the paper proposes a generic and systematic model to describe the network dynamics whenever a botnet threat is detected, defining all actors, dimensions, states, and actions that need to be taken into account at each moment. We believe that this type of modeling approach is the basis for developing systematic and integrated frameworks and strategies to predict and fight botnet threats in an efficient way. This research was supported by Fundação para a Ciência e a Tecnologia, under research project PTDC/EEA-TEL/101880/2008.
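    A minimal sketch of the kind of per-capture statistics such a characterization might collect, assuming packets have already been exported from a capture as (timestamp, size) pairs. The specific metrics, function name, and the synthetic capture below are illustrative, not the paper's actual feature set.

        import statistics

        def characterize_flow(packets):
            """Summarize a traffic capture with simple statistics of the kind
            used to contrast botnet families. `packets` is a list of
            (timestamp_seconds, size_bytes) tuples exported from a capture."""
            sizes = [size for _, size in packets]
            times = [t for t, _ in packets]
            gaps = [b - a for a, b in zip(times, times[1:])]
            duration = times[-1] - times[0] if len(times) > 1 else None
            return {
                "packet_count": len(packets),
                "mean_size": statistics.mean(sizes),
                "stdev_size": statistics.pstdev(sizes),
                "mean_interarrival": statistics.mean(gaps) if gaps else None,
                "bytes_per_second": sum(sizes) / duration if duration else None,
            }

        # Example with a synthetic five-packet capture.
        capture = [(0.00, 74), (0.05, 590), (0.09, 1500), (0.31, 74), (0.33, 1500)]
        print(characterize_flow(capture))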

    On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses

    In recent years, malware authors have drastically changed course in threat design and implementation. Malware authors, namely hackers or cyber-terrorists, perpetrate new forms of cyber-crime involving ever more innovative hacking techniques. Motivated by financial or political reasons, attackers target computer systems ranging from personal computers to organizations' networks to collect and steal sensitive data, as well as to blackmail, scam people, or scupper IT infrastructures. Accordingly, IT security experts face new challenges, as they need to counter cyber-threats proactively. The challenge takes the form of a continuous fight, where cyber-criminals are obsessed with the idea of outsmarting security defenses. As such, security experts have to elaborate an effective strategy to counter cyber-criminals. The generation of cyber-threat intelligence is of paramount importance, as stated in the following quote: "the field is owned by who owns the intelligence". In this thesis, we address the problem of generating timely and relevant cyber-threat intelligence for the purpose of detection, prevention, and mitigation of cyber-attacks. To do so, we initiate a research effort that falls into four parts. First, we analyze prominent cyber-crime toolkits to grasp the inner secrets and workings of advanced threats. We dissect prominent malware such as the Zeus and Mariposa botnets to uncover the underlying techniques used to build a networked army of infected machines. Second, we investigate cyber-crime infrastructures, where we elaborate on the generation of cyber-threat intelligence for situational awareness. We adapt a graph-theoretic approach to study the infrastructures used by malware to perpetrate malicious activities. We build a scoring mechanism based on a page-ranking algorithm to measure the badness of infrastructure elements, i.e., domains, IPs, domain owners, etc. In addition, we use the min-hashing technique to evaluate the level of sharing among cyber-threat infrastructures over a period of one year. Third, we use machine learning techniques to fingerprint malicious IP traffic. By fingerprinting, we mean detecting malicious network flows and attributing them to malware families. This research effort relies on a ground truth collected from the dynamic analysis of malware samples. Finally, we investigate the generation of cyber-threat intelligence from passive DNS streams. To this end, we design and implement a system that generates anomalies from passive DNS traffic. Due to the tremendous volume of DNS data, we build the system on top of a cluster computing framework, namely Apache Spark [70]. The integrated analytic system has the ability to detect anomalies observed in DNS records, which are potentially generated by widespread cyber-threats.
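    To illustrate the page-ranking badness-scoring idea, the sketch below runs personalized PageRank over a small, made-up infrastructure graph with networkx, seeding the walk at a known-bad domain so that badness propagates to nearby IPs and registrants. The graph, node names, seed, and damping factor are assumptions, not the thesis's actual data or scoring mechanism.

        import networkx as nx

        # Hypothetical infrastructure graph: edges link domains to the IPs
        # they resolve to and to registrant identities (names are made up).
        G = nx.Graph()
        G.add_edges_from([
            ("evil-domain.example", "198.51.100.7"),
            ("evil-domain.example", "registrant-42"),
            ("other-domain.example", "198.51.100.7"),
            ("other-domain.example", "registrant-42"),
            ("benign.example", "203.0.113.9"),
        ])

        # Seed the walk with elements already known to be malicious;
        # personalized PageRank then scores every element by its proximity
        # to the seeds, so shared IPs and registrants inherit badness.
        seeds = {"evil-domain.example": 1.0}
        personalization = {n: seeds.get(n, 0.0) for n in G}
        badness = nx.pagerank(G, alpha=0.85, personalization=personalization)
        for node, score in sorted(badness.items(), key=lambda kv: -kv[1]):
            print(f"{node:25s} {score:.3f}")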

    Fast and Accurate Machine Learning-based Malware Detection via RC4 Ciphertext Analysis

    Malware increasingly sustains its viability by hiding its malicious intent and/or behavior through the use of ciphers. So far, many efforts have been made to detect malware and prevent it from damaging users by monitoring network packets. However, conventional detection schemes that analyze network packets directly are hardly applicable to advanced malware that encrypts its communication. Cryptanalysis of each packet flowing over a network might be one feasible solution, but the approach is computationally expensive and lacks accuracy, so it is not practical. To tackle these problems, in this paper we propose novel schemes that can accurately detect malware packets encrypted with RC4, without decryption and in a timely manner. First, we discovered that a fixed encryption key generates unique statistical patterns in RC4 ciphertexts. We then detect malware packets of RC4 ciphertexts efficiently and accurately by utilizing the discovered statistical patterns of RC4 ciphertext given an encryption key. Our proposed schemes directly analyze network packets without decrypting ciphertexts. Moreover, our analysis can be effectively executed with only a very small subset of the network packet. To the best of our knowledge, this unique signature has never been discussed in any previous research. Our intensive experimental results with both simulation data and actual malware show that our proposed schemes are extremely fast (23.06±1.52 milliseconds) and highly accurate (100%) at detecting DarkComet malware with only 36 bytes of a network packet.
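    A hedged sketch of how such a fixed-key statistical fingerprint might be learned and matched: per-position byte histograms are collected over the first 36 ciphertext bytes (the packet length the paper reports), and a candidate packet is scored by the average empirical probability of its bytes under those histograms. The helper names and threshold are hypothetical; the paper's actual signature and classifier may differ.

        from collections import Counter

        PREFIX_LEN = 36  # the paper reports detection from a 36-byte packet

        def learn_signature(training_packets):
            # Per-position byte histograms over ciphertext prefixes. With a
            # fixed RC4 key the keystream is identical across packets, so
            # structured plaintext leaves a stable statistical fingerprint
            # at each byte offset.
            hists = [Counter() for _ in range(PREFIX_LEN)]
            for pkt in training_packets:
                for i, byte in enumerate(pkt[:PREFIX_LEN]):
                    hists[i][byte] += 1
            return hists

        def match_score(packet, hists, n_train):
            # Average empirical probability of the packet's bytes under the
            # learned per-position histograms; higher means a closer match.
            probs = [hists[i][b] / n_train
                     for i, b in enumerate(packet[:PREFIX_LEN])]
            return sum(probs) / len(probs)

        # Hypothetical usage: train on captured malware ciphertexts, then
        # flag packets whose score exceeds a threshold tuned on held-out
        # benign traffic (0.5 is illustrative).
        # signature = learn_signature(malware_packets)
        # is_malware = match_score(candidate, signature, len(malware_packets)) > 0.5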

    Packet tagging system for enhanced traffic profiling

    This paper describes the design and implementation of a system for managing the tagging of traffic, in order to create detailed personal and application-level profiles. The ultimate goal of this profiling is to facilitate the task of traffic auditing tools, namely in their struggle against botnets. The architecture was designed for domestic or enterprise facilities and uses the 802.1X authentication architecture as the base support infrastructure for unequivocally binding traffic to specific entities (persons or servers). At the same time, such binding uses virtual identities and encryption to preserve the privacy of traffic originators and protect them from network eavesdroppers other than authorized traffic auditors. The traffic from each known originator is profiled in some detail: each profile includes a role tag and an application tag. Role tags are defined by originators and only partially follow a standard policy. Application tags, in contrast, should follow a standard policy in order to reason about abnormal scenarios raised when correlating traffic from several instances of the same application. A first prototype was developed for Linux, using iptables and FreeRADIUS and conveying packet tagging information in a new IP option field.
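    As a rough illustration of conveying tags in a new IP option field, the sketch below builds a tagged packet with scapy. The option number 0x88, the two-byte role/application layout, and the tag values are assumptions for demonstration, not the prototype's actual encoding.

        # A minimal sketch, assuming scapy is installed; running send()
        # requires root privileges, so it is commented out by default.
        from scapy.all import IP, TCP, IPOption, send

        ROLE_TAG = 0x01  # hypothetical role identifier set by the originator
        APP_TAG  = 0x07  # hypothetical application identifier (standard policy)

        # IP option layout: type byte, total length byte, then the tag bytes.
        tag_option = IPOption(bytes([0x88, 4, ROLE_TAG, APP_TAG]))

        pkt = IP(dst="192.0.2.10", options=[tag_option]) / TCP(dport=80)
        pkt.show()   # inspect the tagged header
        # send(pkt)  # emit the tagged packet on the wire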