13 research outputs found

    No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone

    Full text link
    It is generally recognized that the traffic generated by an individual connected to a network acts as his biometric signature. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools assume to access the entire traffic, including IP addresses and payloads. This is not feasible on the grounds that both performance and privacy would be negatively affected. In reality, most ISPs convert user traffic into NetFlow records for a concise representation that does not include, for instance, any payloads. More importantly, large and distributed networks are usually NAT'd, thus a few IP addresses may be associated to thousands of users. We devised a new fingerprinting framework that overcomes these hurdles. Our system is able to analyze a huge amount of network traffic represented as NetFlows, with the intent to track people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis

    Detecting Poisoning Attacks on Hierarchical Malware Classification Systems

    Get PDF
    Anti-virus software based on unsupervised hierarchical clustering (HC) of malware samples has been shown to be vulnerable to poisoning attacks. In this kind of attack, a malicious player degrades anti-virus performance by submitting to the database samples specifically designed to collapse the classification hierarchy utilized by the anti-virus (and constructed through HC) or otherwise deform it in a way that would render it useless. Though each poisoning attack needs to be tailored to the particular HC scheme deployed, existing research seems to indicate that no particular HC method by itself is immune. We present results on applying a new notion of entropy for combinatorial dendrograms to the problem of controlling the influx of samples into the data base and deflecting poisoning attacks. In a nutshell, effective and tractable measures of change in hierarchy complexity are derived from the above, enabling on-the-fly flagging and rejection of potentially damaging samples. The information-theoretic underpinnings of these measures ensure their indifference to which particular poisoning algorithm is being used by the attacker, rendering them particularly attractive in this setting

    Segmentation and Model Generation for Large-Scale Cyber Attacks

    Get PDF
    Raw Cyber attack traffic can present more questions than answers to security analysts. Especially with large-scale observables it is difficult to identify which packets are relevant and what attack behaviors are present. Many existing works in Host or Flow Clustering attempt to group similar behaviors to expedite analysis; these works often phrase the problem directly as offline unsupervised machine learning. This work proposes online processing to simultaneously model coordinating actors and segment traffic that is relevant to a target of interest, all while it is being received. The goal is not just to aggregate similar attack behaviors, but to provide situational awareness by grouping potential coordinators and isolating an attack area of interest around a particular target. The clustering problem is recast as a supervised learning problem: classifying received traffic to the most likely attack model, and iteratively introducing new attack models to explain received traffic. A novel graphical prior probability is defined based on the macroscopic attack structure to improve classification. Malicious traffic captures provided by the Cooperative Association for Internet Data Analysis are used to demonstrate the accuracy of the online model generation and segmentation

    Probabilistic Modeling and Inference for Obfuscated Network Attack Sequences

    Get PDF
    Prevalent computing devices with networking capabilities have become critical network infrastructure for government, industry, academia and every-day life. As their value rises, the motivation driving network attacks on this infrastructure has shifted from the pursuit of notoriety to the pursuit of profit or political gains, leading to network attack on various scales. Facing diverse network attack strategies and overwhelming alters, much work has been devoted to correlate observed malicious events to pre-defined scenarios, attempting to deduce the attack plans based on expert models of how network attacks may transpire. We started the exploration of characterizing network attacks by investigating how temporal and spatial features of attack sequence can be used to describe different types of attack sources in real data set. Attack sequence models were built from real data set to describe different attack strategies. Based on the probabilistic attack sequence model, attack predictions were made to actively predict next possible actions. Experiments through attack predictions have revealed that sophisticated attackers can employ a number of obfuscation techniques to confuse the alert correlation engine or classifier. Unfortunately, most exiting work treats attack obfuscations by developing ad-hoc fixes to specific obfuscation technique. To this end, we developed an attack modeling framework that enables a systematical analysis of obfuscations. The proposed framework represents network attack strategies as general finite order Markov models and integrates it with different attack obfuscation models to form probabilistic graphical model models. A set of algorithms is developed to inference the network attack strategies given the models and the observed sequences, which are likely to be obfuscated. The algorithms enable an efficient analysis of the impact of different obfuscation techniques and attack strategies, by determining the expected classification accuracy of the obfuscated sequences. The algorithms are developed by integrating the recursion concept in dynamic programming and the Monte-Carlo method. The primary contributions of this work include the development of the formal framework and the algorithms to evaluate the impact of attack obfuscations. Several knowledge-driven attack obfuscation models are developed and analyzed to demonstrate the impact of different types of commonly used obfuscation techniques. The framework and algorithms developed in this work can also be applied to other contexts beyond network security. Any behavior sequences that might suffer from noise and require matching to pre-defined models can use this work to recover the most likely original sequence or evaluate quantitatively the expected classification accuracy one can achieve to separate the sequences

    Integrate Model and Instance Based Machine Learning for Network Intrusion Detection

    Get PDF
    Indiana University-Purdue University Indianapolis (IUPUI)In computer networks, the convenient internet access facilitates internet services, but at the same time also augments the spread of malicious software which could represent an attack or unauthorized access. Thereby, making the intrusion detection an important area to explore for detecting these unwanted activities. This thesis concentrates on combining the Model and Instance Based Machine Learning for detecting intrusions through a series of algorithms starting from clustering the similar hosts. Similar hosts have been found based on the supervised machine learning techniques like Support Vector Machines, Decision Trees and K Nearest Neighbors using our proposed Data Fusion algorithm. Maximal cliques of Graph Theory has been explored to find the clusters. A recursive way is proposed to merge the decision areas of best features. The idea is to implement a combination of model and instance based machine learning and analyze how it performs as compared to a conventional machine learning algorithm like Random Forest for intrusion detection. The system has been evaluated on three datasets by CTU-13. The results show that our proposed method gives better detection rate as compared to traditional methods which might overfit the data. The research work done in model merging, instance based learning, random forests, data mining and ensemble learning with regards to intrusion detection have been studied and taken as reference

    A reputation framework for behavioural history: developing and sharing reputations from behavioural history of network clients

    Get PDF
    The open architecture of the Internet has enabled its massive growth and success by facilitating easy connectivity between hosts. At the same time, the Internet has also opened itself up to abuse, e.g. arising out of unsolicited communication, both intentional and unintentional. It remains an open question as to how best servers should protect themselves from malicious clients whilst offering good service to innocent clients. There has been research on behavioural profiling and reputation of clients, mostly at the network level and also for email as an application, to detect malicious clients. However, this area continues to pose open research challenges. This thesis is motivated by the need for a generalised framework capable of aiding efficient detection of malicious clients while being able to reward clients with behaviour profiles conforming to the acceptable use and other relevant policies. The main contribution of this thesis is a novel, generalised, context-aware, policy independent, privacy preserving framework for developing and sharing client reputation based on behavioural history. The framework, augmenting existing protocols, allows fitting in of policies at various stages, thus keeping itself open and flexible to implementation. Locally recorded behavioural history of clients with known identities are translated to client reputations, which are then shared globally. The reputations enable privacy for clients by not exposing the details of their behaviour during interactions with the servers. The local and globally shared reputations facilitate servers in selecting service levels, including restricting access to malicious clients. We present results and analyses of simulations, with synthetic data and some proposed example policies, of client-server interactions and of attacks on our model. Suggestions presented for possible future extensions are drawn from our experiences with simulation

    Securing Enterprise Networks with Statistical Node Behavior Profiling

    Get PDF
    The substantial proliferation of the Internet has made it the most critical infrastructure in today\u27s world. However, it is still vulnerable to various kinds of attacks/malwares and poses a number of great security challenges. Furthermore, we have also witnessed in the past decade that there is always a fast self-evolution of attacks/malwares (e.g. from worms to botnets) against every success in network security. Network security thereby remains a hot topic in both research and industry and requires both continuous and great attention. In this research, we consider two fundamental areas in network security, malware detection and background traffic modeling, from a new view point of node behavior profiling under enterprise network environments. Our main objectives are to extend and enhance the current research in these two areas. In particular, central to our research is the node behavior profiling approach that groups the behaviors of different nodes by jointly considering time and spatial correlations. We also present an extensive study on botnets, which are believed to be the largest threat to the Internet. To better understand the botnet, we propose a botnet framework and predict a new P2P botnet that is much stronger and stealthier than the current ones. We then propose anomaly malware detection approaches based directly on the insights (statistical characteristics) from the node behavior study and apply them on P2P botnet detection. Further, by considering the worst case attack model where the botmaster knows all the parameter values used in detection, we propose a fast and optimized anomaly detection approach by formulating the detection problem as an optimization problem. In addition, we propose a novel traffic modeling structure using behavior profiles for NIDS evaluations. It is efficient and takes into account the node heterogeneity in traffic modeling. It is also compatible with most current modeling schemes and helpful in generating better realistic background traffic. Last but not least, we evaluate the proposed approaches using real user trace from enterprise networks and achieve encouraging results. Our contributions in this research include: 1) a new node behavior profiling approach to study the normal node behavior; 2) a framework for botnets; 3) a new P2P botnet and performance comparisons with other P2P botnets; 4) two anomaly detection approaches based on node behavior profiles; 4) a fast and optimized anomaly detection approach under the worst case attack model; 5) a new traffic modeling structure and 6) simulations and evaluations of the above approaches under real user data from enterprise networks. To the best of our knowledge, we are the first to propose the botnet framework, consider the worst case attack model and propose corresponding fast and optimized solution in botnet related research. We are also the first to propose efficient solutions in traffic modeling without the assumption of node homogeneity
    corecore