Network Security Modelling with Distributional Data
We investigate the detection of botnet command and control (C2) hosts in
massive IP traffic using machine learning methods. To this end, we use NetFlow
data -- the industry standard for monitoring of IP traffic -- and ML models
using two sets of features: conventional NetFlow variables and distributional
features based on NetFlow variables. In addition to using static summaries of
NetFlow features, we use quantiles of their IP-level distributions as input
features in predictive models to predict whether an IP belongs to known botnet
families. These models are used to develop intrusion detection systems that flag traffic traces associated with malicious attacks. The results are validated by matching predictions against existing denylists of published malicious IP addresses and by deep packet inspection. The use of our proposed novel distributional features, combined with techniques that enable modelling of complex input feature spaces, results in highly accurate predictions by our trained models.

Comment: Accepted and presented at CAMLIS 2022, https://www.camlis.org/2022-conference. arXiv admin note: text overlap with arXiv:2108.0892
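The distributional-feature idea above can be sketched concretely: for each IP, the per-flow distribution of a NetFlow variable is summarised by its empirical quantiles, giving a fixed-length feature vector regardless of how many flows the IP produced. The field names, quantile levels, and toy data below are illustrative assumptions, not the paper's actual feature set.

```python
# Sketch: map each IP's variable-length list of per-flow byte counts to a
# fixed-length vector of empirical quantiles, usable as model input.
import numpy as np

def quantile_features(flow_bytes, levels=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Empirical quantiles of one IP's per-flow byte-count distribution."""
    return np.quantile(np.asarray(flow_bytes, dtype=float), levels)

# Toy example: two IPs with clearly different traffic profiles.
flows_by_ip = {
    "10.0.0.1": [40, 60, 55, 700, 52, 48],       # mostly small flows
    "10.0.0.2": [5000, 4800, 5100, 4900, 5050],  # uniformly large flows
}
features = {ip: quantile_features(b) for ip, b in flows_by_ip.items()}
for ip, vec in features.items():
    print(ip, np.round(vec, 1))
```

A classifier trained on such vectors sees the shape of each IP's traffic distribution rather than a single static summary like the mean.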
An anomaly detection framework for cyber-security data
Data-driven anomaly detection systems have unrivalled potential as complementary defence systems to existing signature-based tools as the number of cyber attacks increases. In this manuscript an anomaly detection system is presented that detects any abnormal deviations from the normal behaviour of an individual device. Device behaviour is defined as the number of network traffic events involving the device of interest observed within a pre-specified time period. The behaviour of each device at its normal state is modelled to depend on its observed historic behaviour. A number of statistical and machine learning approaches are explored for modelling this relationship and, through a comparative study, the Quantile Regression Forests approach is found to have the best predictive power. Based on the prediction intervals of the Quantile Regression Forests, an anomaly detection system is proposed that characterises as abnormal any observed behaviour outside of these intervals. A series of experiments for contaminating normal device behaviour is presented for examining the performance of the anomaly detection system. Through the conducted analysis the proposed anomaly detection system is found to outperform two other detection systems. The presented work has been conducted on two enterprise networks.
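The interval-based flagging described above can be illustrated in a few lines. The paper uses Quantile Regression Forests; as a stand-in, this sketch uses scikit-learn's gradient boosting with the quantile loss to obtain lower and upper quantile predictions from lagged behaviour. The synthetic event counts, lag features, and interval levels are all assumptions for illustration.

```python
# Minimal sketch of prediction-interval anomaly flagging on synthetic
# "events per hour" counts with daily seasonality. Gradient boosting with
# the quantile loss stands in for Quantile Regression Forests.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(24 * 30)  # 30 synthetic days, hourly resolution
counts = 100 + 30 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)

# Features: previous hour's count and hour of day (illustrative choices).
X = np.column_stack([counts[:-1], t[1:] % 24])
y = counts[1:]

lo = GradientBoostingRegressor(loss="quantile", alpha=0.01).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.99).fit(X, y)

def is_anomalous(prev_count, hour, observed):
    """Flag behaviour outside the [1%, 99%] prediction interval."""
    x = np.array([[prev_count, hour]])
    return not (lo.predict(x)[0] <= observed <= hi.predict(x)[0])

print(is_anomalous(100, 12, 10000))  # count far outside normal range
```

Any observed count outside the fitted quantile band is characterised as abnormal, mirroring the interval-based rule described in the abstract.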
Spatiotemporal Patterns and Predictability of Cyberattacks
Y.C.L. was supported by Air Force Office of Scientific Research (AFOSR) under grant no. FA9550-10-1-0083 and Army Research Office (ARO) under grant no. W911NF-14-1-0504. S.X. was supported by Army Research Office (ARO) under grant no. W911NF-13-1-0141. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer reviewed. Publisher PDF.
The time-varying dependency patterns of NetFlow statistics
We investigate where and how key dependency structures between measures of network activity change throughout the course of daily activity. Our approach to data mining is probabilistic in nature: we formulate the identification of dependency patterns as a regularised statistical estimation problem. The resulting model can be interpreted as a set of time-varying graphs and provides a useful visual interpretation of network activity. We believe this is the first application of dynamic graphical modelling to network traffic of this kind. Investigations are performed on 9 days of real-world network traffic across a subset of IPs. We demonstrate that dependency between features may change across time and discuss how these changes occur at intra- and inter-day levels. Such variation in feature dependency may have important consequences for the design and implementation of probabilistic intrusion detection systems.
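One standard way to realise the regularised-estimation idea above is the graphical lasso: within each time window, a sparse precision matrix is estimated over the NetFlow features, and its nonzero off-diagonal entries are the edges of that window's dependency graph. The feature names, window length, and synthetic coupling below are assumptions for illustration, not the paper's actual method or data.

```python
# Sketch: one dependency graph per time window via graphical lasso.
# Nonzero off-diagonal precision entries are interpreted as edges.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
features = ["bytes", "packets", "duration", "n_dst_ports"]

def window_data(n=60):
    """Synthetic per-minute features: bytes and packets strongly coupled,
    the other two independent."""
    bytes_ = rng.normal(0, 1, n)
    return np.column_stack([
        bytes_,
        0.9 * bytes_ + rng.normal(0, 0.3, n),  # packets track bytes
        rng.normal(0, 1, n),                   # duration independent
        rng.normal(0, 1, n),                   # port fan-out independent
    ])

graphs = []
for _ in range(3):  # one graph per time window
    prec = GraphicalLasso(alpha=0.2).fit(window_data()).precision_
    edges = {(features[i], features[j])
             for i in range(4) for j in range(i + 1, 4)
             if abs(prec[i, j]) > 1e-6}
    graphs.append(edges)

print(graphs)  # the bytes-packets edge persists across windows
```

Comparing the edge sets across windows is exactly the kind of intra- and inter-day variation in dependency structure the abstract describes.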
A Macroscopic Study of Network Security Threats at the Organizational Level.
Defenders of today's network are confronted with a large number of malicious activities such as spam, malware, and denial-of-service attacks. Although many studies have been performed on how to mitigate security threats, the interaction between attackers and defenders is like a game of Whac-a-Mole, in which the security community is chasing after attackers rather than helping defenders to build systematic defensive solutions. As a complement to these studies that focus on attackers or end hosts, this thesis studies security threats from the perspective of the organization, the central authority that manages and defends a group of end hosts. This perspective provides a balanced position to understand security problems and to deploy and evaluate defensive solutions.
This thesis explores how a macroscopic view of network security from an organization's perspective can be formed to help measure, understand, and mitigate security threats. To realize this goal, we bring together a broad collection of reputation blacklists. We first measure the properties of the malicious sources identified by these blacklists and their impact on an organization. We then aggregate the malicious sources to Internet organizations and characterize the maliciousness of organizations and their evolution over a period of two and a half years. Next, we aim to understand the cause of different maliciousness levels in different organizations. By examining the relationship between eight security mismanagement symptoms and the maliciousness of organizations, we find a strong positive correlation between mismanagement and maliciousness. Lastly, motivated by the observation that there are organizations that have a significant fraction of their IP addresses involved in malicious activities, we evaluate the tradeoff of one type of mitigation solution at the organization level: network takedowns.

PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116714/1/jingzj_1.pd
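The mismanagement-versus-maliciousness analysis above amounts to a rank-correlation study across organizations. A toy version can be sketched as follows; the symptom counts, maliciousness scores, and effect size are entirely synthetic assumptions, not the thesis's data.

```python
# Toy illustration: rank correlation between per-organization
# mismanagement symptom counts (0-8) and a maliciousness score.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_orgs = 200
# Synthetic assumption: more symptoms -> larger blacklisted-IP fraction.
symptoms = rng.integers(0, 9, n_orgs)
maliciousness = 0.05 * symptoms + rng.normal(0, 0.05, n_orgs)

rho, p_value = spearmanr(symptoms, maliciousness)
print(f"Spearman rho={rho:.2f}, p={p_value:.1e}")
```

A strongly positive rho with a small p-value is the shape of result the thesis reports: organizations exhibiting more mismanagement symptoms tend to be more malicious.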
Spatiotemporal patterns and predictability of cyberattacks
A relatively unexplored issue in cybersecurity science and engineering is
whether there exist intrinsic patterns of cyberattacks. Conventional wisdom
favors absence of such patterns due to the overwhelming complexity of the
modern cyberspace. Surprisingly, through a detailed analysis of an extensive
data set that records the time-dependent frequencies of attacks over a
relatively wide range of consecutive IP addresses, we successfully uncover
intrinsic spatiotemporal patterns underlying cyberattacks, where the term
"spatio" refers to the IP address space. In particular, we focus on analyzing
"macroscopic" properties of the attack traffic flows and identify two main
patterns with distinct spatiotemporal characteristics: deterministic and
stochastic. Strikingly, very few sets of major attackers commit almost all the
attacks, and their attack "fingerprints" and target-selection schemes can be
unequivocally identified from the very limited number of unique spatiotemporal
characteristics, each of which exists only on a consecutive IP region and
differs significantly from the others. We utilize a
number of quantitative measures, including the flux-fluctuation law, the Markov
state transition probability matrix, and predictability measures, to
characterize the attack patterns in a comprehensive manner. A general finding
is that the attack patterns possess high degrees of predictability, potentially
paving the way to anticipating and, consequently, mitigating or even preventing
large-scale cyberattacks using macroscopic approaches.
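One of the quantitative measures named above, the Markov state-transition probability matrix, can be estimated directly from a discretised attack-intensity sequence. The three intensity states and the toy sequence below are illustrative assumptions.

```python
# Sketch: estimate a Markov transition matrix from a sequence of attack
# intensities bucketed into states 0=low, 1=medium, 2=high.
import numpy as np

states = ["low", "medium", "high"]
# Toy state sequence, e.g. hourly attack volume bucketed into 3 levels.
seq = [0, 0, 1, 1, 2, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0]

counts = np.zeros((3, 3))
for a, b in zip(seq[:-1], seq[1:]):   # count observed transitions
    counts[a, b] += 1
P = counts / counts.sum(axis=1, keepdims=True)  # row-normalise

print(np.round(P, 2))  # P[i, j] = estimated Pr(next=j | current=i)
```

High probability mass concentrated in a few entries of P is one way the "high degree of predictability" of attack patterns manifests: given the current intensity state, the next state is largely determined.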