Poisson factorization for peer-based anomaly detection
Anomaly detection systems are a promising tool for identifying compromised user credentials and malicious insiders in enterprise networks. Most existing approaches for modelling user behaviour rely either on independent observations for each user or on pre-defined user peer groups. A method is proposed, based on recommender system algorithms, to learn overlapping user peer groups and to use this learned structure to detect anomalous activity. Results analysing the authentication and process-running activities of thousands of users show that the proposed method can detect compromised user accounts during a red team exercise.
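As a rough illustration of how learned overlapping peer groups could be used to score activity, the sketch below assumes a Poisson factorization in which the rate of events between a user and a host is the dot product of two non-negative factor vectors, so that the probability of seeing at least one event is 1 - exp(-rate). The factor values, the user and host names, and the helpers `link_probability` and `surprise` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def link_probability(user_factors, host_factors):
    """P(at least one event) when the event count is Poisson with
    rate equal to the dot product of the two factor vectors."""
    rate = sum(u * h for u, h in zip(user_factors, host_factors))
    return 1.0 - math.exp(-rate)

# Hypothetical learned factors. Each dimension acts as a peer group,
# and a user may carry weight in more than one group (overlap).
users = {"alice": [2.0, 0.1], "bob": [0.1, 1.5], "carol": [1.0, 1.0]}
hosts = {"build-server": [1.2, 0.0], "hr-database": [0.0, 0.9]}

def surprise(user, host):
    """Negative log-probability of the link: larger means more anomalous."""
    p = link_probability(users[user], hosts[host])
    return -math.log(max(p, 1e-12))  # floor avoids log(0) for zero rates

for user in users:
    for host in hosts:
        print(f"{user:>6} -> {host:<12} surprise = {surprise(user, host):.3f}")
```

Under these assumed factors, a user whose weight sits mostly in the first group is scored as more surprising when connecting to a host associated with the second group, which is the kind of peer-based signal the abstract describes.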
Spectral embedding of weighted graphs
This paper concerns the statistical analysis of a weighted graph through
spectral embedding. Under a latent position model in which the expected
adjacency matrix has low rank, we prove uniform consistency and a central limit
theorem for the embedded nodes, treated as latent position estimates. In the
special case of a weighted stochastic block model, this result implies that the
embedding follows a Gaussian mixture model with each component representing a
community. We exploit this to formally evaluate different weight
representations of the graph using Chernoff information. For example, in a
network anomaly detection problem where we observe a p-value on each edge, we
recommend against directly embedding the matrix of p-values and instead suggest
using threshold or log p-values, depending on network sparsity and signal
strength.
Comment: 29 pages, 8 figures
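The two weight representations named in the abstract can be sketched as simple transformations applied to the edge p-values before any embedding is computed. The snippet below is only a sketch under assumptions: the edge dictionary, the cut-off `tau`, and the function names are hypothetical, and the spectral embedding step itself is not shown.

```python
import math

def threshold_weights(p_values, tau=0.05):
    """Binary representation: weight 1 where the p-value is at most tau."""
    return {edge: 1.0 if p <= tau else 0.0 for edge, p in p_values.items()}

def log_weights(p_values, floor=1e-300):
    """Log representation: -log(p), so small p-values get large weights."""
    return {edge: -math.log(max(p, floor)) for edge, p in p_values.items()}

# Hypothetical p-values observed on three edges of a small network.
edge_p_values = {("a", "b"): 0.001, ("a", "c"): 0.40, ("b", "c"): 0.03}

print(threshold_weights(edge_p_values))
print(log_weights(edge_p_values))
```

Either weighted edge list could then be assembled into an adjacency matrix and embedded. According to the abstract, which representation is preferable depends on network sparsity and signal strength.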
A source separation approach to temporal graph modelling for computer networks
Detecting malicious activity within an enterprise computer network can be
framed as a temporal link prediction task: given a sequence of graphs
representing communications between hosts over time, the goal is to predict
which edges should--or should not--occur in the future. However, standard
temporal link prediction algorithms are ill-suited for computer network
monitoring as they do not take account of the peculiar short-term dynamics of
computer network activity, which exhibits sharp seasonal variations. In order
to build a better model, we propose a source separation-inspired description of
computer network activity: at each time step, the observed graph is a mixture
of subgraphs representing various sources of activity, and short-term dynamics
result from changes in the mixing coefficients. Both qualitative and
quantitative experiments demonstrate the validity of our approach.
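The mixture description above can be sketched as a weighted sum of source subgraphs, where only the mixing coefficients change from one time step to the next. This is a minimal illustration under assumptions: the two sources, their edge weights, the coefficient values, and the helper `mix_sources` are hypothetical and do not reproduce the authors' model or estimation procedure.

```python
def mix_sources(sources, coefficients):
    """Expected edge weights at one time step: the sum over sources of
    (mixing coefficient) * (edge weight within that source)."""
    expected = {}
    for source, coeff in zip(sources, coefficients):
        for edge, weight in source.items():
            expected[edge] = expected.get(edge, 0.0) + coeff * weight
    return expected

# Hypothetical activity sources, each a weighted subgraph.
workday = {("laptop", "mail"): 4.0, ("laptop", "web"): 2.0}
backup = {("laptop", "backup"): 6.0, ("mail", "backup"): 3.0}

# Short-term dynamics come from the coefficients alone.
midday = mix_sources([workday, backup], [1.0, 0.0])
night = mix_sources([workday, backup], [0.5, 1.0])

print(midday)
print(night)
```

In this toy setting, an edge that is expected at night (a backup connection) receives zero expected weight at midday, so observing it then would stand out, while the source subgraphs themselves stay fixed.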
Automated Design of Network Security Metrics
Many abstract security measurements are based on characteristics of a graph that represents the network. These are typically simple and quick to compute but are often of little practical use in making real-world predictions. Practical network security is often measured using simulation or real-world exercises. These approaches better represent realistic outcomes but can be costly and time-consuming. This work aims to combine the strengths of these two approaches, developing efficient heuristics that accurately predict attack success. Hyper-heuristic machine learning techniques, trained on network attack simulation data, are used to produce novel graph-based security metrics. These low-cost metrics serve as an approximation for simulation when measuring network security in real time. The approach is tested and verified using a simulation based on activity from an actual large enterprise network. The results demonstrate the potential of using hyper-heuristic techniques to rapidly evolve and react to emerging cybersecurity threats.
Bayesian Models Applied to Cyber Security Anomaly Detection Problems
Cyber security is an important concern for all individuals, organisations and
governments globally. Cyber attacks have become more sophisticated, frequent
and dangerous than ever, and traditional anomaly detection methods have proved
less effective when dealing with these new classes of cyber threats. To address
this, both classical and Bayesian models offer a valid and innovative
alternative to traditional signature-based methods, motivating the increasing
interest in statistical research observed in recent years. In this review we
describe some typical cyber security challenges, typical types of data and
statistical methods, paying special attention to Bayesian approaches to these
problems.