Detecting Anomalies in Sequential Data with Higher-order Networks
A major branch of anomaly detection methods relies on dynamic networks: raw
sequence data is first converted to a series of networks, then critical change
points are identified in the evolving network structure. However, existing
approaches use first-order networks (FONs) to represent the underlying raw
data, which may lose important higher-order sequence patterns, making
higher-order anomalies undetectable in subsequent analysis. We present a novel
higher-order anomaly detection method that is both parameter-free and scalable,
building on an improved higher-order network (HON) construction algorithm. We
show the proposed higher-order anomaly detection algorithm is effective in
discovering variable orders of anomalies. Our data include a synthetic dataset of 11
billion web clickstreams and real-world taxi trajectory data.
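
To illustrate why a first-order network can hide a higher-order change, the toy sketch below compares two windows of sequences whose pairwise transitions are identical but whose second-order transitions differ. It is not the HON construction algorithm from the abstract: the fixed order, the helper names (transition_counts, network_distance), and the simple L1 distance between count tables are assumptions made only for illustration.

```python
from collections import Counter


def transition_counts(sequences, order):
    """Count (context, next_symbol) pairs, where context is the previous `order` symbols."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            counts[(context, seq[i + order])] += 1
    return counts


def network_distance(counts_a, counts_b):
    """L1 distance between two edge-count tables (an assumed, illustrative change score)."""
    keys = set(counts_a) | set(counts_b)
    return sum(abs(counts_a[k] - counts_b[k]) for k in keys)


# Window A: visitors arriving from 'a' go on to 'd'; visitors from 'b' go on to 'e'.
window_a = [list("acd"), list("bce")] * 50
# Window B: the dependency is swapped, but every pairwise edge keeps the same weight.
window_b = [list("ace"), list("bcd")] * 50

first_order_change = network_distance(
    transition_counts(window_a, 1), transition_counts(window_b, 1)
)
second_order_change = network_distance(
    transition_counts(window_a, 2), transition_counts(window_b, 2)
)

print("first-order change: ", first_order_change)
print("second-order change:", second_order_change)
```

In this example the first-order edge weights (a to c, b to c, c to d, c to e) are the same in both windows, so a detector built on them sees no change, while the second-order counts differ on every edge.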
Learning the Markov order of paths in a network
We study the problem of learning the Markov order in categorical sequences
that represent paths in a network, i.e. sequences of variable lengths where
transitions between states are constrained to a known graph. Such data pose
challenges for standard Markov order detection methods and demand modelling
techniques that explicitly account for the graph constraint. Adopting a
multi-order modelling framework for paths, we develop a Bayesian learning
technique that (i) more reliably detects the correct Markov order compared to a
competing method based on the likelihood ratio test, (ii) requires considerably
less data compared to methods using AIC or BIC, and (iii) is robust against
partial knowledge of the underlying constraints. We further show that a
recently published method that uses a likelihood ratio test has a tendency to
overfit the true Markov order of paths, which is not the case for our Bayesian
technique. Our method is important for data scientists analyzing patterns in
categorical sequence data that are subject to (partially) known constraints,
e.g. sequences with forbidden words, mobility trajectories and click stream
data, or sequence data in bioinformatics. Addressing the key challenge of model
selection, our work is further relevant for the growing body of research that
emphasizes the need for higher-order models in network analysis.
Comment: 15 pages, 7 figures
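
For orientation, the sketch below shows the kind of AIC/BIC baseline that the abstract compares against, applied to a single categorical sequence. It is not the Bayesian multi-order method described above. The function names, the toy sequence, and the parameter count |S|^k (|S| - 1) are assumptions; in particular, that count ignores any graph constraint on transitions, which is the limitation the abstract points to for standard methods.

```python
import math
from collections import Counter


def log_likelihood(seq, order, start):
    """Maximum-likelihood log-likelihood of an order-`order` Markov chain.

    Only positions start..len(seq)-1 are scored, so that models of different
    order are compared on the same observations.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for i in range(start, len(seq)):
        context = tuple(seq[i - order:i])
        pair_counts[(context, seq[i])] += 1
        context_counts[context] += 1
    return sum(
        count * math.log(count / context_counts[context])
        for (context, _), count in pair_counts.items()
    )


def select_order(seq, max_order):
    """Score orders 0..max_order with AIC and BIC and return the minimisers."""
    n_states = len(set(seq))
    n_obs = len(seq) - max_order
    scores = {}
    for k in range(max_order + 1):
        loglik = log_likelihood(seq, k, max_order)
        # Assumed free-parameter count for an unconstrained order-k chain.
        n_params = (n_states ** k) * (n_states - 1)
        scores[k] = {
            "loglik": loglik,
            "aic": 2 * n_params - 2 * loglik,
            "bic": n_params * math.log(n_obs) - 2 * loglik,
        }
    best_aic = min(scores, key=lambda k: scores[k]["aic"])
    best_bic = min(scores, key=lambda k: scores[k]["bic"])
    return best_aic, best_bic, scores


# Toy sequence: the next symbol is fully determined by the previous two symbols,
# but not by the previous one alone.
sequence = "aabb" * 100
best_aic, best_bic, scores = select_order(sequence, max_order=3)
print("order selected by AIC:", best_aic)
print("order selected by BIC:", best_bic)
```

Both criteria pick order 2 on this deterministic toy input. With short or noisy path data the criteria can disagree or pick the wrong order, which is the setting the abstract addresses.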