9 research outputs found
A linear time method for the detection of point and collective anomalies
The challenge of efficiently identifying anomalies in data sequences is an important statistical problem that now arises in many applications. Whilst there has been substantial work aimed at making statistical analyses robust to outliers, or point anomalies, there has been much less work on detecting anomalous segments, or collective anomalies. By bringing together ideas from changepoint detection and robust statistics, we introduce Collective And Point Anomalies (CAPA), a computationally efficient approach that is suitable when collective anomalies are characterised by a change in mean, in variance, or in both, and that distinguishes them from point anomalies. Theoretical results establish the consistency of CAPA at detecting collective anomalies, and empirical results show that CAPA has close to linear computational cost as well as being more accurate at detecting and locating collective anomalies than other approaches. We demonstrate the utility of CAPA through its ability to detect exoplanets from light curve data from the Kepler telescope.
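The penalised-cost idea behind this style of detection can be sketched in a few lines: a segment is declared a collective anomaly when letting it have its own mean reduces the fitting cost by more than a penalty. The sketch below is a brute-force O(n^2) illustration for a mean change only; CAPA itself uses a linear-time dynamic-programming recursion, and the function name and penalty value here are illustrative, not from the paper.

```python
import numpy as np

def collective_anomaly_scan(x, beta=10.0, min_len=3):
    """Return (start, end) of the best mean-shift segment if the cost
    saving from giving it its own mean exceeds the penalty beta,
    else None. Brute-force sketch, not the linear-time CAPA recursion."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu0 = np.median(x)                   # robust estimate of the typical mean
    base_cost = np.sum((x - mu0) ** 2)   # squared-error cost with no anomaly
    best, best_saving = None, 0.0
    for s in range(n - min_len + 1):
        for e in range(s + min_len, n + 1):
            seg = x[s:e]
            rest = np.delete(x, np.arange(s, e))
            # cost if the segment is allowed its own (anomalous) mean
            alt_cost = np.sum((seg - seg.mean()) ** 2) + np.sum((rest - mu0) ** 2)
            saving = base_cost - alt_cost - beta
            if saving > best_saving:
                best, best_saving = (s, e), saving
    return best
```

The robust baseline (median) is what keeps a single point anomaly from dragging the typical-mean estimate towards itself, so point and collective anomalies can be told apart.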
gfpop: an R Package for Univariate Graph-Constrained Change-point Detection
In a world with data that change rapidly and abruptly, it is important to
detect those changes accurately. In this paper we describe an R package
implementing an algorithm recently proposed by Hocking et al. [2017] for
penalised maximum likelihood inference of constrained multiple change-point
models. This algorithm can be used to pinpoint the precise locations of abrupt
changes in large data sequences. There are many application domains for such
models, such as medicine, neuroscience or genomics. Often, practitioners have
prior knowledge about the changes they are looking for. For example in genomic
data, biologists sometimes expect peaks: up changes followed by down changes.
Taking advantage of such prior information can substantially improve the
accuracy with which we can detect and estimate changes. Hocking et al. [2017]
described a graph framework to encode many examples of such prior information
and a generic algorithm to infer the optimal model parameters, but implemented
the algorithm for just a single scenario. We present the gfpop package that
implements the algorithm in a generic manner in R/C++. gfpop works for a
user-defined graph that can encode prior information about the types of change
and implements several loss functions (Gauss, Poisson, Binomial, Biweight and
Huber). We then illustrate the use of gfpop on isotonic simulations and several
applications in biology. For a number of graphs, the algorithm runs in a matter
of seconds or minutes for 10^5 data points.
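To make the graph constraint concrete, here is a small sketch, in Python rather than R, of the "peak" graph (up changes must alternate with down changes), solved by dynamic programming over a discrete grid of candidate means. This is only a discretised stand-in: gfpop solves the continuous problem exactly via functional pruning, and the function name, states, and parameter values below are illustrative.

```python
import numpy as np

def updown_fit(x, grid, beta=1.0):
    """Fit a piecewise-constant mean whose changes alternate up/down
    (a 'peak' graph), by DP over a discrete grid of candidate means."""
    x = np.asarray(x, dtype=float)
    g = np.asarray(grid, dtype=float)
    n, m = len(x), len(g)
    # cost[s, j]: best cost so far with current mean g[j];
    # s = 0: "background" state (the next change must be an up change)
    # s = 1: "peak" state (the next change must be a down change)
    cost = np.full((2, m), np.inf)
    cost[0] = (x[0] - g) ** 2            # the series starts in background
    backs = []                           # backpointer table per time step
    for t in range(1, n):
        new = np.full((2, m), np.inf)
        back = np.zeros((2, m, 2), dtype=int)
        for j in range(m):
            # background: stay, or take a down change from a higher peak mean
            new[0, j], back[0, j] = cost[0, j], (0, j)
            for k in range(m):
                if g[k] > g[j] and beta + cost[1, k] < new[0, j]:
                    new[0, j], back[0, j] = beta + cost[1, k], (1, k)
            # peak: stay, or take an up change from a lower background mean
            new[1, j], back[1, j] = cost[1, j], (1, j)
            for k in range(m):
                if g[k] < g[j] and beta + cost[0, k] < new[1, j]:
                    new[1, j], back[1, j] = beta + cost[0, k], (0, k)
        cost = new + (x[t] - g) ** 2     # add the data-fit cost at time t
        backs.append(back)
    # backtrack the fitted mean at every time point
    s, j = np.unravel_index(np.argmin(cost), cost.shape)
    means = [g[j]]
    for back in reversed(backs):
        s, j = back[s, j]
        means.append(g[j])
    return means[::-1]
```

On a flat series with one bump, the fit recovers the peak shape because any other labelling either pays extra data-fit cost or violates the alternation constraint.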
Two-stage data segmentation permitting multiscale change points, heavy tails and dependence
The segmentation of a time series into piecewise stationary segments, a.k.a.
multiple change point analysis, is an important problem both in time series
analysis and signal processing. In the presence of multiscale change points
with both large jumps over short intervals and small changes over long
stationary intervals, multiscale methods achieve good adaptivity in their
localisation but at the same time, require the removal of false positives and
duplicate estimators via a model selection step. In this paper, we propose a
localised application of the Schwarz information criterion which, as a generic
methodology, is applicable with any multiscale candidate generating procedure
fulfilling mild assumptions. We establish the theoretical consistency of the
proposed localised pruning method in estimating the number and locations of
multiple change points under general assumptions permitting heavy tails and
dependence. Further, we show that combined with a MOSUM-based candidate
generating procedure, it attains minimax optimality in terms of detection lower
bound and localisation for i.i.d. sub-Gaussian errors. A careful comparison
with the existing methods by means of (a) theoretical properties such as
generality, optimality and algorithmic complexity, (b) performance on simulated
datasets and run time, as well as (c) performance on real data applications,
confirms the overall competitiveness of the proposed methodology.
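The MOSUM candidate generator at the heart of such a two-stage procedure is simple to sketch: at each position, compare the means of two adjacent windows of bandwidth G, and take peaks of the resulting scaled statistic as change-point candidates. In this sketch the noise standard deviation is assumed known for simplicity, and the function name and bandwidth are illustrative; the paper's procedure estimates the noise level and then vets candidates with the localised pruning step.

```python
import numpy as np

def mosum_stat(x, G, sigma=1.0):
    """MOSUM statistic with bandwidth G: at each admissible position k,
    the scaled absolute difference between the sums of x[k:k+G] and
    x[k-G:k]. Peaks above a threshold become change-point candidates.
    sigma is the noise standard deviation, assumed known here."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = np.concatenate([[0.0], np.cumsum(x)])   # prefix sums
    k = np.arange(G, n - G + 1)                 # admissible positions
    right = c[k + G] - c[k]                     # sum over x[k:k+G]
    left = c[k] - c[k - G]                      # sum over x[k-G:k]
    return k, np.abs(right - left) / (np.sqrt(2 * G) * sigma)
```

For a single jump of size 1 at position 50, the statistic peaks exactly at the change, with height G / sqrt(2G) = sqrt(G/2) in noise units.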
Real-Time Detection of Demand Manipulation Attacks on a Power Grid
The increased use of IoT devices across the globe has posed a threat to the power grid. An attacker with access to multiple IoT devices within the same geographical location can disrupt the power grid by controlling a botnet of high-wattage IoT devices. Depending on the time and circumstances of the attack, the adversary needs access to a sufficient number of IoT devices to switch them all on or off synchronously, resulting in an imbalance between supply and demand. When the frequency of the power generators drops below a threshold value, the generators can trip and potentially fail. Such attacks can cause grid-frequency imbalance, line failures and cascading outages, disrupt a black start, or increase operating costs. The challenge lies in early detection of abnormal demand peaks across a large section of the power grid from the power operator’s side, as a generator failure can occur within seconds, before any action can be taken.
Anomaly detection comes in handy for alerting the power operator to anomalous behavior while such an attack is taking place. However, anomalies are difficult to detect, especially when such attacks are carried out stealthily over prolonged time periods. With this motivation, we compare different anomaly detection systems in terms of how well they detect these anomalies collectively. We generate attack data using real-world power consumption data across multiple apartments to assess the performance of various prediction-based detection techniques as well as commercial detection applications, and we examine the cases in which the attacks were not detected. Using static thresholds in the detection process does not reliably detect attacks performed at different times of the year, and it also lets the attacker probe the system to craft an attack that stays hidden. To overcome the limitations of static thresholds, we propose a novel dynamic thresholding mechanism, which improves attack detection, reaching up to a 100% detection rate when used with prediction-based anomaly score techniques.
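The dynamic-thresholding idea can be sketched as follows: instead of one fixed cutoff, flag a point when its anomaly score exceeds a mean-plus-k-standard-deviations threshold computed over a trailing window, so the cutoff adapts to the season and to recent demand. The window length and multiplier below are illustrative tuning knobs, not the values or the exact mechanism used in the paper.

```python
import numpy as np

def dynamic_threshold_flags(scores, window=24, k=3.0):
    """Flag points whose anomaly score exceeds a moving threshold
    mean + k * std computed over the trailing window, so the cutoff
    tracks recent score behaviour instead of staying static."""
    scores = np.asarray(scores, dtype=float)
    flags = np.zeros(len(scores), dtype=bool)
    for t in range(window, len(scores)):
        hist = scores[t - window:t]           # recent score history
        flags[t] = scores[t] > hist.mean() + k * hist.std()
    return flags
```

A seasonal rise in normal scores lifts the threshold with it, which is exactly what a static cutoff chosen for one time of year cannot do.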