91,860 research outputs found
Change detection in categorical evolving data streams
Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real world applications, data streams have categorical features, and changes induced in the data distribution of these categorical features have not been considered extensively so far. Previous work on change detection focused on detecting changes in the accuracy of the learners, but without considering changes in the data distribution.
To cope with these issues, we propose a new unsupervised change detection method, called CDCStream (Change Detection in Categorical Data Streams), well suited for categorical data streams. The proposed method is able to detect changes in a batch incremental scenario. It is based on the two following characteristics: (i) a summarization strategy is proposed to compress the actual batch by extracting a descriptive summary and (ii) a new segmentation algorithm is proposed to highlight changes and issue warnings for a data stream. To evaluate our proposal we employ it in a learning task over real world data and we compare its results with state of the art methods. We also report qualitative evaluation in order to show the behavior of CDCStream
Incremental Principal Component Analysis Based Outliers Detection Methods for Spatiotemporal Data Streams
In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data due to various reasons such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in such type of spatiotemporal data streams. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams
Temporally adaptive monitoring procedures with applications in enterprise cyber-security
Due to the perpetual threat of cyber-attacks, enterprises must employ and develop new methods of detection as attack vectors evolve and advance. Enterprise computer networks produce a large volume and variety of data including univariate data streams, time series and network graph streams. Motivated by cyber-security, this thesis develops adaptive monitoring tools for univariate and network graph data streams, however, they are not limited to this domain.
In all domains, real data streams present several challenges for monitoring including trend, periodicity and change points. Streams often also have high volume and frequency. To deal with the non-stationarity in the data, the methods applied must be adaptive. Adaptability in the proposed procedures throughout the thesis is introduced using forgetting factors, weighting the data accordingly to recency. Secondly, methods applied must be computationally fast with a small or fixed computation burden and fixed storage requirements for timely processing. Throughout this thesis, sequential or sliding window approaches are employed to achieve this.
The first part of the thesis is centred around univariate monitoring procedures. A sequential adaptive parameter estimator is proposed using a Bayesian framework. This procedure is then extended for multiple change point detection, where, unlike existing change point procedures, the proposed method is capable of detecting abrupt changes in the presence of trend. We additionally present a time series model which combines short-term and long-term behaviours of a series for improved anomaly detection. Unlike existing methods which primarily focus on point anomalies detection (extreme outliers), our method is capable of also detecting contextual anomalies, when the data deviates from persistent patterns of the series such as seasonality.
Finally, a novel multi-type relational clustering methodology is proposed. As multiple relations exist between the different entities within a network (computers, users and ports), multiple network graphs can be generated. We propose simultaneously clustering over all graphs to produce a single clustering for each entity using Non-Negative Matrix Tri-Factorisation. Through simplifications, the proposed procedure is fast and scalable for large network graphs. Additionally, this methodology is extended for graph streams.
This thesis provides an assortment of tools for enterprise network monitoring with a focus on adaptability and scalability making them suitable for intrusion detection and situational awareness.Open Acces
Detection and localization of change-points in high-dimensional network traffic data
We propose a novel and efficient method, that we shall call TopRank in the
following paper, for detecting change-points in high-dimensional data. This
issue is of growing concern to the network security community since network
anomalies such as Denial of Service (DoS) attacks lead to changes in Internet
traffic. Our method consists of a data reduction stage based on record
filtering, followed by a nonparametric change-point detection test based on
-statistics. Using this approach, we can address massive data streams and
perform anomaly detection and localization on the fly. We show how it applies
to some real Internet traffic provided by France-T\'el\'ecom (a French Internet
service provider) in the framework of the ANR-RNRT OSCAR project. This approach
is very attractive since it benefits from a low computational load and is able
to detect and localize several types of network anomalies. We also assess the
performance of the TopRank algorithm using synthetic data and compare it with
alternative approaches based on random aggregation.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS232 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms
for data stream classification, with applications ranging from predicting ozone
level peaks, learning stock market indicators, to detecting computer security
violations. In addition, a number of methods have been developed to detect
concept drifts in these streams. Consider a scenario where we have a number of
classifiers with diverse learning styles and different drift detectors.
Intuitively, the current 'best' (classifier, detector) pair is application
dependent and may change as a result of the stream evolution. Our research
builds on this observation. We introduce the \mbox{Tornado} framework that
implements a reservoir of diverse classifiers, together with a variety of drift
detection algorithms. In our framework, all (classifier, detector) pairs
proceed, in parallel, to construct models against the evolving data streams. At
any point in time, we select the pair which currently yields the best
performance. We further incorporate two novel stacking-based drift detection
methods, namely the \mbox{FHDDMS} and \mbox{FHDDMS}_{add} approaches. The
experimental evaluation confirms that the current 'best' (classifier, detector)
pair is not only heavily dependent on the characteristics of the stream, but
also that this selection evolves as the stream flows. Further, our
\mbox{FHDDMS} variants detect concept drifts accurately in a timely fashion
while outperforming the state-of-the-art.Comment: 42 pages, and 14 figure
Adaptive Bernstein Change Detector for High-Dimensional Data Streams
Change detection is of fundamental importance when analyzing data streams.
Detecting changes both quickly and accurately enables monitoring and prediction
systems to react, e.g., by issuing an alarm or by updating a learning
algorithm. However, detecting changes is challenging when observations are
high-dimensional. In high-dimensional data, change detectors should not only be
able to identify when changes happen, but also in which subspace they occur.
Ideally, one should also quantify how severe they are. Our approach, ABCD, has
these properties. ABCD learns an encoder-decoder model and monitors its
accuracy over a window of adaptive size. ABCD derives a change score based on
Bernstein's inequality to detect deviations in terms of accuracy, which
indicate changes. Our experiments demonstrate that ABCD outperforms its best
competitor by at least 8% and up to 23% in F1-score on average. It can also
accurately estimate changes' subspace, together with a severity measure that
correlates with the ground truth
- …