Raising the ClaSS of Streaming Time Series Segmentation
Ubiquitous sensors today emit high frequency streams of numerical
measurements that reflect properties of human, animal, industrial, commercial,
and natural processes. Shifts in such processes, e.g. caused by external events
or internal state changes, manifest as changes in the recorded signals. The
task of streaming time series segmentation (STSS) is to partition the stream
into consecutive variable-sized segments that correspond to states of the
observed processes or entities. The partition operation itself must be fast
enough to keep pace with the input frequency of the signals. We
introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS.
ClaSS assesses the homogeneity of potential partitions using self-supervised
time series classification and applies statistical tests to detect significant
change points (CPs). In our experimental evaluation using two large benchmarks
and six real-world data archives, we found ClaSS to be significantly more
precise than eight state-of-the-art competitors. Its space and time complexity
is independent of segment sizes and linear only in the sliding window size. We
also provide ClaSS as a window operator with an average throughput of 538 data
points per second for the Apache Flink streaming engine.
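The idea of scoring a candidate split by self-supervised classification can be sketched as follows. This is a simplified illustration, not the ClaSS implementation: the function names, the 1-nearest-neighbor classifier, the balanced-accuracy score, and the brute-force distance computation are all assumptions made for the example, and the statistical significance test mentioned in the abstract is omitted.

```python
import numpy as np

def nearest_neighbors(window, width):
    """Index of each subsequence's nearest non-overlapping neighbor (hypothetical helper)."""
    window = np.asarray(window, dtype=float)
    subs = np.lib.stride_tricks.sliding_window_view(window, width)
    # Pairwise squared Euclidean distances between all subsequences.
    diffs = subs[:, None, :] - subs[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Exclude trivial matches: a subsequence and anything overlapping it.
    idx = np.arange(len(subs))
    dists[np.abs(idx[:, None] - idx[None, :]) < width] = np.inf
    return dists.argmin(axis=1)

def best_split(window, width):
    """Return (subsequence index, score) of the most homogeneous two-way split."""
    nn = nearest_neighbors(window, width)
    n = len(nn)
    best_pos, best_score = None, -1.0
    for split in range(width, n - width):
        # Self-supervised labels: False = left of the split, True = right.
        labels = np.arange(n) >= split
        # 1-NN prediction: each subsequence takes its neighbor's label.
        predicted = labels[nn]
        acc_left = np.mean(~predicted[:split])
        acc_right = np.mean(predicted[split:])
        score = 0.5 * (acc_left + acc_right)  # balanced accuracy
        if score > best_score:
            best_pos, best_score = split, score
    return best_pos, best_score

# Usage: a synthetic window whose level shifts at time index 100.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 0.1, 100), rng.normal(5, 0.1, 100)])
position, score = best_split(signal, width=10)
print(position, score)
```

A split scores highly when subsequences on each side mostly find their nearest neighbor on the same side, which is the sense in which a partition is "homogeneous" here. The brute-force distance matrix is quadratic in the window size and is used only to keep the sketch short.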
A Better Alternative to Piecewise Linear Time Series Segmentation
Time series are difficult to monitor, summarize and predict. Segmentation
organizes time series into few intervals having uniform characteristics
(flatness, linearity, modality, monotonicity and so on). For scalability, we
require fast linear time algorithms. The popular piecewise linear model can
determine where the data goes up or down and at what rate. Unfortunately, when
the data does not follow a linear model, the computation of the local slope
creates overfitting. We propose an adaptive time series model where the
polynomial degree of each interval varies (constant, linear and so on). Given a
number of regressors, the cost of each interval is its polynomial degree plus one:
constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so
on. Our goal is to minimize the Euclidean (l_2) error for a given model
complexity. Experimentally, we investigate the model where intervals can be
either constant or linear. Over synthetic random walks, historical stock market
prices, and electrocardiograms, the adaptive model provides a more accurate
segmentation than the piecewise linear model without increasing the
cross-validation error or the running time, while providing a richer vocabulary
to applications. Implementation issues, such as numerical stability and
real-world performance, are discussed.
Comment: to appear in SIAM Data Mining 200
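The cost model described in this abstract (constant intervals cost 1 regressor, linear intervals cost 2, minimize the squared l_2 error under a fixed budget) can be illustrated with a small exact dynamic program. This is a sketch under stated assumptions, not the authors' algorithm: the abstract asks for fast linear-time methods, whereas this brute-force version is polynomial in the series length, and the names `fit_error` and `adaptive_segmentation` are hypothetical.

```python
import numpy as np

def fit_error(y, degree):
    """Squared l_2 error of the least-squares polynomial fit of the given degree."""
    x = np.arange(len(y))
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

def adaptive_segmentation(y, budget):
    """Minimize squared error using at most `budget` regressors.

    Returns (error, segments); each segment is (start, end, degree)
    with `end` exclusive and degree 0 (constant) or 1 (linear).
    """
    if budget < 1:
        raise ValueError("budget must allow at least one constant interval")
    y = np.asarray(y, dtype=float)
    n = len(y)
    inf = float("inf")
    # best[b][j]: minimal error covering y[:j] with exactly b regressors.
    best = [[inf] * (n + 1) for _ in range(budget + 1)]
    back = [[None] * (n + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            for degree in (0, 1):
                cost = degree + 1  # regressors used by this interval
                if j - i < cost:   # need at least degree + 1 points
                    continue
                err = fit_error(y[i:j], degree)
                for b in range(cost, budget + 1):
                    candidate = best[b - cost][i] + err
                    if candidate < best[b][j]:
                        best[b][j] = candidate
                        back[b][j] = (i, degree)
    # Choose the regressor count (within budget) with the lowest error.
    b = min(range(budget + 1), key=lambda k: best[k][n])
    error = best[b][n]
    segments = []
    j = n
    while j > 0:
        i, degree = back[b][j]
        segments.append((i, j, degree))
        b -= degree + 1
        j = i
    return error, segments[::-1]

# Usage: a flat stretch followed by a ramp, with a budget of 3 regressors.
series = np.concatenate([np.full(10, 5.0), np.arange(10.0)])
error, segments = adaptive_segmentation(series, budget=3)
print(error, segments)
```

With a budget of 3, a purely piecewise linear model could afford only one linear interval plus one constant, or three constants; letting the degree vary per interval is what allows the budget to be spent where the data needs it.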