A Better Alternative to Piecewise Linear Time Series Segmentation
Time series are difficult to monitor, summarize and predict. Segmentation
organizes time series into few intervals having uniform characteristics
(flatness, linearity, modality, monotonicity and so on). For scalability, we
require fast linear time algorithms. The popular piecewise linear model can
determine where the data goes up or down and at what rate. Unfortunately, when
the data does not follow a linear model, the computation of the local slope
creates overfitting. We propose an adaptive time series model where the
polynomial degree of each interval varies (constant, linear, and so on). Given a
number of regressors, the cost of each interval is its polynomial degree:
constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so
on. Our goal is to minimize the Euclidean (l_2) error for a given model
complexity. Experimentally, we investigate the model where intervals can be
either constant or linear. Over synthetic random walks, historical stock market
prices, and electrocardiograms, the adaptive model provides a more accurate
segmentation than the piecewise linear model without increasing the
cross-validation error or the running time, while providing a richer vocabulary
to applications. Implementation issues, such as numerical stability and
real-world performance, are discussed.
Comment: to appear in SIAM Data Mining 200
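The adaptive model above can be sketched as a small dynamic program: each interval pays its number of regressors (1 for a constant fit, 2 for a linear fit), and we minimize the total squared error under a regressor budget. The following is a minimal brute-force illustration of that idea, not the paper's implementation; the function names and the unoptimized interval scan are mine.

```python
import numpy as np

def interval_cost(y, i, j, degree):
    """Sum of squared errors when fitting y[i:j] with a polynomial
    of the given degree (0 = constant, 1 = linear)."""
    x = np.arange(i, j)
    coeffs = np.polyfit(x, y[i:j], degree)
    resid = y[i:j] - np.polyval(coeffs, x)
    return float(np.dot(resid, resid))

def adaptive_segmentation(y, budget):
    """DP over (prefix length, regressors used): a constant interval
    costs 1 regressor, a linear interval costs 2; minimize total SSE.
    Returns (best SSE, list of (start, end, degree))."""
    n = len(y)
    INF = float("inf")
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    back = [[None] * (budget + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (budget + 1)  # empty prefix: zero error, any leftover budget
    for j in range(1, n + 1):
        for b in range(budget + 1):
            for i in range(j):
                for degree, price in ((0, 1), (1, 2)):
                    if price > b or j - i <= degree:
                        continue  # not enough budget, or too few points
                    c = best[i][b - price] + interval_cost(y, i, j, degree)
                    if c < best[j][b]:
                        best[j][b] = c
                        back[j][b] = (i, b - price, degree)
    # recover the segmentation by walking the back-pointers
    segs, j, b = [], n, budget
    while j > 0:
        i, b2, degree = back[j][b]
        segs.append((i, j, degree))
        j, b = i, b2
    return best[n][budget], segs[::-1]
```

On a series that is flat then ramps linearly, a budget of 3 regressors suffices for a near-zero error (one constant interval plus one linear interval).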
Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation
We propose in this paper an exploratory analysis algorithm for functional
data. The method partitions a set of functions into clusters and represents
each cluster by a simple prototype (e.g., piecewise constant). The total number
of segments in the prototypes is chosen by the user and optimally
distributed among the clusters via two dynamic programming algorithms. The
practical relevance of the method is shown on two real-world datasets.
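Optimally distributing a user-chosen segment total among clusters is a classic resource-allocation dynamic program. Here is a hedged sketch of that subproblem (the cost tables and function name are mine; the paper's actual algorithms also compute the per-cluster costs themselves):

```python
def distribute_segments(cost, total):
    """cost[c][k-1] = approximation error of cluster c when its prototype
    uses k segments (k = 1..len(cost[c])). Distribute `total` segments
    across clusters to minimize the summed error."""
    INF = float("inf")
    n = len(cost)
    # best[c][t] = min error over the first c clusters using exactly t segments
    best = [[INF] * (total + 1) for _ in range(n + 1)]
    choice = [[0] * (total + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for c in range(1, n + 1):
        for t in range(total + 1):
            for k in range(1, min(t, len(cost[c - 1])) + 1):
                v = best[c - 1][t - k] + cost[c - 1][k - 1]
                if v < best[c][t]:
                    best[c][t] = v
                    choice[c][t] = k
    # backtrack to recover the per-cluster allocation
    alloc, t = [], total
    for c in range(n, 0, -1):
        k = choice[c][t]
        alloc.append(k)
        t -= k
    return best[n][total], alloc[::-1]
```

With two clusters and four segments to spend, the DP picks the split whose summed error is smallest.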
An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation
Monotonicity is a simple yet significant qualitative characteristic. We
consider the problem of segmenting a sequence in up to K segments. We want
segments to be as monotonic as possible and to alternate signs. We propose a
quality metric for this problem using the l_inf norm, and we present an optimal
linear time algorithm based on novel formalism. Moreover, given a
precomputation in time O(n log n) consisting of a labeling of all extrema, we
compute any optimal segmentation in constant time. We compare experimentally
its performance to two piecewise linear segmentation heuristics (top-down and
bottom-up). We show that our algorithm is faster and more accurate.
Applications include pattern recognition and qualitative modeling.
Comment: This is the extended version of our ICDM'05 paper (arXiv:cs/0702142)
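The l_inf quality of a candidate monotonic segment can be evaluated in linear time, assuming the standard fact that the l_inf distance from a sequence to the nearest monotone (say, nondecreasing) sequence is half its largest order violation, max over i < j of y[i] - y[j]. A small sketch (my own helper, not the paper's algorithm, which additionally searches for the optimal segmentation):

```python
def linf_monotone_error(y, increasing=True):
    """l_inf distance from y to the nearest monotone sequence:
    half the largest violation max_{i<j}(y[i] - y[j]) for the
    nondecreasing case (0 when y is already monotone)."""
    if not increasing:
        y = [-v for v in y]  # reduce the decreasing case to the increasing one
    worst, running_max = 0.0, float("-inf")
    for v in y:
        running_max = max(running_max, v)
        worst = max(worst, running_max - v)  # largest drop below a prior value
    return worst / 2.0
```

For example, [1, 3, 2, 4] has its worst violation between 3 and 2, giving an l_inf error of 0.5, while any already-monotone sequence scores 0.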
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
This paper addresses the problem of detecting and characterizing local
variability in time series and other forms of sequential data. The goal is to
identify and characterize statistically significant variations, at the same
time suppressing the inevitable corrupting observational errors. We present a
simple nonparametric modeling technique and an algorithm implementing it - an
improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds
the optimal segmentation of the data in the observation interval. The structure
of the algorithm allows it to be used in either a real-time trigger mode, or a
retrospective mode. Maximum likelihood or marginal posterior functions to
measure model fitness are presented for events, binned counts, and measurements
at arbitrary times with known error distributions. Problems addressed include
those connected with data gaps, variable exposure, extension to piecewise
linear and piecewise exponential representations, multi-variate time series
data, analysis of variance, data on the circle, other data modes, and dispersed
data. Simulations provide evidence that the detection efficiency for weak
signals is close to a theoretical asymptotic limit derived by (Arias-Castro,
Donoho and Huo 2003). In the spirit of Reproducible Research (Donoho et al.
2008) all of the code and data necessary to reproduce all of the figures in
this paper are included as auxiliary material.
Comment: Added some missing script files and updated other ancillary data
(code and data files). To be submitted to the Astrophysical Journal
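The core of the Bayesian Blocks approach is an O(n^2) dynamic program over possible positions of the last change point. The sketch below is a simplified stand-in, not Scargle's code: it handles only the measurements-with-known-Gaussian-errors mode, uses the maximum-likelihood block fitness (sum of w*x)^2 / (sum of w) with w = 1/sigma^2 up to constants, and replaces the calibrated prior on block count with a fixed per-block penalty of my choosing.

```python
import numpy as np

def bayesian_blocks_measurements(x, sigma, penalty=4.0):
    """O(n^2) dynamic program for an optimal piecewise-constant
    segmentation of measurements x with known Gaussian errors sigma.
    Returns the start indices (change points) of the blocks."""
    x = np.asarray(x, float)
    w = 1.0 / np.asarray(sigma, float) ** 2
    n = len(x)
    best = np.empty(n)        # best[k]: optimum over x[:k+1]
    last = np.empty(n, int)   # start index of the final block
    for k in range(n):
        # suffix sums: sw[r] = sum of w[r..k], swx[r] = sum of (w*x)[r..k]
        sw = np.cumsum(w[: k + 1][::-1])[::-1]
        swx = np.cumsum((w[: k + 1] * x[: k + 1])[::-1])[::-1]
        fit = swx ** 2 / sw - penalty  # ML fitness of block x[r..k], minus prior
        total = fit.copy()
        total[1:] += best[:k]          # add the optimum of the prefix before r
        r = int(np.argmax(total))
        best[k], last[k] = total[r], r
    # backtrack the change points
    edges, k = [], n
    while k > 0:
        r = int(last[k - 1])
        edges.append(r)
        k = r
    return edges[::-1]
```

On a step signal (ten zeros followed by ten fives, unit errors) the program recovers the single change point at index 10.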