A Better Alternative to Piecewise Linear Time Series Segmentation
Time series are difficult to monitor, summarize and predict. Segmentation
organizes time series into few intervals having uniform characteristics
(flatness, linearity, modality, monotonicity and so on). For scalability, we
require fast linear time algorithms. The popular piecewise linear model can
determine where the data goes up or down and at what rate. Unfortunately, when
the data does not follow a linear model, the computation of the local slope
creates overfitting. We propose an adaptive time series model where the
polynomial degree of each interval varies (constant, linear, and so on). Given a
number of regressors, the cost of each interval is its polynomial degree:
constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so
on. Our goal is to minimize the Euclidean (l_2) error for a given model
complexity. Experimentally, we investigate the model where intervals can be
either constant or linear. Over synthetic random walks, historical stock market
prices, and electrocardiograms, the adaptive model provides a more accurate
segmentation than the piecewise linear model without increasing the
cross-validation error or the running time, while providing a richer vocabulary
to applications. Implementation issues, such as numerical stability and
real-world performance, are discussed.
Comment: to appear in SIAM Data Mining 200
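The adaptive model above can be sketched as a small dynamic program: each interval pays its number of regressors (1 for a constant fit, 2 for a linear fit), and we minimize the total squared error under a regressor budget. The following is a minimal brute-force illustration of that idea, not the paper's implementation; the function names and the unoptimized interval scan are mine.

```python
import numpy as np

def interval_cost(y, i, j, degree):
    """Sum of squared errors when fitting y[i:j] with a polynomial
    of the given degree (0 = constant, 1 = linear)."""
    x = np.arange(i, j)
    coeffs = np.polyfit(x, y[i:j], degree)
    resid = y[i:j] - np.polyval(coeffs, x)
    return float(np.dot(resid, resid))

def adaptive_segmentation(y, budget):
    """DP over (prefix length, regressors used): a constant interval
    costs 1 regressor, a linear interval costs 2; minimize total SSE.
    Returns (best SSE, list of (start, end, degree))."""
    n = len(y)
    INF = float("inf")
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    back = [[None] * (budget + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (budget + 1)  # empty prefix: zero error, any leftover budget
    for j in range(1, n + 1):
        for b in range(budget + 1):
            for i in range(j):
                for degree, price in ((0, 1), (1, 2)):
                    if price > b or j - i <= degree:
                        continue  # not enough budget, or too few points
                    c = best[i][b - price] + interval_cost(y, i, j, degree)
                    if c < best[j][b]:
                        best[j][b] = c
                        back[j][b] = (i, b - price, degree)
    # recover the segmentation by walking the back-pointers
    segs, j, b = [], n, budget
    while j > 0:
        i, b2, degree = back[j][b]
        segs.append((i, j, degree))
        j, b = i, b2
    return best[n][budget], segs[::-1]
```

On a series that is flat then ramps linearly, a budget of 3 regressors suffices for a near-zero error (one constant interval plus one linear interval).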
Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation
We propose in this paper an exploratory analysis algorithm for functional
data. The method partitions a set of functions into clusters and represents
each cluster by a simple prototype (e.g., piecewise constant). The total number
of segments in the prototypes is chosen by the user and optimally
distributed among the clusters via two dynamic programming algorithms. The
practical relevance of the method is shown on two real-world datasets.
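Optimally distributing a user-chosen segment total among clusters is a classic resource-allocation dynamic program. Here is a hedged sketch of that subproblem (the cost tables and function name are mine; the paper's actual algorithms also compute the per-cluster costs themselves):

```python
def distribute_segments(cost, total):
    """cost[c][k-1] = approximation error of cluster c when its prototype
    uses k segments (k = 1..len(cost[c])). Distribute `total` segments
    across clusters to minimize the summed error."""
    INF = float("inf")
    n = len(cost)
    # best[c][t] = min error over the first c clusters using exactly t segments
    best = [[INF] * (total + 1) for _ in range(n + 1)]
    choice = [[0] * (total + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for c in range(1, n + 1):
        for t in range(total + 1):
            for k in range(1, min(t, len(cost[c - 1])) + 1):
                v = best[c - 1][t - k] + cost[c - 1][k - 1]
                if v < best[c][t]:
                    best[c][t] = v
                    choice[c][t] = k
    # backtrack to recover the per-cluster allocation
    alloc, t = [], total
    for c in range(n, 0, -1):
        k = choice[c][t]
        alloc.append(k)
        t -= k
    return best[n][total], alloc[::-1]
```

With two clusters and four segments to spend, the DP picks the split whose summed error is smallest.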
An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation
Monotonicity is a simple yet significant qualitative characteristic. We
consider the problem of segmenting a sequence in up to K segments. We want
segments to be as monotonic as possible and to alternate signs. We propose a
quality metric for this problem using the l_inf norm, and we present an optimal
linear time algorithm based on novel formalism. Moreover, given a
precomputation in time O(n log n) consisting of a labeling of all extrema, we
compute any optimal segmentation in constant time. We compare experimentally
its performance to two piecewise linear segmentation heuristics (top-down and
bottom-up). We show that our algorithm is faster and more accurate.
Applications include pattern recognition and qualitative modeling.
Comment: This is the extended version of our ICDM'05 paper (arXiv:cs/0702142)
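The l_inf quality of a candidate monotonic segment can be evaluated in linear time, assuming the standard fact that the l_inf distance from a sequence to the nearest monotone (say, nondecreasing) sequence is half its largest order violation, max over i < j of y[i] - y[j]. A small sketch (my own helper, not the paper's algorithm, which additionally searches for the optimal segmentation):

```python
def linf_monotone_error(y, increasing=True):
    """l_inf distance from y to the nearest monotone sequence:
    half the largest violation max_{i<j}(y[i] - y[j]) for the
    nondecreasing case (0 when y is already monotone)."""
    if not increasing:
        y = [-v for v in y]  # reduce the decreasing case to the increasing one
    worst, running_max = 0.0, float("-inf")
    for v in y:
        running_max = max(running_max, v)
        worst = max(worst, running_max - v)  # largest drop below a prior value
    return worst / 2.0
```

For example, [1, 3, 2, 4] has its worst violation between 3 and 2, giving an l_inf error of 0.5, while any already-monotone sequence scores 0.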
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
This paper addresses the problem of detecting and characterizing local
variability in time series and other forms of sequential data. The goal is to
identify and characterize statistically significant variations, at the same
time suppressing the inevitable corrupting observational errors. We present a
simple nonparametric modeling technique and an algorithm implementing it - an
improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds
the optimal segmentation of the data in the observation interval. The structure
of the algorithm allows it to be used in either a real-time trigger mode, or a
retrospective mode. Maximum likelihood or marginal posterior functions to
measure model fitness are presented for events, binned counts, and measurements
at arbitrary times with known error distributions. Problems addressed include
those connected with data gaps, variable exposure, extension to piecewise
linear and piecewise exponential representations, multi-variate time series
data, analysis of variance, data on the circle, other data modes, and dispersed
data. Simulations provide evidence that the detection efficiency for weak
signals is close to a theoretical asymptotic limit derived by (Arias-Castro,
Donoho and Huo 2003). In the spirit of Reproducible Research (Donoho et al.
2008) all of the code and data necessary to reproduce all of the figures in
this paper are included as auxiliary material.
Comment: Added some missing script files and updated other ancillary data
(code and data files). To be submitted to the Astrophysical Journal
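The core of the Bayesian Blocks approach is an O(n^2) dynamic program over possible positions of the last change point. The sketch below is a simplified stand-in, not Scargle's code: it handles only the measurements-with-known-Gaussian-errors mode, uses the maximum-likelihood block fitness (sum of w*x)^2 / (sum of w) with w = 1/sigma^2 up to constants, and replaces the calibrated prior on block count with a fixed per-block penalty of my choosing.

```python
import numpy as np

def bayesian_blocks_measurements(x, sigma, penalty=4.0):
    """O(n^2) dynamic program for an optimal piecewise-constant
    segmentation of measurements x with known Gaussian errors sigma.
    Returns the start indices (change points) of the blocks."""
    x = np.asarray(x, float)
    w = 1.0 / np.asarray(sigma, float) ** 2
    n = len(x)
    best = np.empty(n)        # best[k]: optimum over x[:k+1]
    last = np.empty(n, int)   # start index of the final block
    for k in range(n):
        # suffix sums: sw[r] = sum of w[r..k], swx[r] = sum of (w*x)[r..k]
        sw = np.cumsum(w[: k + 1][::-1])[::-1]
        swx = np.cumsum((w[: k + 1] * x[: k + 1])[::-1])[::-1]
        fit = swx ** 2 / sw - penalty  # ML fitness of block x[r..k], minus prior
        total = fit.copy()
        total[1:] += best[:k]          # add the optimum of the prefix before r
        r = int(np.argmax(total))
        best[k], last[k] = total[r], r
    # backtrack the change points
    edges, k = [], n
    while k > 0:
        r = int(last[k - 1])
        edges.append(r)
        k = r
    return edges[::-1]
```

On a step signal (ten zeros followed by ten fives, unit errors) the program recovers the single change point at index 10.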