Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection—rejoinder
Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients: one is “Wild Binary Segmentation 2” (WBS2), a recursive algorithm for producing what we call a ‘complete’ solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing 0, …, T − 1 change-points, where T is the data length. The other ingredient is a new model selection procedure, referred to as “Steepest Drop to Low Levels” (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter
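To convey the flavour of the two ingredients, the toy R sketch below builds a partial solution path by recursive CUSUM maximisation over randomly drawn subintervals and then cuts the sorted contrast sequence at its steepest drop on the log scale. It is an illustration only, not the authors' implementation (which is in the R package breakfast); the interval count and the omission of SDLL's secondary threshold check are simplifications.

    cusum.max <- function(x, s, e) {
      # largest |CUSUM| mean-change contrast of x[s..e] over internal splits
      n <- e - s + 1
      cs <- cumsum(x[s:e])
      b <- 1:(n - 1)
      stat <- abs((cs[b] - b / n * cs[n]) / sqrt(b * (n - b) / n))
      list(pos = s + which.max(stat) - 1, val = max(stat))
    }

    wbs2.path <- function(x, s = 1, e = length(x), M = 50) {
      if (e - s < 1) return(NULL)
      # best CUSUM split over M random subintervals of [s, e] plus [s, e] itself
      cand <- cbind(replicate(M, sort(sample(s:e, 2))), c(s, e))
      best <- list(val = -Inf)
      for (j in seq_len(ncol(cand))) {
        cm <- cusum.max(x, cand[1, j], cand[2, j])
        if (cm$val > best$val) best <- cm
      }
      rbind(c(best$pos, best$val),                 # record the winning split,
            wbs2.path(x, s, best$pos, M),          # then recurse to its left
            wbs2.path(x, best$pos + 1, e, M))      # and to its right
    }

    sdll.pick <- function(path) {
      # sort contrasts decreasingly and cut at the steepest drop on the log
      # scale; SDLL's check that the drop is "to low levels" is omitted here
      if (nrow(path) < 2) return(path[, 1])
      o <- order(path[, 2], decreasing = TRUE)
      k <- which.max(-diff(log(path[o, 2])))
      sort(path[o[1:k], 1])
    }

    set.seed(1)
    x <- rnorm(300) + rep(c(0, 3, 0), each = 100)  # changes at 100 and 200
    sdll.pick(wbs2.path(x))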
Robust Narrowest Significance Pursuit: Inference for multiple change-points in the median
We propose Robust Narrowest Significance Pursuit (RNSP), a methodology for detecting localized regions in data sequences which each must contain a change-point in the median, at a prescribed global significance level. RNSP works by fitting the postulated constant model over many regions of the data using a new sign-multiresolution sup-norm-type loss, and greedily identifying the shortest intervals on which the constancy is significantly violated. By working with the signs of the data around fitted model candidates, RNSP fulfils its coverage promises under minimal assumptions, requiring only sign-symmetry and serial independence of the signs of the true residuals. In particular, it permits their heterogeneity and arbitrarily heavy tails. The intervals of significance returned by RNSP have a finite-sample character, are unconditional in nature and do not rely on any assumptions on the true signal. Code implementing RNSP is available at https://github.com/pfryz/nsp
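The toy R sketch below illustrates the sign-based check on a single interval: the data are reduced to the signs of their deviations from the fitted median, and a multiresolution sup-norm of those signs is compared with a Monte Carlo critical value valid for any symmetric, serially independent signs. The greedy search for the shortest significant intervals, and RNSP's exact finite-sample calibration, are omitted.

    mres.sup <- function(sg) {
      # multiresolution sup-norm of a +/-1 sequence: the maximum over all
      # subintervals of |partial sum| / sqrt(interval length)
      n <- length(sg); cs <- c(0, cumsum(sg)); out <- 0
      for (i in 1:n) for (j in i:n)
        out <- max(out, abs(cs[j + 1] - cs[i]) / sqrt(j - i + 1))
      out
    }

    interval.stat <- function(x, s, e) {
      # signs of the data around the median fitted on x[s..e]; exact zeros
      # carry no sign information and are dropped
      sg <- sign(x[s:e] - median(x[s:e]))
      mres.sup(sg[sg != 0])
    }

    # Monte Carlo critical value under the null: i.i.d. symmetric +/-1 signs
    set.seed(1)
    crit <- quantile(replicate(200, mres.sup(sample(c(-1, 1), 100, TRUE))), 0.9)
    x <- c(rcauchy(50), rcauchy(50) + 5)    # heavy tails do not affect the signs
    interval.stat(x, 1, 100) > crit         # TRUE: the median cannot be constant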
Tail-greedy bottom-up data decompositions and fast multiple change-point detection
This article proposes a ‘tail-greedy’, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen Unbalanced Haar wavelet basis. The ‘tail-greediness’ of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN
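A minimal R sketch of the agglomerative idea follows. It merges one pair of adjacent segments at a time, whereas the tail-greedy algorithm merges a fixed proportion of pairs per pass; and the number of segments is fixed here by hand rather than estimated, as the actual method does.

    bottom.up.merge <- function(x, n.seg) {
      # start from singleton segments; repeatedly merge the adjacent pair whose
      # Unbalanced-Haar-type detail coefficient is smallest in absolute value
      len <- rep(1, length(x)); mu <- x; start <- seq_along(x)
      while (length(start) > n.seg) {
        k <- length(start)
        n1 <- len[-k]; n2 <- len[-1]
        detail <- abs(sqrt(n1 * n2 / (n1 + n2)) * (mu[-k] - mu[-1]))
        j <- which.min(detail)            # cheapest merge of neighbours j, j + 1
        mu[j] <- (n1[j] * mu[j] + n2[j] * mu[j + 1]) / (n1[j] + n2[j])
        len[j] <- len[j] + len[j + 1]
        start <- start[-(j + 1)]; len <- len[-(j + 1)]; mu <- mu[-(j + 1)]
      }
      start[-1] - 1                       # last index of each segment but the final one
    }

    set.seed(42)
    x <- rnorm(300, sd = 0.5) + rep(c(0, 2, 0), each = 100)
    bottom.up.merge(x, n.seg = 3)         # ideally close to c(100, 200)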
Narrowest Significance Pursuit: inference for multiple change-points in linear models
We propose Narrowest Significance Pursuit (NSP), a general and flexible methodology for automatically detecting localised regions in data sequences which each must contain a change-point, at a prescribed global significance level. Here, change-points are understood as abrupt changes in the parameters of an underlying linear model. NSP works by fitting the postulated linear model over many regions of the data, using a certain multiresolution sup-norm loss, and identifying the shortest interval on which the linearity is significantly violated. The procedure then continues recursively to the left and to the right until no further intervals of significance can be found. The use of the multiresolution sup-norm loss is a key feature of NSP, as it enables the transfer of significance considerations to the domain of the unobserved true residuals, a substantial simplification. It also guarantees important stochastic bounds which directly yield exact desired coverage probabilities, regardless of the form or number of the regressors.
NSP works with a wide range of distributional assumptions on the errors, including Gaussian with known or unknown variance, some light-tailed distributions, and some heavy-tailed, possibly heterogeneous distributions via self-normalisation. It also works in the presence of autoregression. The mathematics of NSP is, by construction, uncomplicated, and its key computational component uses simple linear programming. In contrast to the widely studied “post-selection inference” approach, NSP enables the opposite viewpoint and paves the way for the concept of “post-inference selection”. Pre-CRAN R code implementing NSP is available at https://github.com/pfryz/nsp
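The toy R sketch below conveys NSP's core check on a single interval, with two deliberate simplifications: the postulated linear model is fitted by ordinary least squares rather than by minimising the multiresolution sup-norm (which NSP does via linear programming), and the critical value is obtained by simulation for i.i.d. Gaussian errors rather than from NSP's distribution-free bounds.

    mres.norm <- function(r) {
      # multiresolution sup-norm: maximum over all subintervals of the
      # |partial sum| / sqrt(interval length) of the residuals r
      n <- length(r); cs <- c(0, cumsum(r)); out <- 0
      for (i in 1:n) for (j in i:n)
        out <- max(out, abs(cs[j + 1] - cs[i]) / sqrt(j - i + 1))
      out
    }

    interval.significant <- function(y, s, e, lambda) {
      t <- s:e
      r <- resid(lm(y[t] ~ t))     # OLS stand-in for NSP's sup-norm fit
      mres.norm(r) > lambda        # TRUE: [s, e] must contain a change-point
    }

    # calibrate lambda by simulating the norm of i.i.d. N(0, 1) errors
    set.seed(1)
    lambda <- quantile(replicate(100, mres.norm(rnorm(100))), 0.95)
    y <- c(seq(0, 4, length = 50), seq(4, 0, length = 50)) + rnorm(100, sd = 0.1)
    interval.significant(y, 1, 100, lambda)   # slope change at t = 50: TRUE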
NOVELIST estimator of large correlation and covariance matrices and their inverses
We propose a “NOVEL Integration of the Sample and Thresholded covariance estimators” (NOVELIST) to estimate large covariance (correlation) and precision matrices. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy log p/n → 0, and an improved rate when p/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. Real data applications are presented
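Since NOVELIST is a convex combination of the sample estimator and its thresholded version, it fits in a few lines of R; in the sketch below the shrinkage weight delta and the threshold are illustrative fixed values, whereas the paper discusses their data-driven choice (hard thresholding is used here for simplicity).

    novelist <- function(X, delta, thr) {
      R <- cor(X)                   # non-sparse sample correlation; low-rank if p > n
      Rt <- R * (abs(R) >= thr)     # sparse, hard-thresholded component
      diag(Rt) <- 1
      (1 - delta) * R + delta * Rt  # shrink the sample version towards the sparse one
    }

    set.seed(1)
    X <- matrix(rnorm(50 * 100), 50, 100)   # n = 50 observations, p = 100 variables
    Rhat <- novelist(X, delta = 0.5, thr = 0.4)
    Phat <- solve(Rhat)             # invertible although cor(X) itself is singular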
Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm
We propose a methodology for detecting multiple change points in the mean of an otherwise stationary, autocorrelated, linear time series. It combines solution path generation based on the wild contrast maximisation principle, and an information criterion-based model selection strategy termed the gappy Schwarz algorithm. The former is well-suited to separating shifts in the mean from fluctuations due to serial correlations, while the latter simultaneously estimates the dependence structure and the number of change points without performing the difficult task of estimating the level of the noise as quantified e.g. by the long-run variance. We provide a modular investigation into their theoretical properties and show that the combined methodology, named WCM.gSa, achieves consistency in estimating both the total number and the locations of the change points. The good performance of WCM.gSa is demonstrated via extensive simulation studies, and we further illustrate its usefulness by applying the methodology to London air quality data
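As a loose illustration of the division of labour, the R sketch below applies a naive Schwarz-type criterion to a given solution path of candidate change-points. Unlike the actual gappy Schwarz algorithm, it neither estimates the autoregressive dependence structure nor examines model sizes in a gappy order, so it may over-select under strong serial correlation; the candidate list in the example is hypothetical, standing in for a wild-contrast-maximisation solution path.

    schwarz.select <- function(x, cands) {
      # cands: candidate change-points in decreasing order of importance
      n <- length(x)
      crit <- sapply(0:length(cands), function(k) {
        cp <- sort(cands[seq_len(k)])
        seg <- cut(1:n, breaks = c(0, cp, n))       # segment membership
        rss <- sum((x - ave(x, seg))^2)             # piecewise-constant fit
        n / 2 * log(rss / n) + (k + 1) * log(n)     # Schwarz-type criterion
      })
      sort(cands[seq_len(which.min(crit) - 1)])
    }

    set.seed(7)
    x <- as.numeric(arima.sim(list(ar = 0.4), 300)) + rep(c(0, 2, 0), each = 100)
    schwarz.select(x, cands = c(100, 200, 57, 251))  # hypothetical solution path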
SHAH: SHape-Adaptive Haar wavelets for image processing
We propose the SHAH (SHape-Adaptive Haar) transform for images, which results in an orthonormal, adaptive decomposition of the image into Haar-wavelet-like components, arranged hierarchically according to decreasing importance, whose shapes reflect the features present in the image. The decomposition is as sparse as it can be for piecewise-constant images. It is performed via a stepwise bottom-up algorithm with quadratic computational complexity; however, nearly-linear variants also exist. SHAH is rapidly invertible. We show how to use SHAH for image denoising: having performed the SHAH transform, the coefficients are hard- or soft-thresholded, and the inverse transform taken. The SHAH image denoising algorithm compares favourably to the state of the art for piecewise-constant images. A clear asset of the methodology is its very general scope: it can be used with any images or, more generally, with any data that can be represented as graphs or networks
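The threshold-and-invert denoising step is easy to demonstrate. Since SHAH's shape-adaptive basis is beyond a short sketch, the R code below substitutes one level of the classical 2-D Haar transform as a stand-in and hard-thresholds its detail coefficients; only the transform-threshold-invert pattern carries over to SHAH.

    haar1 <- function(A) {
      # one level of the orthonormal 2-D Haar transform over 2x2 blocks
      i <- seq(1, nrow(A), 2); j <- seq(1, ncol(A), 2)
      a <- A[i, j]; b <- A[i, j + 1]; c <- A[i + 1, j]; d <- A[i + 1, j + 1]
      list(ll = (a + b + c + d) / 2, lh = (a - b + c - d) / 2,
           hl = (a + b - c - d) / 2, hh = (a - b - c + d) / 2)
    }

    haar1.inv <- function(w) {
      # exact inverse of haar1 for square inputs with even side length
      n <- 2 * nrow(w$ll); A <- matrix(0, n, n)
      i <- seq(1, n, 2); j <- i
      A[i, j]         <- (w$ll + w$lh + w$hl + w$hh) / 2
      A[i, j + 1]     <- (w$ll - w$lh + w$hl - w$hh) / 2
      A[i + 1, j]     <- (w$ll + w$lh - w$hl - w$hh) / 2
      A[i + 1, j + 1] <- (w$ll - w$lh - w$hl + w$hh) / 2
      A
    }

    denoise1 <- function(A, t) {
      # transform, hard-threshold the detail coefficients, invert
      hard <- function(w) w * (abs(w) > t)
      w <- haar1(A)
      haar1.inv(list(ll = w$ll, lh = hard(w$lh), hl = hard(w$hl), hh = hard(w$hh)))
    }

    set.seed(1)
    img <- outer(1:64, 1:64, function(r, s) (r > 32) + (s > 32))  # piecewise constant
    out <- denoise1(img + matrix(rnorm(64^2, sd = 0.3), 64), t = 3 * 0.3)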
Detecting multiple generalized change-points by isolating single ones
We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and locations of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID’s accuracy, as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, or an information criterion, or SDLL, or a hybrid involving the former two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods; most of the time it outperforms them. ID is implemented in the R packages IDetect and breakfast, available from CRAN
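A toy R version of the isolation idea for mean changes is sketched below. It expands intervals from the left only, whereas ID alternates expansions from both ends of the data; it also assumes unit noise variance and uses an illustrative threshold constant, so it is not the IDetect implementation.

    cusum.max <- function(x, s, e) {
      # largest |CUSUM| contrast of x[s..e] over internal split points
      n <- e - s + 1; cs <- cumsum(x[s:e]); b <- 1:(n - 1)
      stat <- abs((cs[b] - b / n * cs[n]) / sqrt(b * (n - b) / n))
      list(pos = s + which.max(stat) - 1, val = max(stat))
    }

    isolate.detect <- function(x, lambda = 10, th = 1.1 * sqrt(2 * log(length(x)))) {
      # assumes unit noise variance; in practice the data would be rescaled,
      # e.g. using a median-absolute-deviation estimate of sigma
      n <- length(x); cpts <- integer(0); s <- 1
      while (s < n) {
        e <- min(s + lambda, n)
        repeat {
          cm <- cusum.max(x, s, e)
          if (cm$val > th) {            # the interval is short, so it contains
            cpts <- c(cpts, cm$pos)     # at most one (i.e. isolated) change-point
            s <- cm$pos + 1; break
          }
          if (e == n) return(cpts)      # reached the end without a detection
          e <- min(e + lambda, n)       # otherwise expand and test again
        }
      }
      cpts
    }

    set.seed(3)
    x <- rnorm(300) + rep(c(0, 2, 0, -2), each = 75)
    isolate.detect(x)                   # aim: change-points near 75, 150, 225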
Detecting linear trend changes in data sequences
We propose TrendSegment, a methodology for detecting multiple change-points corresponding to linear trend changes in one-dimensional data. A core ingredient of TrendSegment is a new Tail-Greedy Unbalanced Wavelet transform: a conditionally orthonormal, bottom-up transformation of the data through an adaptively constructed unbalanced wavelet basis, which results in a sparse representation of the data. Due to its bottom-up nature, this multiscale decomposition focuses on local features in its early stages and on global features later, which enables the detection of both long and short linear trend segments at once. To reduce the computational complexity, the proposed method merges multiple regions in a single pass over the data. We show the consistency of the estimated number and locations of change-points. The practicality of our approach is demonstrated through simulations and two real data examples, involving Iceland temperature data and sea ice extent of the Arctic and the Antarctic. Our methodology is implemented in the R package trendsegmentR, available from CRAN
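To convey what a linear trend change-point is, the R sketch below locates a single break by comparing straight-line fits on the two sides of every candidate split; TrendSegment itself detects multiple such breaks via its bottom-up wavelet transform rather than this exhaustive search.

    best.trend.break <- function(y) {
      # compare a straight-line fit on each side of every candidate split
      n <- length(y)
      rss <- function(idx) sum(resid(lm(y[idx] ~ idx))^2)
      split.rss <- sapply(3:(n - 3), function(b) rss(1:b) + rss((b + 1):n))
      b <- which.min(split.rss) + 2
      c(cpt = b, gain = rss(1:n) - min(split.rss))   # RSS saved by splitting
    }

    set.seed(2)
    y <- c(seq(0, 5, length = 60), seq(5, 0, length = 60)) + rnorm(120, sd = 0.4)
    best.trend.break(y)                 # expect a break near t = 60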
- …