
    Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection—rejoinder

    Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients: one is “Wild Binary Segmentation 2” (WBS2), a recursive algorithm for producing what we call a ‘complete’ solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing 0, …, T-1 change-points, where T is the data length. The other ingredient is a new model selection procedure, referred to as “Steepest Drop to Low Levels” (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter.
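
    To convey the two ingredients, here is a minimal base-R sketch: a recursive, randomised CUSUM segmentation producing a complete solution path, followed by a steepest-drop choice of model size. All names are ours, and the final step is a much-simplified stand-in for SDLL proper (which requires the drop to reach 'low levels' and uses thresholding as a secondary check); the reference implementation is in the R package breakfast.

        cusum <- function(x, s, e) {
          # location and size of the largest |CUSUM| statistic on x[s..e]
          n <- e - s + 1
          y <- x[s:e]; b <- 1:(n - 1); cs <- cumsum(y)
          stat <- abs(sqrt((n - b) / (n * b)) * cs[b] -
                      sqrt(b / (n * (n - b))) * (cs[n] - cs[b]))
          list(pos = s + which.max(stat) - 1, val = max(stat))
        }

        wbs2_path <- function(x, M = 100, s = 1, e = length(x)) {
          # pick the strongest change-point over M random subintervals of
          # [s, e], then recurse to its left and to its right
          if (e - s < 1) return(NULL)
          l <- s - 1 + sample.int(e - s, M, replace = TRUE)            # l in s..e-1
          r <- l + vapply(e - l, function(k) sample.int(k, 1), integer(1))  # r in l+1..e
          best <- list(val = -Inf)
          for (i in seq_len(M)) {
            cand <- cusum(x, l[i], r[i])
            if (cand$val > best$val) best <- cand
          }
          rbind(c(best$pos, best$val),
                wbs2_path(x, M, s, best$pos),
                wbs2_path(x, M, best$pos + 1, e))
        }

        set.seed(1)
        x <- c(rnorm(100), rnorm(100, mean = 3))       # single change-point at t = 100
        path <- wbs2_path(x)
        path <- path[order(-path[, 2]), , drop = FALSE]  # complete solution path
        # steepest-drop idea: keep change-points up to the largest drop in
        # the sorted log-CUSUM magnitudes (SDLL proper does more than this)
        k <- which.max(-diff(log(path[, 2])))
        sort(path[1:k, 1])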

    Robust Narrowest Significance Pursuit: Inference for multiple change-points in the median

    We propose Robust Narrowest Significance Pursuit (RNSP), a methodology for detecting localized regions in data sequences, each of which must contain a change-point in the median, at a prescribed global significance level. RNSP works by fitting the postulated constant model over many regions of the data using a new sign-multiresolution sup-norm-type loss, and greedily identifying the shortest intervals on which the constancy is significantly violated. By working with the signs of the data around fitted model candidates, RNSP fulfils its coverage promises under minimal assumptions, requiring only sign-symmetry and serial independence of the signs of the true residuals. In particular, it permits their heterogeneity and arbitrarily heavy tails. The intervals of significance returned by RNSP have a finite-sample character, are unconditional in nature and do not rely on any assumptions on the true signal. Code implementing RNSP is available at https://github.com/pfryz/nsp.
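
    The sign-based check at the heart of the method can be illustrated in a few lines of base R. This is a sketch only: the function names are ours, checking just the sample median is a rough shortcut, and the reference implementation at the URL above scans candidate constant levels more carefully.

        sign_supnorm <- function(s01) {
          # scaled sup-norm of partial sums of signs over all subintervals
          cs <- c(0, cumsum(s01)); n <- length(s01); m <- 0
          for (i in 1:n) for (j in i:n)
            m <- max(m, abs(cs[j + 1] - cs[i]) / sqrt(j - i + 1))
          m
        }

        violates_constancy <- function(x, lambda) {
          # single-candidate shortcut: RNSP proper scans candidate levels
          sign_supnorm(sign(x - median(x))) > lambda
        }

        set.seed(2)
        violates_constancy(rcauchy(60), lambda = 3.5)                     # heavy tails, constant median
        violates_constancy(c(rcauchy(30), 5 + rcauchy(30)), lambda = 3.5) # median shift at t = 30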

    Tail-greedy bottom-up data decompositions and fast multiple change-point detection

    This article proposes a ‘tail-greedy’, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen Unbalanced Haar wavelet basis. The ‘tail-greediness’ of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN.
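
    The bottom-up flavour of the method can be conveyed by the following base-R sketch, which merges the single cheapest adjacent pair of segments per pass; the actual tail-greedy algorithm merges a fixed proportion of the cheapest pairs in each pass, which is what makes it fast. Names and the threshold constant are illustrative.

        bottom_up_fit <- function(x, threshold) {
          brk <- seq_along(x)                  # current segment end-points
          repeat {
            if (length(brk) == 1) break
            starts <- c(1, head(brk, -1) + 1)
            means <- mapply(function(s, e) mean(x[s:e]), starts, brk)
            n <- brk - starts + 1
            # Unbalanced Haar detail coefficient for merging segments i and i + 1
            d <- abs(sqrt(n[-length(n)] * n[-1] / (n[-length(n)] + n[-1])) *
                     (means[-length(means)] - means[-1]))
            if (min(d) > threshold) break      # every remaining merge is too costly
            brk <- brk[-which.min(d)]          # merge the cheapest adjacent pair
          }
          head(brk, -1)                        # estimated change-point locations
        }

        set.seed(3)
        x <- rep(c(0, 2, -1), times = c(50, 30, 40)) + rnorm(120)
        bottom_up_fit(x, threshold = 1.3 * sqrt(2 * log(length(x))))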

    Narrowest Significance Pursuit: inference for multiple change-points in linear models

    We propose Narrowest Significance Pursuit (NSP), a general and flexible methodology for automatically detecting localised regions in data sequences, each of which must contain a change-point, at a prescribed global significance level. Here, change-points are understood as abrupt changes in the parameters of an underlying linear model. NSP works by fitting the postulated linear model over many regions of the data, using a certain multiresolution sup-norm loss, and identifying the shortest interval on which the linearity is significantly violated. The procedure then continues recursively to the left and to the right until no further intervals of significance can be found. The use of the multiresolution sup-norm loss is a key feature of NSP, as it enables the transfer of significance considerations to the domain of the unobserved true residuals, a substantial simplification. It also guarantees important stochastic bounds which directly yield exact desired coverage probabilities, regardless of the form or number of the regressors. NSP works with a wide range of distributional assumptions on the errors, including Gaussian with known or unknown variance, some light-tailed distributions, and some heavy-tailed, possibly heterogeneous distributions via self-normalisation. It also works in the presence of autoregression. The mathematics of NSP is, by construction, uncomplicated, and its key computational component uses simple linear programming. In contrast to the widely studied "post-selection inference" approach, NSP enables the opposite viewpoint and paves the way for the concept of "post-inference selection". Pre-CRAN R code implementing NSP is available at https://github.com/pfryz/nsp.
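
    A heavily simplified base-R sketch of the per-interval significance check follows, for the simplest case of i.i.d. Gaussian errors with known variance. The bound used below is a crude stand-in for the multiresolution bounds of the paper, and NSP proper scans many intervals, reports the shortest significant ones, and computes the best fit by linear programming rather than least squares.

        mres_supnorm <- function(r) {
          # multiresolution sup-norm: scaled partial sums over all subintervals
          cs <- c(0, cumsum(r)); n <- length(r); m <- 0
          for (i in 1:n) for (j in i:n)
            m <- max(m, abs(cs[j + 1] - cs[i]) / sqrt(j - i + 1))
          m
        }

        interval_significant <- function(y, s, e, sigma) {
          t <- s:e
          r <- resid(lm(y[t] ~ t))             # fit the postulated linear model
          # if even the best linear fit leaves residuals whose multiresolution
          # norm exceeds a Gaussian bound, the interval must contain a change-point
          mres_supnorm(r / sigma) > sqrt(2 * log(length(y)))
        }

        set.seed(4)
        y <- c(seq(0, 1, length = 150), seq(1, -1, length = 150)) + rnorm(300, sd = 0.2)
        interval_significant(y, 100, 200, sigma = 0.2)  # straddles the kink at t = 150
        interval_significant(y, 1, 120, sigma = 0.2)    # linearity holds here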

    NOVELIST estimator of large correlation and covariance matrices and their inverses

    We propose a “NOVEL Integration of the Sample and Thresholded covariance estimators” (NOVELIST) to estimate the large covariance (correlation) and precision matrix. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy log p/n → 0, and its improved version when p/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. Real data applications are presented.
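
    The estimator itself is simple to state: shrink the sample correlation matrix towards its hard-thresholded version. A base-R sketch is below; the shrinkage weight lambda and threshold delta are illustrative defaults and would be chosen data-adaptively (e.g. by cross-validation) in practice.

        novelist_cor <- function(X, lambda = 0.5, delta = 0.2) {
          R <- cor(X)                          # non-sparse sample correlation
          R_thr <- R * (abs(R) >= delta)       # sparse, thresholded component
          (1 - lambda) * R + lambda * R_thr    # shrink one towards the other
        }

        set.seed(5)
        X <- matrix(rnorm(50 * 100), nrow = 50)   # n = 50 observations, p = 100
        R_hat <- novelist_cor(X)
        kappa(R_hat)   # the sparse component helps keep the inverse stable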

    Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm

    We propose a methodology for detecting multiple change points in the mean of an otherwise stationary, autocorrelated, linear time series. It combines solution path generation based on the wild contrast maximisation principle, and an information criterion-based model selection strategy termed gappy Schwarz algorithm. The former is well-suited to separating shifts in the mean from fluctuations due to serial correlations, while the latter simultaneously estimates the dependence structure and the number of change points without performing the difficult task of estimating the level of the noise as quantified e.g. by the long-run variance. We provide a modular investigation into their theoretical properties and show that the combined methodology, named WCM.gSa, achieves consistency in estimating both the total number and the locations of the change points. The good performance of WCM.gSa is demonstrated via extensive simulation studies, and we further illustrate its usefulness by applying the methodology to London air quality data.
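
    As a rough illustration of the model selection half, the following base-R sketch evaluates a Schwarz-type criterion along a nested solution path. It is a skeleton only: the actual gappy Schwarz algorithm also estimates the AR dependence structure and evaluates the criterion over gaps around candidate change-points, precisely so that the long-run variance never has to be estimated.

        sic_along_path <- function(x, path) {
          # path: candidate change-points in decreasing order of importance,
          # e.g. from a wild-contrast-style solution path generator
          n <- length(x)
          sapply(0:length(path), function(k) {
            cp <- sort(path[seq_len(k)])
            seg <- findInterval(seq_len(n), cp + 1)   # segment membership
            rss <- sum((x - ave(x, seg))^2)           # piecewise-constant fit
            n / 2 * log(rss / n) + k * log(n)         # Schwarz-type criterion
          })
        }

        set.seed(6)
        x <- rep(c(0, 1.5), each = 80) + arima.sim(list(ar = 0.4), 160)
        crit <- sic_along_path(x, path = c(80, 40, 120))   # toy solution path
        which.min(crit) - 1                # selected number of change-points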

    SHAH: SHape-Adaptive Haar wavelets for image processing

    We propose the SHAH (SHape-Adaptive Haar) transform for images, which results in an orthonormal, adaptive decomposition of the image into Haar-wavelet-like components, arranged hierarchically according to decreasing importance, whose shapes reflect the features present in the image. The decomposition is as sparse as it can be for piecewise-constant images. It is performed via a stepwise bottom-up algorithm with quadratic computational complexity; however, nearly-linear variants also exist. SHAH is rapidly invertible. We show how to use SHAH for image denoising. Having performed the SHAH transform, the coefficients are hard- or soft-thresholded, and the inverse transform taken. The SHAH image denoising algorithm compares favourably to the state of the art for piecewise-constant images. A clear asset of the methodology is its very general scope: it can be used with any images or, more generally, with any data that can be represented as graphs or networks.
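
    SHAH's basis is data-adaptive, but the transform-threshold-invert denoising recipe it follows can be sketched with a fixed one-dimensional Haar transform standing in for SHAH; the sketch below is purely illustrative and all names are ours.

        haar_fwd <- function(x) {              # x of dyadic length
          details <- list()
          while (length(x) > 1) {
            odd <- x[c(TRUE, FALSE)]; even <- x[c(FALSE, TRUE)]
            details[[length(details) + 1]] <- (odd - even) / sqrt(2)
            x <- (odd + even) / sqrt(2)        # smooth part, passed up a level
          }
          list(smooth = x, details = details)
        }

        haar_inv <- function(w) {
          x <- w$smooth
          for (d in rev(w$details)) {          # coarsest level first
            odd <- (x + d) / sqrt(2); even <- (x - d) / sqrt(2)
            x <- as.vector(rbind(odd, even))   # interleave back
          }
          x
        }

        denoise <- function(x, lambda) {
          w <- haar_fwd(x)
          w$details <- lapply(w$details, function(d) d * (abs(d) > lambda))  # hard threshold
          haar_inv(w)
        }

        set.seed(7)
        x <- rep(c(0, 3), each = 64) + rnorm(128)
        xhat <- denoise(x, lambda = sqrt(2 * log(128)))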

    Detecting multiple generalized change-points by isolating single ones

    We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID’s accuracy as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, or an information criterion, or SDLL, or a hybrid involving the former two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods; most of the time it outperforms them. ID is implemented in the R packages IDetect and breakfast, available from CRAN.
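
    The isolation idea can be conveyed by the following simplified base-R sketch, which expands candidate intervals from the left end only and uses thresholding for model selection; the real ID expands intervals from both ends of the data and offers the richer model selection options described above. Names and constants are illustrative, not the IDetect API.

        cusum_stat <- function(y) {
          # within-interval location and size of the largest |CUSUM| statistic
          n <- length(y); b <- 1:(n - 1); cs <- cumsum(y)
          stat <- abs(sqrt((n - b) / (n * b)) * cs[b] -
                      sqrt(b / (n * (n - b))) * (cs[n] - cs[b]))
          c(which.max(stat), max(stat))
        }

        id_sketch <- function(x, lambda = 10, C = 1.1) {
          thr <- C * sqrt(2 * log(length(x)))
          cps <- integer(0); s <- 1
          while (s < length(x)) {
            e <- s; found <- FALSE
            while (e < length(x)) {
              e <- min(e + lambda, length(x))   # expand the interval
              cs <- cusum_stat(x[s:e])
              if (cs[2] > thr) {                # interval isolates one change-point
                cps <- c(cps, s + cs[1] - 1)
                s <- s + cs[1]; found <- TRUE; break
              }
            }
            if (!found) break
          }
          cps
        }

        set.seed(8)
        x <- rep(c(0, 2, 0, 3), each = 40) + rnorm(160)
        id_sketch(x)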

    Detecting linear trend changes in data sequences

    We propose TrendSegment, a methodology for detecting multiple change-points corresponding to linear trend changes in one-dimensional data. A core ingredient of TrendSegment is a new Tail-Greedy Unbalanced Wavelet transform: a conditionally orthonormal, bottom-up transformation of the data through an adaptively constructed unbalanced wavelet basis, which results in a sparse representation of the data. Due to its bottom-up nature, this multiscale decomposition focuses on local features in its early stages and on global features later, which enables the detection of both long and short linear trend segments at once. To reduce the computational complexity, the proposed method merges multiple regions in a single pass over the data. We show the consistency of the estimated number and locations of change-points. The practicality of our approach is demonstrated through simulations and two real data examples, involving Iceland temperature data and sea ice extent of the Arctic and the Antarctic. Our methodology is implemented in the R package trendsegmentR, available from CRAN.
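
    For intuition about what a linear trend change-point is, the following base-R check locates the best single 'kink' by comparing a one-line fit with the best two-line fit. This is illustrative only: TrendSegment itself detects multiple change-points via the tail-greedy unbalanced wavelet decomposition described above.

        best_kink <- function(y) {
          n <- length(y); t <- 1:n
          rss <- sapply(2:(n - 2), function(b) {   # residual sums of two-line fits
            deviance(lm(y[1:b] ~ t[1:b])) + deviance(lm(y[(b + 1):n] ~ t[(b + 1):n]))
          })
          b <- which.min(rss) + 1                  # map back to a break location
          c(kink = b, gain = deviance(lm(y ~ t)) - min(rss))
        }

        set.seed(9)
        y <- c(seq(0, 2, length = 60), seq(2, -1, length = 60)) + rnorm(120, sd = 0.3)
        best_kink(y)   # the slope changes at t = 60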