
    Sparse modelling and estimation for nonstationary time series and high-dimensional data

    Sparse modelling has attracted great attention as an efficient way of handling statistical problems in high dimensions. This thesis considers sparse modelling and estimation in a selection of problems such as breakpoint detection in nonstationary time series, nonparametric regression using piecewise constant functions and variable selection in high-dimensional linear regression. We first propose a method for detecting breakpoints in the second-order structure of piecewise stationary time series, assuming that those structural breakpoints are sufficiently scattered over time. Our choice of time series model is the locally stationary wavelet process (Nason et al., 2000), under which the entire second-order structure of a time series is described by wavelet-based local periodogram sequences. As the initial stage of breakpoint detection, we apply a binary segmentation procedure to the wavelet periodogram sequences at each scale separately, followed by within-scale and across-scales post-processing steps. We show that the combined methodology achieves consistent estimation of the breakpoints in terms of their total number and locations, and investigate its practical performance using both simulated and real data. Next, we study the problem of nonparametric regression by means of piecewise constant functions, which are known to be flexible in approximating a wide range of function spaces. Among the many approaches developed for this purpose, we focus on comparing two well-performing techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced Haar (Fryzlewicz, 2007) methods. While the multiscale nature of the latter is easily observed, it is not so obvious that the former can also be interpreted as multiscale. We provide a unified, multiscale representation for both methods, which offers insight into the relationship between them and suggests lessons that each method can learn from the other. Lastly, we consider one of the most widely studied applications of sparse modelling and estimation: variable selection in high-dimensional linear regression. High dimensionality of the data brings in many complications, including (possibly spurious) non-negligible correlations among the variables, which may render marginal correlation unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response, which adaptively takes into account high correlations among the variables. A key ingredient of the proposed tilting procedure is hard-thresholding the sample correlations of the design matrix, which enables a data-driven switch between the use of marginal correlation and tilted correlation for each variable. We study the conditions under which this measure can discriminate between relevant and irrelevant variables, and thus be used as a tool for variable selection. In order to exploit these theoretical properties of tilted correlation, we construct an iterative variable screening algorithm and examine its practical performance in a comparative simulation study.
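
    The first-stage segmentation step can be illustrated with a generic CUSUM-based binary segmentation applied to a single one-dimensional sequence. The sketch below is only a minimal illustration of that idea under simplifying assumptions (a plain sequence rather than the per-scale wavelet periodograms, and a hypothetical threshold rule); it is not the thesis's full procedure with its within-scale and across-scales post-processing.

```python
import numpy as np

def cusum(x):
    """Absolute CUSUM statistics for every candidate break position in x."""
    n = len(x)
    b = np.arange(1, n)                       # number of points on the left
    left = np.cumsum(x)[:-1]                  # partial sums of x[0..b-1]
    total = x.sum()
    # sqrt(b*(n-b)/n) * |mean(left part) - mean(right part)|
    return np.sqrt(b * (n - b) / n) * np.abs(left / b - (total - left) / (n - b))

def binary_segmentation(x, threshold, min_len=5, offset=0):
    """Recursively split x wherever the maximal CUSUM exceeds the threshold."""
    n = len(x)
    if n < 2 * min_len:
        return []
    stat = cusum(x)
    k = int(np.argmax(stat))                  # best split: k+1 points on the left
    if stat[k] < threshold:
        return []
    return (binary_segmentation(x[:k + 1], threshold, min_len, offset)
            + [offset + k]                    # last index of the left segment
            + binary_segmentation(x[k + 1:], threshold, min_len, offset + k + 1))

# toy example: piecewise-constant mean with Gaussian noise
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
print(binary_segmentation(y, threshold=1.5 * np.sqrt(np.log(len(y)))))  # illustrative threshold
```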

    Randomised and L1-penalty approaches to segmentation in time series and regression models

    It is a common approach in statistics to assume that the parameters of a stochastic model change. The simplest model involves parameters that can be exactly or approximately piecewise constant. In such a model, the aim is the a posteriori detection of the number and locations in time of the changes in the parameters. This thesis develops segmentation methods for non-stationary time series and regression models using randomised methods or methods that involve L1 penalties, which force the coefficients in a regression model to be exactly zero. Randomised techniques are not commonly found in nonparametric statistics, whereas L1 methods draw heavily from the variable selection literature. Considering these two categories together enables, among other contributions, a comparison between them that points out their respective strengths and weaknesses. This is achieved by organising the thesis into three main parts. First, we propose a new technique for detecting the number and locations of the change-points in the second-order structure of a time series. The core of the segmentation procedure is the Wild Binary Segmentation method (WBS) of Fryzlewicz (2014), a technique which involves a certain randomised mechanism. The advantage of WBS over standard Binary Segmentation lies in its localisation feature, thanks to which it works in cases where the spacings between change-points are short. Our main change-point detection statistic is the wavelet periodogram, which allows a rigorous estimation of the local autocovariance of a piecewise-stationary process. We provide a proof of consistency and examine the performance of the method on simulated and real data sets. Second, we study the fused lasso estimator which, in its simplest form, deals with the estimation of a piecewise constant function contaminated with Gaussian noise (Friedman et al., 2007). We show a fast way of implementing the solution path algorithm of Tibshirani and Taylor (2011) and we make a connection between their algorithm and the taut-string method of Davies and Kovac (2001). In addition, a theoretical result and a simulation study indicate that the fused lasso estimator is suboptimal in detecting the location of a change-point. Finally, we propose a method to estimate regression models in which the coefficients vary with respect to some covariate such as time. In particular, we present a path algorithm based on Tibshirani and Taylor (2011) and the fused lasso method of Tibshirani et al. (2005). Thanks to the adaptability of the fused lasso penalty, our proposed method goes beyond the estimation of piecewise constant models to models where the underlying coefficient function can be piecewise linear, quadratic or cubic. Our simulation studies show that in most cases the method outperforms smoothing splines, a common approach to estimating this class of models.
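
    The fused lasso in its simplest form, recovering a piecewise constant signal from Gaussian noise, can be written as a small convex programme. The sketch below solves it with the generic modelling library cvxpy rather than the solution path algorithm of Tibshirani and Taylor (2011) discussed in the thesis; the penalty level and the toy signal are arbitrary choices made purely for illustration.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
# noisy observations of a piecewise constant signal
truth = np.repeat([0.0, 2.0, -1.0], 50)
y = truth + rng.normal(0, 0.5, truth.size)

lam = 2.0                               # illustrative penalty level
x = cp.Variable(y.size)
# fused lasso signal approximator: squared error + L1 penalty on first differences
objective = cp.Minimize(0.5 * cp.sum_squares(y - x) + lam * cp.norm1(cp.diff(x)))
cp.Problem(objective).solve()

estimate = x.value                      # approximately piecewise constant
print(np.round(estimate[[0, 60, 120]], 2))
```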

    Tail-greedy bottom-up data decompositions and fast multiple change-point detection

    This article proposes a ‘tail-greedy’, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen Unbalanced Haar wavelet basis. The ‘tail-greediness’ of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN.
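
    The bottom-up, agglomerative idea can be conveyed with a deliberately simplified sketch: start from singleton segments and repeatedly merge the adjacent pair whose merge increases the residual sum of squares the least. This uses a plain SSE merge cost and one merge per step, not the Unbalanced Haar coefficients or the multiple ‘tail-greedy’ merges per pass of the actual method; in practice the R package breakfast should be used instead.

```python
import numpy as np

def bottom_up_segments(y, n_segments):
    """Greedy bottom-up merging of adjacent segments by smallest SSE increase."""
    # each segment stored as (start, end_exclusive, sum, sum_of_squares)
    segs = [(i, i + 1, y[i], y[i] ** 2) for i in range(len(y))]

    def sse(s):
        start, end, total, sq = s
        return sq - total ** 2 / (end - start)

    def merge(a, b):
        return (a[0], b[1], a[2] + b[2], a[3] + b[3])

    while len(segs) > n_segments:
        # cost of merging each adjacent pair = increase in residual sum of squares
        costs = [sse(merge(segs[i], segs[i + 1])) - sse(segs[i]) - sse(segs[i + 1])
                 for i in range(len(segs) - 1)]
        i = int(np.argmin(costs))
        segs[i:i + 2] = [merge(segs[i], segs[i + 1])]
    return [(s[0], s[1]) for s in segs]      # (start, end) of each segment

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(0, 1, 40), rng.normal(4, 1, 40), rng.normal(1, 1, 40)])
print(bottom_up_segments(y, n_segments=3))
```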

    Methods for change-point detection with additional interpretability

    The main purpose of this dissertation is to introduce and critically assess some novel statistical methods for change-point detection that help better understand the nature of processes underlying observable time series. First, we advocate the use of change-point detection for local trend estimation in financial return data and propose a new approach developed to capture the oscillatory behaviour of financial returns around piecewise-constant trend functions. The core of the method is a data-adaptive, hierarchically-ordered basis of Unbalanced Haar vectors which decomposes the piecewise-constant trend underlying observed daily returns into a binary-tree structure of one-step constant functions. We illustrate how this framework can provide a new perspective for the interpretation of change-points in financial returns. Moreover, the approach yields a family of forecasting operators for financial return series which can be adjusted flexibly depending on the forecast horizon or the loss function. Second, we discuss change-point detection under model misspecification, focusing in particular on normally distributed data with changing mean and variance. We argue that ignoring the presence of changes in mean or variance when testing for changes in, respectively, variance or mean, can affect the application of statistical methods negatively. After illustrating the difficulties arising from this kind of model misspecification, we propose a new method to address them using sequential testing on intervals of varying length, and show in a simulation study how this approach compares to competitors in mixed-change situations. The third contribution of this thesis is a data-adaptive procedure to evaluate EEG data, which can improve the understanding of an epileptic seizure recording. This change-point detection method characterizes the evolution of frequency-specific energy as measured on the human scalp. It provides new insights into these high-dimensional, high-frequency data and has attractive computational and scalability features. In addition to contrasting our method with existing approaches, we analyse and interpret the method’s output in the application to a seizure data set.
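
    As a rough point of reference for the quantity tracked in the third contribution, frequency-specific energy over time, the sketch below computes an ordinary spectrogram of a single channel with scipy. It is not the thesis's data-adaptive change-point procedure, and the sampling rate, window length and toy signal are made-up values used only for illustration.

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(3)
fs = 256                                     # assumed sampling rate in Hz
t = np.arange(0, 20, 1 / fs)
# toy one-channel "EEG": a 10 Hz rhythm whose amplitude jumps halfway through
x = np.where(t < 10, 1.0, 3.0) * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)

# frequency-specific energy over time (one column per window)
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=256)
band = (freqs >= 8) & (freqs <= 12)          # alpha band, for illustration
alpha_energy = Sxx[band].sum(axis=0)
print(times[np.argmax(np.diff(alpha_energy))])  # crude location of the energy jump
```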

    Multiscale Change-Point Inference

    We introduce a new estimator, SMUCE (simultaneous multiscale change-point estimator), for the change-point problem in exponential family regression. An unknown step function is estimated by minimizing the number of change-points over the acceptance region of a multiscale test at level α. The probability of overestimating the true number of change-points K is controlled by the asymptotic null distribution of the multiscale test statistic. Further, we derive exponential bounds for the probability of underestimating K. By balancing these quantities, α is chosen such that the probability of correctly estimating K is maximized. For the normal case, all results are in fact non-asymptotic. Based on the aforementioned bounds, we construct asymptotically honest confidence sets for the unknown step function and its change-points. At the same time, we obtain exponential bounds for estimating the change-point locations which, for example, yield the minimax rate O(1/n) up to a log term. Finally, SMUCE asymptotically achieves the optimal detection rate of vanishing signals. We illustrate how dynamic programming techniques can be employed for efficient computation of estimators and confidence regions. The performance of the proposed multiscale approach is illustrated by simulations and in two cutting-edge applications from genetic engineering and photoemission spectroscopy.
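
    The role of the multiscale test can be sketched as follows: under a candidate constant fit, every interval contributes a local statistic penalized by its scale, and the fit is rejected if any interval exceeds a critical value. The code below evaluates such a penalized multiscale scan for the Gaussian no-change model; it is only an illustration of the type of statistic involved, not the SMUCE estimator itself, and the penalty form and interpretation of the output are simplified assumptions.

```python
import numpy as np

def multiscale_scan(y, sigma=1.0):
    """Max over all intervals of the scale-penalized, standardized partial sum of
    residuals from a single constant fit (the global mean)."""
    n = len(y)
    r = y - y.mean()
    cs = np.concatenate([[0.0], np.cumsum(r)])          # prefix sums of residuals
    best = -np.inf
    for i in range(n):
        for j in range(i, n):
            length = j - i + 1
            local = abs(cs[j + 1] - cs[i]) / (sigma * np.sqrt(length))
            penalty = np.sqrt(2 * np.log(np.e * n / length))   # scale penalty
            best = max(best, local - penalty)
    return best

rng = np.random.default_rng(4)
flat = rng.normal(0, 1, 200)
jump = np.concatenate([rng.normal(0, 1, 100), rng.normal(1.5, 1, 100)])
print(multiscale_scan(flat), multiscale_scan(jump))   # the second should be clearly larger
```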

    Wavelet Methods and Inverse Problems

    Archaeological investigations are designed to acquire information without damaging the archaeological site. Magnetometry is one of the important techniques for producing a surface grid of readings, which can be used to infer underground features. The inversion of these data, to give a fitted model, is an inverse problem. This type of problem can be ill-posed or ill-conditioned, making the estimation of model parameters less stable or even impossible. More precisely, the relationship between the archaeological data and the parameters is expressed by a likelihood. It is not possible to use the standard regression estimate obtained through the likelihood, which means that no maximum likelihood estimate exists. Instead, various constraints can be added through a prior distribution, with an estimate produced using the posterior distribution. Current approaches incorporate prior information describing smoothness, which is not always appropriate. The biggest challenge is that the reconstruction of an archaeological site as a single layer requires various physical features, such as depth and extent, to be assumed; when a smoothing prior is applied in the analysis of stratigraphy data, however, these features are not easily estimated. Wavelet analysis has proved to be highly efficient at eliciting information from noisy data, and complicated signals can often be explained by interpreting only a small number of wavelet coefficients. It is therefore possible that a modelling approach which describes the underlying function in terms of a multi-level wavelet representation will be an improvement on standard techniques. Furthermore, a new method is proposed which uses an elastic-net-based distribution as the prior. Two methods are used to solve the problem: one based on one-stage estimation and the other on two-stage estimation. The one-stage method considers two approaches: a single prior for all wavelet resolution levels, and a level-dependent prior with separate priors at each resolution level. In a simulation study and a real data analysis, all these techniques are compared to several existing methods. It is shown that the methodology using a single prior provides good reconstruction, comparable even to several established wavelet methods that use mixture priors.
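
    As a generic point of reference for the wavelet machinery mentioned above, the sketch below denoises a one-dimensional signal by soft-thresholding its wavelet coefficients with PyWavelets. It uses a standard universal threshold rather than the elastic-net prior or the one- and two-stage Bayesian estimators developed in the thesis, and the wavelet choice and toy signal are arbitrary.

```python
import numpy as np
import pywt

rng = np.random.default_rng(5)
n, sigma = 512, 0.3
t = np.linspace(0, 1, n)
signal = np.piecewise(t, [t < 0.3, (t >= 0.3) & (t < 0.7), t >= 0.7], [0.0, 1.0, -0.5])
y = signal + rng.normal(0, sigma, n)

# multi-level wavelet decomposition
coeffs = pywt.wavedec(y, 'db4', level=5)
# universal threshold sigma * sqrt(2 log n), applied to detail coefficients only
thresh = sigma * np.sqrt(2 * np.log(n))
coeffs[1:] = [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, 'db4')

print(np.mean((denoised - signal) ** 2) < np.mean((y - signal) ** 2))  # expect True
```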

    An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure

    We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning and computer vision. In Part 1, we review the concepts of network flows and the submodular function optimization theory foundational to our results. We then examine the connections between network flows and the minimum-norm algorithm from submodular optimization, extending and improving several current results. This leads to a concise representation of the structure of a large class of pairwise regularized models important in machine learning, statistics and computer vision. In Part 2, we describe the full regularization path of a class of penalized regression problems with dependent variables that includes the graph-guided LASSO and total variation constrained models. This description also motivates a practical algorithm, which allows us to efficiently find the regularization path of discretized TV-penalized models. Ultimately, our new algorithms scale up to high-dimensional problems with millions of variables.
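
    One concrete instance of the flow/regularization connection is that a pairwise-regularized model over binary variables can be minimized exactly by an s-t minimum cut. The sketch below denoises a binary one-dimensional signal with a quadratic data term and a total-variation pairwise penalty using networkx; it is a toy illustration of the link discussed in the article, not its algorithms, and the penalty weight and toy data are arbitrary choices.

```python
import numpy as np
import networkx as nx

def binary_tv_denoise(y, lam):
    """Minimize sum_i (y_i - x_i)^2 + lam * sum_i |x_i - x_{i+1}| over x in {0,1}^n
    via an s-t minimum cut (source side = label 0, sink side = label 1)."""
    n = len(y)
    G = nx.DiGraph()
    for i in range(n):
        G.add_edge('s', i, capacity=(y[i] - 1.0) ** 2)   # paid when x_i = 1
        G.add_edge(i, 't', capacity=(y[i] - 0.0) ** 2)   # paid when x_i = 0
        if i + 1 < n:
            G.add_edge(i, i + 1, capacity=lam)           # pairwise |x_i - x_{i+1}|
            G.add_edge(i + 1, i, capacity=lam)
    cut_value, (source_side, _) = nx.minimum_cut(G, 's', 't')
    return np.array([0 if i in source_side else 1 for i in range(n)])

rng = np.random.default_rng(6)
truth = np.repeat([0, 1, 0], 30)
y = truth + rng.normal(0, 0.4, truth.size)
print(binary_tv_denoise(y, lam=0.8))
```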

    Echocardiography

    The book "Echocardiography - New Techniques" brings worldwide contributions from highly acclaimed clinical and imaging science investigators, and representatives from academic medical centers. Each chapter is designed and written to be accessible to those with a basic knowledge of echocardiography. Additionally, the chapters are meant to be stimulating and educational to the experts and investigators in the field of echocardiography. This book is aimed primarily at cardiology fellows on their basic echocardiography rotation, fellows in general internal medicine, radiology and emergency medicine, and experts in the arena of echocardiography. Over the last few decades, the rate of technological advancements has developed dramatically, resulting in new techniques and improved echocardiographic imaging. The authors of this book focused on presenting the most advanced techniques useful in today's research and in daily clinical practice. These advanced techniques are utilized in the detection of different cardiac pathologies in patients, in contributing to their clinical decision, as well as follow-up and outcome predictions. In addition to the advanced techniques covered, this book expounds upon several special pathologies with respect to the functions of echocardiography