
    Sparse modelling and estimation for nonstationary time series and high-dimensional data

    Sparse modelling has attracted great attention as an efficient way of handling statistical problems in high dimensions. This thesis considers sparse modelling and estimation in a selection of problems such as breakpoint detection in nonstationary time series, nonparametric regression using piecewise constant functions and variable selection in high-dimensional linear regression. We first propose a method for detecting breakpoints in the second-order structure of piecewise stationary time series, assuming that those structural breakpoints are sufficiently scattered over time. Our choice of time series model is the locally stationary wavelet process (Nason et al., 2000), under which the entire second-order structure of a time series is described by wavelet-based local periodogram sequences. As the initial stage of breakpoint detection, we apply a binary segmentation procedure to wavelet periodogram sequences at each scale separately, which is followed by within-scale and across-scales postprocessing steps. We show that the combined methodology achieves consistent estimation of the breakpoints in terms of their total number and locations, and investigate its practical performance using both simulated and real data. Next, we study the problem of nonparametric regression by means of piecewise constant functions, which are known to be flexible in approximating a wide range of function spaces. Among many approaches developed for this purpose, we focus on comparing two well-performing techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced Haar (Fryzlewicz, 2007) methods. While the multiscale nature of the latter is easily observed, it is not so obvious that the former can also be interpreted as multiscale. We provide a unified, multiscale representation for both methods, which offers an insight into the relationship between them and suggests some lessons that each method can learn from the other.
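As a rough illustration of the binary segmentation idea mentioned above, the following minimal sketch detects changes in the mean of a piecewise constant sequence using a CUSUM statistic. It is not the thesis's procedure: the thesis applies binary segmentation to wavelet periodogram sequences and adds postprocessing steps, whereas the function names, the fixed `threshold`, and the `min_len` parameter here are assumptions made for illustration only.

```python
import numpy as np

def cusum(x):
    """Absolute CUSUM statistic for every candidate split b = 1, ..., n-1."""
    n = len(x)
    b = np.arange(1, n)
    left = np.cumsum(x)[:-1]          # sum of the first b observations
    right = x.sum() - left            # sum of the remaining n - b observations
    stat = np.sqrt((n - b) / (n * b)) * left - np.sqrt(b / (n * (n - b))) * right
    return np.abs(stat)

def binary_segmentation(x, threshold, start=0, min_len=2):
    """Recursively split x wherever the maximal CUSUM exceeds the threshold.

    Returns sorted breakpoint indices; index b means a new segment starts at x[b].
    The threshold is a user-supplied assumption, not a value from the thesis.
    """
    n = len(x)
    if n < 2 * min_len:
        return []
    stat = cusum(x)
    b = int(np.argmax(stat)) + 1
    if stat[b - 1] <= threshold:
        return []
    left = binary_segmentation(x[:b], threshold, start, min_len)
    right = binary_segmentation(x[b:], threshold, start + b, min_len)
    return left + [start + b] + right

# Synthetic example: mean 0, then 3, then 0, with small Gaussian noise.
rng = np.random.default_rng(1)
x = np.concatenate([np.zeros(50), 3.0 * np.ones(50), np.zeros(50)])
x = x + 0.1 * rng.standard_normal(150)
breakpoints = binary_segmentation(x, threshold=1.0)
print(breakpoints)
```

The recursion stops on a segment as soon as no candidate split exceeds the threshold, which is what makes the choice of threshold central to consistency results of the kind the abstract refers to.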
Lastly, we consider one of the most widely studied applications of sparse modelling and estimation: variable selection in high-dimensional linear regression. High dimensionality of the data brings in many complications including (possibly spurious) non-negligible correlations among the variables, which may result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response, which adaptively takes into account high correlations among the variables. A key ingredient of the proposed tilting procedure is hard-thresholding the sample correlation of the design matrix, which enables a data-driven switch between the use of marginal correlation and tilted correlation for each variable. We study the conditions under which this measure can discriminate between relevant and irrelevant variables, and thus be used as a tool for variable selection. In order to exploit these theoretical properties of tilted correlation, we construct an iterative variable screening algorithm and examine its practical performance in a comparative simulation study.
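The hard-thresholding step described above can be sketched as follows. This only illustrates thresholding the sample correlation matrix of a design matrix and reading off, for each variable, which other variables remain highly correlated with it; it does not compute the tilted correlation itself. The function names and the fixed threshold of 0.5 are assumptions for illustration, whereas the abstract describes a data-driven choice.

```python
import numpy as np

def thresholded_correlation(X, threshold):
    """Zero out sample correlations between columns of X with magnitude <= threshold."""
    C = np.corrcoef(X, rowvar=False)            # p x p sample correlation matrix
    C_thr = np.where(np.abs(C) > threshold, C, 0.0)
    np.fill_diagonal(C_thr, 1.0)                # keep the unit diagonal
    return C_thr

def correlated_sets(C_thr):
    """For each variable j, list the other variables whose correlation with j survived."""
    p = C_thr.shape[0]
    return [[k for k in range(p) if k != j and C_thr[j, k] != 0.0] for j in range(p)]

# Synthetic design: columns 0 and 1 share a common factor, column 2 is independent.
rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)
X = np.column_stack([
    z + 0.1 * rng.standard_normal(n),
    z + 0.1 * rng.standard_normal(n),
    rng.standard_normal(n),
])
sets = correlated_sets(thresholded_correlation(X, threshold=0.5))
print(sets)
```

A variable with an empty set has no highly correlated companions, which is the situation in which, according to the abstract, marginal correlation would be used in place of tilted correlation.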

    Basic Singular Spectrum Analysis and Forecasting with R

    Singular Spectrum Analysis (SSA) as a tool for analysis and forecasting of time series is considered. The main features of the Rssa package, which implements the SSA algorithms and methodology in R, are described and examples of its use are presented. Analysis, forecasting and parameter estimation are demonstrated by means of a case study with accompanying R code.
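For readers unfamiliar with the method, the basic SSA reconstruction steps (embedding, singular value decomposition, grouping, diagonal averaging) can be sketched in a few lines. The sketch below uses Python with NumPy and does not use or mirror the Rssa API; the function name `ssa_reconstruct`, the window length, and the choice of components are assumptions made for illustration.

```python
import numpy as np

def ssa_reconstruct(x, window, components):
    """Reconstruct a series from selected SVD components of its trajectory matrix."""
    n = len(x)
    k = n - window + 1
    # Embedding: column j of the trajectory (Hankel) matrix holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: sum of the chosen rank-one terms.
    approx = (u[:, components] * s[components]) @ vt[components, :]
    # Diagonal averaging: entry (i, j) of approx estimates x[i + j].
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i, :]
        counts[i:i + k] += 1
    return recon / counts

# Synthetic example: a sinusoid with period 12 observed with Gaussian noise.
t = np.arange(120)
signal = np.sin(2 * np.pi * t / 12)
rng = np.random.default_rng(0)
noisy = signal + 0.3 * rng.standard_normal(t.size)
recon = ssa_reconstruct(noisy, window=24, components=[0, 1])
print(np.mean((noisy - signal) ** 2), np.mean((recon - signal) ** 2))
```

A pure sinusoid produces a trajectory matrix of rank two, which is why the two leading components are grouped here; for real data the grouping is a modelling decision.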

    True CMB Power Spectrum Estimation

    The cosmic microwave background (CMB) power spectrum is a powerful cosmological probe as it contains almost all the statistical information of the CMB perturbations. Having access to only one sky, the CMB power spectrum measured by our experiments is only a realization of the true underlying angular power spectrum. In this paper we aim to recover the true underlying CMB power spectrum from the one realization that we have without a need to know the cosmological parameters. The sparsity of the CMB power spectrum is first investigated in two dictionaries: the Discrete Cosine Transform (DCT) and the Wavelet Transform (WT). The CMB power spectrum can be recovered with only a few percent of the coefficients in both of these dictionaries and hence is very compressible in them. We study the performance of these dictionaries in smoothing a set of simulated power spectra. Based on this, we develop a technique that estimates the true underlying CMB power spectrum from data, i.e. without a need to know the cosmological parameters. This smooth estimated spectrum can be used to simulate CMB maps with similar properties to the true CMB simulations with the correct cosmological parameters. This allows us to make Monte Carlo simulations in a given project, without having to know the cosmological parameters. The developed IDL code, TOUSI, for Theoretical pOwer spectrUm using Sparse estImation, will be released with the next version of ISAP.
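The notion of compressibility in the DCT dictionary can be illustrated with a small sketch: transform a smooth curve, keep only the largest few percent of coefficients, and invert. The curve below is a synthetic stand-in and not a CMB power spectrum, and the code is unrelated to TOUSI; the function names and the 5% fraction are assumptions for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def keep_largest(coeffs, fraction):
    """Zero all but the largest-magnitude fraction of the coefficients."""
    m = max(1, int(round(fraction * coeffs.size)))
    keep = np.argsort(np.abs(coeffs))[-m:]
    sparse = np.zeros_like(coeffs)
    sparse[keep] = coeffs[keep]
    return sparse

# Synthetic smooth curve with damped oscillations (an assumption, not real data).
ell = np.arange(2, 1002)
spectrum = np.exp(-ell / 400.0) * (1.0 + 0.5 * np.cos(ell / 30.0))

D = dct_matrix(spectrum.size)
coeffs = D @ spectrum                            # forward DCT
approx = D.T @ keep_largest(coeffs, 0.05)        # inverse DCT of the sparse vector
rel_err = np.linalg.norm(approx - spectrum) / np.linalg.norm(spectrum)
print(rel_err)
```

A small relative error with only a few percent of the coefficients retained is what is meant by the signal being compressible in that dictionary.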

    Initial Conditions for Large Cosmological Simulations

    This technical paper describes a software package that was designed to produce initial conditions for large cosmological simulations in the context of the Horizon collaboration. These tools generalize E. Bertschinger's Grafic1 software to distributed parallel architectures and offer a flexible alternative to the Grafic2 software for "zoom" initial conditions, at the price of large cumulative CPU and memory usage. The codes have been validated up to resolutions of 4096^3 and were used to generate the initial conditions of large hydrodynamical and dark matter simulations. They also provide means to generate constrained realisations for the purpose of generating initial conditions compatible with, e.g., the Local Group or the SDSS catalog. Comment: 12 pages, 11 figures, submitted to ApJ