Sparse modelling and estimation for nonstationary time series and high-dimensional data
Sparse modelling has attracted great attention as an efficient way of
handling statistical problems in high dimensions. This thesis considers
sparse modelling and estimation in a selection of problems such
as breakpoint detection in nonstationary time series, nonparametric
regression using piecewise constant functions and variable selection in
high-dimensional linear regression.
We first propose a method for detecting breakpoints in the second-order
structure of piecewise stationary time series, assuming that
those structural breakpoints are sufficiently scattered over time. Our
choice of time series model is the locally stationary wavelet process
(Nason et al., 2000), under which the entire second-order structure of a
time series is described by wavelet-based local periodogram sequences.
As the initial stage of breakpoint detection, we apply a binary segmentation
procedure to wavelet periodogram sequences at each scale
separately, which is followed by within-scale and across-scales postprocessing
steps. We show that the combined methodology achieves
consistent estimation of the breakpoints in terms of their total number
and locations, and investigate its practical performance using both
simulated and real data.
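
As a rough illustration of the binary segmentation step only, the sketch below recursively splits a sequence at the point maximising a CUSUM statistic and stops when that statistic falls below a threshold. It is a minimal sketch under stated assumptions: it looks for changes in the mean of a plain numeric sequence, whereas the thesis applies the procedure to wavelet periodogram sequences and adds post-processing steps that are not shown. The function names and the `threshold` parameter are hypothetical.

```python
import math


def cusum_split(x, start, end):
    """Return (index, statistic) of the best split of x[start:end].

    A split at index b separates x[start:b] from x[b:end].
    """
    n = end - start
    total = sum(x[start:end])
    best_b, best_stat = None, 0.0
    left = 0.0
    for b in range(start + 1, end):
        left += x[b - 1]
        n_left, n_right = b - start, end - b
        # Standard CUSUM contrast between the two segment sums.
        stat = abs(math.sqrt(n_right / (n * n_left)) * left
                   - math.sqrt(n_left / (n * n_right)) * (total - left))
        if stat > best_stat:
            best_b, best_stat = b, stat
    return best_b, best_stat


def binary_segmentation(x, threshold):
    """Collect breakpoints by recursive splitting (threshold is an assumption)."""
    breakpoints = []
    stack = [(0, len(x))]
    while stack:
        start, end = stack.pop()
        if end - start < 2:
            continue
        b, stat = cusum_split(x, start, end)
        if b is not None and stat > threshold:
            breakpoints.append(b)
            stack.append((start, b))
            stack.append((b, end))
    return sorted(breakpoints)


# Toy input: one jump in level halfway through the sequence.
signal = [0.0] * 50 + [5.0] * 50
print(binary_segmentation(signal, threshold=3.0))
```

In practice the threshold would have to be chosen relative to the noise level; the value used here is arbitrary.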
Next, we study the problem of nonparametric regression by means of
piecewise constant functions, which are known to be flexible in approximating
a wide range of function spaces. Among many approaches developed
for this purpose, we focus on comparing two well-performing
techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced
Haar (Fryzlewicz, 2007) methods. While the multiscale nature
of the latter is easily observed, it is not so obvious that the former
can also be interpreted as multiscale. We provide a unified, multiscale
representation for both methods, which offers an insight into the relationship
between them as well as suggesting some lessons that both
methods can learn from each other.
Lastly, we consider one of the most widely studied applications of sparse
modelling and estimation: variable selection in high-dimensional
linear regression. High dimensionality of the data brings in many
complications including (possibly spurious) non-negligible correlations
among the variables, which may result in marginal correlation being
unreliable as a measure of association between the variables and the
response. We propose a new way of measuring the contribution of
each variable to the response, which adaptively takes into account
high correlations among the variables. A key ingredient of the proposed
tilting procedure is hard-thresholding sample correlation of the
design matrix, which enables a data-driven switch between the use of
marginal correlation and tilted correlation for each variable. We study
the conditions under which this measure can discriminate between relevant
and irrelevant variables, and thus be used as a tool for variable
selection. In order to exploit these theoretical properties of tilted correlation,
we construct an iterative variable screening algorithm and
examine its practical performance in a comparative simulation study.
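
The data-driven switch described above can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it hard-thresholds the absolute sample correlations between columns of the design matrix and flags, for each variable, whether any other variable remains highly correlated with it. The tilted correlation itself is not computed here, and the names `pi_thr`, `thresholded_neighbours` and `choose_measure` are hypothetical.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def thresholded_neighbours(columns, pi_thr):
    """For each column j, list the columns k != j with |corr(j, k)| > pi_thr."""
    p = len(columns)
    neighbours = {j: [] for j in range(p)}
    for j in range(p):
        for k in range(j + 1, p):
            if abs(pearson(columns[j], columns[k])) > pi_thr:
                neighbours[j].append(k)
                neighbours[k].append(j)
    return neighbours


def choose_measure(columns, pi_thr):
    """Use marginal correlation when no strong neighbours survive thresholding."""
    nb = thresholded_neighbours(columns, pi_thr)
    return {j: ("tilted" if nb[j] else "marginal") for j in nb}


# Toy design: columns 0 and 1 are nearly collinear, column 2 is not.
design = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.2, 3.9, 5.1],
    [1.0, -1.0, 0.0, -1.0, 1.0],
]
print(choose_measure(design, pi_thr=0.5))
```

The threshold value 0.5 is arbitrary; how it should be chosen from the data is part of the proposed methodology and is not reproduced in this sketch.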
Basic Singular Spectrum Analysis and Forecasting with R
Singular Spectrum Analysis (SSA) as a tool for analysis and forecasting of
time series is considered. The main features of the Rssa package, which
implements the SSA algorithms and methodology in R, are described and examples
of its use are presented. Analysis, forecasting and parameter estimation are
demonstrated by means of a case study with accompanying R code.
True CMB Power Spectrum Estimation
The cosmic microwave background (CMB) power spectrum is a powerful
cosmological probe as it encodes almost all the statistical information of the
CMB perturbations. Because we have access to only one sky, the CMB power
spectrum measured by our experiments is only one realization of the true
underlying angular power spectrum. In this paper we aim to recover the true underlying CMB
power spectrum from the one realization that we have without a need to know the
cosmological parameters. The sparsity of the CMB power spectrum is first
investigated in two dictionaries: the Discrete Cosine Transform (DCT) and the
Wavelet Transform (WT). The CMB power spectrum can be recovered with only a few
percent of the coefficients in both of these dictionaries and hence is highly
compressible in them. We study the performance of these
dictionaries in smoothing a set of simulated power spectra. Based on this, we
develop a technique that estimates the true underlying CMB power spectrum from
data, i.e. without a need to know the cosmological parameters. This smooth
estimated spectrum can be used to simulate CMB maps with similar properties to
the true CMB simulations with the correct cosmological parameters. This allows
us to make Monte Carlo simulations in a given project, without having to know
the cosmological parameters. The developed IDL code, TOUSI, for Theoretical
pOwer spectrUm using Sparse estImation, will be released with the next version
of ISAP.
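
The notion of compressibility in a dictionary can be illustrated with a small sketch: transform a smooth curve with an orthonormal DCT, keep only the largest few percent of coefficients, and invert. This is an illustration under stated assumptions, not the TOUSI code: the input is a toy curve rather than a CMB power spectrum, the DCT is a direct O(n^2) implementation, and the function names and the `fraction` parameter are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def keep_largest(coeffs, fraction):
    """Zero all but the largest-magnitude `fraction` of the coefficients."""
    n_keep = max(1, int(round(fraction * len(coeffs))))
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:n_keep])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


# Toy smooth curve standing in for a spectrum (not real CMB data).
curve = [math.exp(-((l - 40) / 15.0) ** 2) for l in range(64)]
approx = idct(keep_largest(dct(curve), fraction=0.1))
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(curve, approx)))
print("L2 error with 10% of coefficients:", error)
```

Because the transform is orthonormal, keeping more of the largest coefficients can only reduce the L2 reconstruction error, which is what makes the fraction of retained coefficients a natural measure of sparsity.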
Initial Conditions for Large Cosmological Simulations
This technical paper describes a software package that was designed to
produce initial conditions for large cosmological simulations in the context of
the Horizon collaboration. These tools generalize E. Bertschinger's Grafic1
software to distributed parallel architectures and offer a flexible alternative
to the Grafic2 software for "zoom" initial conditions, at the price of large
cumulative CPU and memory usage. The codes have been validated up to resolutions
of 4096^3 and were used to generate the initial conditions of large
hydrodynamical and dark matter simulations. They also provide means to generate
constrained realisations for the purpose of generating initial conditions
compatible with, e.g. the local group, or the SDSS catalog.
Comment: 12 pages, 11 figures, submitted to ApJ