Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation
Volterra and polynomial regression models play a major role in nonlinear
system identification and inference tasks. Exciting applications ranging from
neuroscience to genome-wide association analysis build on these models with the
additional requirement of parsimony. This requirement has high interpretative
value, but unfortunately cannot be met by least-squares based or kernel
regression methods. To this end, compressed sampling (CS) approaches, already
successful in linear regression settings, can offer a viable alternative. The
viability of CS for sparse Volterra and polynomial models is the core theme of
this work. A common sparse regression task is initially posed for the two
models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type
algorithm is developed for sparse polynomial regressions. The identifiability
of polynomial models is critically challenged by dimensionality. However,
following the CS principle, when these models are sparse, they can be
recovered from far fewer measurements. To quantify the sufficient number of
measurements for a given level of sparsity, restricted isometry properties
(RIP) are investigated in commonly met polynomial regression settings,
generalizing known results for their linear counterparts. The merits of the
novel (weighted) adaptive CS algorithms for sparse polynomial modeling are
verified through synthetic as well as real data tests for genotype-phenotype
analysis.
Comment: 20 pages, to appear in IEEE Trans. on Signal Processing.
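As a concrete, much-simplified illustration of the sparse regression task posed above, the sketch below fits a second-order (Volterra-like) polynomial expansion with an off-the-shelf Lasso solver on synthetic data. The paper's weighted-Lasso and adaptive RLS-type algorithms are not reproduced here; all names and parameter values are illustrative.

```python
# Minimal sketch (not the paper's algorithm): sparse second-order polynomial
# regression via a batch Lasso fit on synthetic data.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10                                # samples, input variables
X = rng.standard_normal((n, p))

# Second-order (Volterra-like) expansion: 1, x_i, x_i * x_j.
Phi = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)

# Ground truth: only a few expansion coefficients are non-zero (parsimony).
beta = np.zeros(Phi.shape[1])                 # D = 1 + p + p(p+1)/2 = 66
beta[[1, 5, 17, 40]] = [1.5, -2.0, 0.8, 1.2]
y = Phi @ beta + 0.05 * rng.standard_normal(n)

# The l1 penalty promotes the sparsity that plain least squares cannot deliver.
fit = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(Phi, y)
print("estimated support:", np.flatnonzero(np.abs(fit.coef_) > 1e-3))
```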
Variational Multiscale Nonparametric Regression: Algorithms and Implementation
Many modern statistically efficient methods come with tremendous
computational challenges, often leading to large-scale optimisation problems.
In this work, we examine such computational issues for recently developed
estimation methods in nonparametric regression with a specific view on image
denoising. We consider in particular certain variational multiscale estimators
which are statistically optimal in a minimax sense, yet computationally
intensive. Such an estimator is computed as the minimiser of a smoothness
functional (e.g., TV norm) over the class of all estimators such that none of
its coefficients with respect to a given multiscale dictionary is statistically
significant. The resulting multiscale Nemirovski-Dantzig estimator (MIND) can
incorporate any convex smoothness functional and combine it with a proper
dictionary including wavelets, curvelets and shearlets. The computation of MIND
in general requires solving a high-dimensional constrained convex optimisation
problem with a specific structure of the constraints induced by the statistical
multiscale testing criterion. To solve this explicitly, we discuss three
different algorithmic approaches: the Chambolle-Pock, ADMM and semismooth
Newton algorithms. Algorithmic details and an explicit implementation are
presented, and the solutions are then compared numerically in a simulation study
and on various test images. We thereby recommend the Chambolle-Pock algorithm
in most cases for its fast convergence. We stress that our analysis can also be
transferred to signal recovery and other denoising problems to recover more
general objects whenever it is possible to borrow statistical strength from
data patches of similar object structure.
Comment: Codes are available at https://github.com/housenli/MIN
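For orientation, here is a minimal sketch of the Chambolle-Pock primal-dual iteration applied to a far simpler, unconstrained TV-denoising (ROF) model rather than to the constrained MIND problem; the authors' actual implementation lives in the repository linked above, and the step sizes and parameters below are standard textbook choices, not theirs.

```python
# Minimal Chambolle-Pock sketch for min_u 0.5*||u - f||^2 + lam * TV(u)
# (plain ROF denoising, NOT the constrained MIND estimator).
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary conditions."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Divergence, the negative adjoint of grad above."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :], dx[1:-1, :], dx[-1, :] = px[0, :], px[1:-1, :] - px[:-2, :], -px[-2, :]
    dy[:, 0], dy[:, 1:-1], dy[:, -1] = py[:, 0], py[:, 1:-1] - py[:, :-2], -py[:, -2]
    return dx + dy

def chambolle_pock_tv(f, lam=0.1, n_iter=200):
    tau = sigma = 1.0 / np.sqrt(8.0)        # tau * sigma * ||grad||^2 <= 1
    u, u_bar = f.copy(), f.copy()
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(n_iter):
        # Dual ascent step, then projection onto the lam-ball (prox of F*).
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)
        px, py = px / scale, py / scale
        # Primal step: proximal map of the quadratic data-fidelity term.
        u_old = u
        u = (u + tau * div(px, py) + tau * f) / (1.0 + tau)
        # Over-relaxation with theta = 1.
        u_bar = 2.0 * u - u_old
    return u

# Toy usage on a noisy piecewise-constant image.
rng = np.random.default_rng(0)
img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0
denoised = chambolle_pock_tv(img + 0.2 * rng.standard_normal(img.shape), lam=0.15)
```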
A Singular Value Thresholding Algorithm for Matrix Completion
This paper introduces a novel algorithm to approximate the matrix with minimum
nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood
as the convex relaxation of a rank minimization problem and arises in many important
applications as in the task of recovering a large matrix from a small subset of its entries (the famous
Netflix problem). Off-the-shelf algorithms such as interior point methods do not scale to
large problems of this kind with over a million unknown entries. This paper develops a simple
first-order and easy-to-implement algorithm that is extremely efficient at addressing problems in
which the optimal solution has low rank. The algorithm is iterative, produces a sequence of matrices
{X^k,Y^k}, and at each step mainly performs a soft-thresholding operation on the singular values
of the matrix Y^k. There are two remarkable features making this attractive for low-rank matrix
completion problems. The first is that the soft-thresholding operation is applied to a sparse matrix;
the second is that the rank of the iterates {X^k} is empirically nondecreasing. Both these facts allow
the algorithm to make use of very minimal storage space and keep the computational cost of each
iteration low. On the theoretical side, we provide a convergence analysis showing that the sequence
of iterates converges. On the practical side, we provide numerical examples in which 1,000 × 1,000
matrices are recovered in less than a minute on a modest desktop computer. We also demonstrate
that our approach is amenable to very large scale problems by recovering matrices of rank about
10 with nearly a billion unknowns from just about 0.4% of their sampled entries. Our methods are
connected with the recent literature on linearized Bregman iterations for ℓ_1 minimization, and we
develop a framework in which one can understand these algorithms in terms of well-known Lagrange
multiplier algorithms.
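The iteration sketched in the abstract is compact enough to write down directly. The following hedged reimplementation follows the description above (soft-thresholding of singular values, update on the observed entries), while the parameter values and stopping rule are merely illustrative.

```python
# Sketch of the singular value thresholding (SVT) iteration described above;
# parameter choices and stopping rule are illustrative.
import numpy as np

def svt_complete(M_obs, mask, tau, delta, n_iter=500, tol=1e-4):
    """X^k = soft-threshold the singular values of Y^(k-1) at level tau,
    Y^k  = Y^(k-1) + delta * P_Omega(M - X^k) on the observed entries."""
    Y = np.zeros_like(M_obs)
    X = Y
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt        # singular value shrinkage
        residual = mask * (M_obs - X)                  # mismatch on observed entries
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M_obs):
            break
        Y = Y + delta * residual                       # dual (gradient-type) update
    return X

# Toy usage: recover a synthetic rank-2 matrix from 40% of its entries.
rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = (rng.random((n, n)) < 0.4).astype(float)
X_hat = svt_complete(mask * A, mask, tau=5.0 * n, delta=1.2 / 0.4)
print("relative error:", np.linalg.norm(X_hat - A) / np.linalg.norm(A))
```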
Sparse modelling and estimation for nonstationary time series and high-dimensional data
Sparse modelling has attracted great attention as an efficient way of
handling statistical problems in high dimensions. This thesis considers
sparse modelling and estimation in a selection of problems such
as breakpoint detection in nonstationary time series, nonparametric
regression using piecewise constant functions and variable selection in
high-dimensional linear regression.
We first propose a method for detecting breakpoints in the second-order
structure of piecewise stationary time series, assuming that
those structural breakpoints are sufficiently scattered over time. Our
choice of time series model is the locally stationary wavelet process
(Nason et al., 2000), under which the entire second-order structure of a
time series is described by wavelet-based local periodogram sequences.
As the initial stage of breakpoint detection, we apply a binary segmentation
procedure to wavelet periodogram sequences at each scale
separately, which is followed by within-scale and across-scales post-processing
steps. We show that the combined methodology achieves
consistent estimation of the breakpoints in terms of their total number
and locations, and investigate its practical performance using both
simulated and real data.
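As a toy illustration of the initial stage only, the sketch below runs a generic binary segmentation with CUSUM statistics on a single univariate sequence. The thesis applies such a procedure to wavelet periodogram sequences at each scale and then adds the within-scale and across-scales post-processing, which is not reproduced here; the threshold value is purely illustrative.

```python
# Generic binary segmentation sketch (single sequence only; the within-scale
# and across-scales post-processing steps are not reproduced).
import numpy as np

def cusum(x):
    """CUSUM statistics for every candidate breakpoint in a sequence."""
    n = len(x)
    b = np.arange(1, n)
    left = np.cumsum(x)[:-1]                   # sums over x[0..b-1]
    right = x.sum() - left                     # sums over x[b..n-1]
    return np.abs(np.sqrt((n - b) / (n * b)) * left - np.sqrt(b / (n * (n - b))) * right)

def binary_segmentation(x, threshold, start=0, end=None, found=None):
    """Split at the largest CUSUM value above the threshold, then recurse."""
    found = [] if found is None else found
    end = len(x) if end is None else end
    if end - start < 2:
        return found
    stat = cusum(x[start:end])
    b = int(np.argmax(stat))
    if stat[b] > threshold:
        cp = start + b + 1                     # first index of the right segment
        found.append(cp)
        binary_segmentation(x, threshold, start, cp, found)
        binary_segmentation(x, threshold, cp, end, found)
    return found

# Toy usage: a mean-shift signal with breakpoints at 100 and 200.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100), rng.normal(-1, 1, 100)])
print(sorted(binary_segmentation(x, threshold=4.0)))    # illustrative threshold
```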
Next, we study the problem of nonparametric regression by means of
piecewise constant functions, which are known to be flexible enough to approximate
functions from a wide range of function spaces. Among the many approaches developed
for this purpose, we focus on comparing two well-performing
techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced
Haar (Fryzlewicz, 2007) methods. While the multiscale nature
of the latter is easily observed, it is not so obvious that the former
can also be interpreted as multiscale. We provide a unified, multiscale
representation for both methods, which offers insight into the relationship
between them and suggests lessons that each method can learn from the other.
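For a flavour of multiscale piecewise-constant fitting, the toy below hard-thresholds the coefficients of an orthonormal Haar transform. It is neither the taut string nor the Unbalanced Haar algorithm, only a generic multiscale stand-in with an illustrative threshold.

```python
# Toy multiscale piecewise-constant fit by Haar coefficient thresholding
# (a generic stand-in, not the taut string or Unbalanced Haar methods).
import numpy as np

def haar_threshold_fit(y, lam):
    n = len(y)
    assert n & (n - 1) == 0, "length must be a power of two for this toy"
    # Forward orthonormal Haar transform.
    details, approx = [], y.astype(float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))    # detail coefficients
        approx = (even + odd) / np.sqrt(2)           # coarser approximation
    # Hard-threshold the detail coefficients; keep the coarsest approximation.
    details = [d * (np.abs(d) > lam) for d in details]
    # Inverse transform, coarse to fine.
    for d in reversed(details):
        even, odd = (approx + d) / np.sqrt(2), (approx - d) / np.sqrt(2)
        approx = np.empty(2 * len(d))
        approx[0::2], approx[1::2] = even, odd
    return approx

# Toy usage: noisy piecewise-constant signal of length 128.
rng = np.random.default_rng(3)
truth = np.repeat([0.0, 3.0, 1.0, -2.0], 32)
fit = haar_threshold_fit(truth + 0.5 * rng.standard_normal(truth.size), lam=1.5)
```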
Lastly, we consider one of the most widely studied applications of sparse
modelling and estimation: variable selection in high-dimensional linear
regression. The high dimensionality of the data brings in many
complications including (possibly spurious) non-negligible correlations
among the variables, which may result in marginal correlation being
unreliable as a measure of association between the variables and the
response. We propose a new way of measuring the contribution of
each variable to the response, which adaptively takes into account
high correlations among the variables. A key ingredient of the proposed
tilting procedure is hard-thresholding the sample correlation matrix of the
design, which enables a data-driven switch between the use of
marginal correlation and tilted correlation for each variable. We study
the conditions under which this measure can discriminate between relevant
and irrelevant variables, and thus be used as a tool for variable
selection. In order to exploit these theoretical properties of tilted correlation,
we construct an iterative variable screening algorithm and
examine its practical performance in a comparative simulation study.
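To make the idea concrete, here is a loose sketch, not the exact tilting procedure of the thesis: the sample correlation matrix is hard-thresholded and, for each variable whose correlations with other variables survive the threshold, those companion variables are projected out before the association with the response is measured. The threshold, normalisation and data below are all illustrative.

```python
# Loose sketch of the tilted-correlation idea (illustrative normalisation only).
import numpy as np

def tilted_scores(X, y, pi):
    n, p = X.shape
    Xs = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))       # unit-norm columns
    ys = (y - y.mean()) / (y.std() * np.sqrt(n))
    C = Xs.T @ Xs                                        # sample correlation matrix
    scores = np.empty(p)
    for j in range(p):
        companions = np.flatnonzero((np.abs(C[j]) > pi) & (np.arange(p) != j))
        if companions.size == 0:
            scores[j] = np.abs(Xs[:, j] @ ys)            # plain marginal correlation
        else:
            Q, _ = np.linalg.qr(Xs[:, companions])       # basis of companion span
            xr = Xs[:, j] - Q @ (Q.T @ Xs[:, j])         # project companions out
            yr = ys - Q @ (Q.T @ ys)
            scores[j] = np.abs(xr @ yr) / max(np.linalg.norm(xr), 1e-12)
    return scores

# Toy usage: variable 1 is highly correlated with the truly relevant variable 0,
# so its large marginal correlation with y is spurious.
rng = np.random.default_rng(4)
n, p = 100, 6
X = rng.standard_normal((n, p))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * rng.standard_normal(n)
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(n)
print(np.round(tilted_scores(X, y, pi=0.5), 3))          # variable 0 should dominate
```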