Sparse and Functional Principal Components Analysis
Regularized variants of Principal Components Analysis, especially Sparse PCA
and Functional PCA, are among the most useful tools for the analysis of complex
high-dimensional data. Many examples of massive data have both sparse and
functional (smooth) aspects and may benefit from a regularization scheme that
can capture both forms of structure. For example, in neuro-imaging data, the
brain's response to a stimulus may be restricted to a discrete region of
activation (spatial sparsity), while exhibiting a smooth response within that
region. We propose a unified approach to regularized PCA which can induce both
sparsity and smoothness in both the row and column principal components. Our
framework generalizes much of the previous literature, with sparse, functional,
two-way sparse, and two-way functional PCA all being special cases of our
approach. Our method permits flexible combinations of sparsity and smoothness
that lead to improvements in feature selection and signal recovery, as well as
more interpretable PCA factors. We demonstrate the efficacy of our method on
simulated data and a neuroimaging example on EEG data.
Comment: The published version of this paper incorrectly thanks "Luofeng Luo" instead of "Luofeng Liao" in the Acknowledgement.
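As a rough illustration of the kind of regularization described above, the sketch below alternates rank-one power-iteration updates with soft-thresholding (for sparsity) and a second-difference smoother (for functional structure). The penalty weights, the smoothing matrix, and the update order are assumptions for illustration, not the authors' algorithm.

import numpy as np

def second_diff_penalty(p):
    # Roughness penalty matrix Omega = D.T @ D built from second differences.
    D = np.diff(np.eye(p), n=2, axis=0)
    return D.T @ D

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_smooth_rank1(X, lam_u=0.1, lam_v=0.1, alpha_u=1.0, alpha_v=1.0, n_iter=100):
    # Alternating rank-one updates with an l1 penalty (sparsity) and a
    # second-difference smoother (functional structure) on both factors.
    # Penalty levels and update order are illustrative assumptions.
    n, p = X.shape
    Su = np.linalg.inv(np.eye(n) + alpha_u * second_diff_penalty(n))
    Sv = np.linalg.inv(np.eye(p) + alpha_v * second_diff_penalty(p))
    u = np.random.default_rng(0).standard_normal(n)
    u /= np.linalg.norm(u)
    v = np.zeros(p)
    for _ in range(n_iter):
        v = soft_threshold(Sv @ (X.T @ u), lam_v)   # sparse + smooth column factor
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
        u = soft_threshold(Su @ (X @ v), lam_u)     # sparse + smooth row factor
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
    return u, u @ X @ v, v                          # row factor, scale, column factor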
Orthogonal Sparse PCA and Covariance Estimation via Procrustes Reformulation
The problem of estimating sparse eigenvectors of a symmetric matrix attracts
considerable attention in many applications, especially those with
high-dimensional data sets. While classical eigenvectors can be obtained as the solution of a
maximization problem, existing approaches formulate this problem by adding a
penalty term into the objective function that encourages a sparse solution.
However, the resulting methods achieve sparsity at the expense of sacrificing
the orthogonality property. In this paper, we develop a new method to estimate
dominant sparse eigenvectors without trading off their orthogonality. The
problem is highly non-convex and hard to handle. We apply the MM framework
where we iteratively maximize a tight lower bound (surrogate function) of the
objective function over the Stiefel manifold. The inner maximization problem
turns out to be a rectangular Procrustes problem, which has a closed form
solution. In addition, we propose a method to improve the covariance estimation
problem when its underlying eigenvectors are known to be sparse. We use the
eigenvalue decomposition of the covariance matrix to formulate an optimization
problem where we impose sparsity on the corresponding eigenvectors. Numerical
experiments show that the proposed eigenvector extraction algorithm matches or
outperforms existing algorithms in terms of support recovery and explained
variance, while the covariance estimation algorithms significantly improve upon
the sample covariance estimator.
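The closed-form inner step is standard: once the MM surrogate is reduced to maximizing tr(X^T M) over the Stiefel manifold, the maximizer comes from the SVD of M. A minimal sketch of just that step (constructing M from the surrogate is problem-specific and omitted here):

import numpy as np

def procrustes_step(M):
    # Closed-form maximizer of tr(X.T @ M) over {X : X.T @ X = I}.
    # M is assumed to come from the MM surrogate at the current iterate.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt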
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings.
Comment: Invited overview article.
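As a concrete instance of the two-stage recipe, the sketch below applies spectral initialization followed by plain gradient descent to a symmetric matrix sensing problem y_i = <A_i, X* X*^T>. The measurement model, step size, and iteration budget are illustrative assumptions, not the precise settings analyzed in the overview.

import numpy as np

def matrix_sensing_gd(A, y, r, step=0.01, n_iter=500):
    # Two-stage nonconvex recipe for symmetric matrix sensing:
    # (1) spectral initialization, (2) gradient descent on the factored objective.
    m, n, _ = A.shape
    # Stage 1: spectral initialization from the surrogate matrix (1/m) * sum_i y_i A_i.
    Y = np.tensordot(y, A, axes=1) / m
    vals, vecs = np.linalg.eigh((Y + Y.T) / 2)
    X = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))
    # Stage 2: gradient descent on f(X) = (1/2m) * sum_i (<A_i, X X^T> - y_i)^2.
    for _ in range(n_iter):
        residual = np.einsum('mij,ij->m', A, X @ X.T) - y
        grad = np.einsum('m,mij->ij', residual, (A + A.transpose(0, 2, 1)) / 2) @ X * (2.0 / m)
        X -= step * grad
    return X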
Adaptive Restart for Accelerated Gradient Schemes
In this paper we demonstrate a simple heuristic adaptive restart technique
that can dramatically improve the convergence rate of accelerated gradient
schemes. The analysis of the technique relies on the observation that these
schemes exhibit two modes of behavior depending on how much momentum is
applied. In what we refer to as the 'high momentum' regime the iterates
generated by an accelerated gradient scheme exhibit a periodic behavior, where
the period is proportional to the square root of the local condition number of
the objective function. This suggests a restart technique whereby we reset the
momentum whenever we observe periodic behavior. We provide analysis to show
that in many cases adaptively restarting allows us to recover the optimal rate
of convergence with no prior knowledge of function parameters.
Comment: 17 pages, 7 figures.
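A minimal sketch of the gradient-based restart heuristic: run a Nesterov-type accelerated scheme and reset the momentum whenever the momentum direction opposes the most recent gradient step. The fixed step size and this particular restart test are assumptions for illustration; a function-value-based test is another option discussed in the paper.

import numpy as np

def agd_adaptive_restart(grad, x0, step, n_iter=1000):
    # Accelerated gradient descent with a gradient-based adaptive restart:
    # momentum is reset whenever the momentum direction and the last
    # gradient step point in opposing directions.
    x, y = x0.copy(), x0.copy()
    theta = 1.0
    for _ in range(n_iter):
        x_new = y - step * grad(y)
        theta_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2))
        beta = (theta - 1.0) / theta_new
        if np.dot(y - x_new, x_new - x) > 0:
            # Restart: drop the accumulated momentum.
            theta_new = 1.0
            y = x_new
        else:
            y = x_new + beta * (x_new - x)
        x, theta = x_new, theta_new
    return x

# Example usage on an assumed ill-conditioned quadratic:
# Q = np.diag([1.0, 100.0]); grad = lambda z: Q @ z
# x_star = agd_adaptive_restart(grad, np.array([1.0, 1.0]), step=1.0 / 100.0)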
Boosted Sparse Non-linear Distance Metric Learning
This paper proposes a boosting-based solution addressing metric learning
problems for high-dimensional data. Distance measures have been used as natural
measures of (dis)similarity and served as the foundation of various learning
methods. The efficiency of distance-based learning methods heavily depends on
the chosen distance metric. With increasing dimensionality and complexity of
data, however, traditional metric learning methods suffer from poor scalability
and the limitations of linearity, since the true signals are usually embedded
within a low-dimensional nonlinear subspace. In this paper, we propose a
nonlinear sparse metric learning algorithm via boosting. We restructure a
global optimization problem into a forward stage-wise learning of weak learners
based on a rank-one decomposition of the weight matrix in the Mahalanobis
distance metric. A gradient boosting algorithm is devised to obtain a sparse
rank-one update of the weight matrix at each step. Nonlinear features are
learned by a hierarchical expansion of interactions incorporated within the
boosting algorithm. Meanwhile, an early stopping rule is imposed to control the
overall complexity of the learned metric. As a result, our approach guarantees
three desirable properties of the final metric: positive semi-definiteness, low
rank and element-wise sparsity. Numerical experiments show that our learning
model compares favorably with state-of-the-art methods in the current metric
learning literature.
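A loose sketch of the general idea of building a Mahalanobis weight matrix as a sum of sparse, nonnegative rank-one updates, which keeps the metric positive semidefinite, low rank, and element-wise sparse by construction. The residual matrix G summarizing pair constraints, the hard-thresholded eigenvector weak learner, and the step rule are illustrative assumptions rather than the boosting procedure of the paper.

import numpy as np

def rank_one_boosted_metric(G, n_rounds=10, sparsity=5):
    # Build W = sum_t alpha_t * z_t z_t^T from sparse rank-one weak learners
    # fit to a residual matrix R (initialized to G).
    p = G.shape[0]
    W = np.zeros((p, p))
    R = G.copy()
    for _ in range(n_rounds):
        vals, vecs = np.linalg.eigh((R + R.T) / 2)
        z = vecs[:, -1]                              # leading eigenvector of the residual
        keep = np.argsort(np.abs(z))[-sparsity:]     # keep only the largest entries
        z_sparse = np.zeros(p)
        z_sparse[keep] = z[keep]
        nrm = np.linalg.norm(z_sparse)
        if nrm == 0:
            break
        z_sparse /= nrm
        alpha = z_sparse @ R @ z_sparse              # nonnegative step keeps W PSD
        if alpha <= 0:
            break
        W += alpha * np.outer(z_sparse, z_sparse)
        R -= alpha * np.outer(z_sparse, z_sparse)
    return W

def mahalanobis_dist(W, x, y):
    # Squared distance under the learned metric.
    d = x - y
    return d @ W @ d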
Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem
We propose a new algorithm for sparse estimation of eigenvectors in
generalized eigenvalue problems (GEP). The GEP arises in a number of modern
data-analytic situations and statistical methods, including principal component
analysis (PCA), multiclass linear discriminant analysis (LDA), canonical
correlation analysis (CCA), sufficient dimension reduction (SDR) and invariant
co-ordinate selection. We propose to modify the standard generalized orthogonal
iteration with a sparsity-inducing penalty for the eigenvectors. To achieve
this goal, we generalize the equation-solving step of orthogonal iteration to a
penalized convex optimization problem. The resulting algorithm, called
penalized orthogonal iteration, provides accurate estimation of the true
eigenspace, when it is sparse. Also proposed is a computationally more
efficient alternative, which works well for PCA and LDA problems. Numerical
studies reveal that the proposed algorithms are competitive, and that our
tuning procedure works well. We demonstrate applications of the proposed
algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA and SDR.
Supplementary materials are available online.
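A minimal sketch of the core idea for the generalized eigenvalue problem A v = lambda B v: keep the orthogonal-iteration loop, but replace the equation-solving step with an l1-penalized least-squares problem (solved here by a simple ISTA loop). The penalty level, inner solver, and iteration counts are assumptions rather than the paper's tuning procedure.

import numpy as np

def soft_threshold(V, lam):
    return np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)

def penalized_orthogonal_iteration(A, B, k, lam=0.1, n_outer=50, n_inner=100):
    # Sparse estimation for A v = lambda B v: each outer step solves an
    # l1-penalized version of 'B V = A U' by ISTA, then re-orthonormalizes.
    p = A.shape[0]
    rng = np.random.default_rng(0)
    U, _ = np.linalg.qr(rng.standard_normal((p, k)))
    step = 1.0 / np.linalg.norm(B, 2) ** 2          # ISTA step for the quadratic term
    for _ in range(n_outer):
        target = A @ U
        V = U.copy()
        for _ in range(n_inner):                    # ISTA on 0.5*||B V - A U||_F^2 + lam*||V||_1
            V = soft_threshold(V - step * (B.T @ (B @ V - target)), step * lam)
        U, _ = np.linalg.qr(V)
    return U                                        # orthonormal basis estimate of the eigenspace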
A direct formulation for sparse PCA using semidefinite programming
We examine the problem of approximating, in the Frobenius-norm sense, a
positive semidefinite symmetric matrix by a rank-one matrix, with an upper
bound on the cardinality of its eigenvector. The problem arises in the
decomposition of a covariance matrix into sparse factors, and has wide
applications ranging from biology to finance. We use a modification of the
classical variational representation of the largest eigenvalue of a symmetric
matrix, where cardinality is constrained, and derive a semidefinite programming
based relaxation for our problem. We also discuss Nesterov's smooth
minimization technique applied to the SDP arising in the direct sparse PCA
method.
Comment: Final version, to appear in SIAM Review.
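In its penalized form, the relaxation can be stated directly as a semidefinite program over the matrix variable X; a sketch using cvxpy, where the penalty level rho and the generic solver are assumptions:

import cvxpy as cp
import numpy as np

def sparse_pca_sdp(Sigma, rho=0.5):
    # Semidefinite relaxation for sparse PCA (penalized form):
    #   maximize  tr(Sigma @ X) - rho * sum(|X_ij|)
    #   subject to  tr(X) = 1,  X >> 0.
    # The leading eigenvector of the optimal X gives the sparse component.
    p = Sigma.shape[0]
    X = cp.Variable((p, p), symmetric=True)
    objective = cp.Maximize(cp.trace(Sigma @ X) - rho * cp.sum(cp.abs(X)))
    constraints = [cp.trace(X) == 1, X >> 0]
    cp.Problem(objective, constraints).solve()
    vals, vecs = np.linalg.eigh(X.value)
    return vecs[:, -1]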
Low-rank spectral optimization via gauge duality
Various applications in signal processing and machine learning give rise to
highly structured spectral optimization problems characterized by low-rank
solutions. Two important examples that motivate this work are optimization
problems from phase retrieval and from blind deconvolution, which are designed
to yield rank-1 solutions. An algorithm is described that is based on solving a
certain constrained eigenvalue optimization problem that corresponds to the
gauge dual which, unlike the more typical Lagrange dual, has an especially
simple constraint. The dominant cost at each iteration is the computation of
rightmost eigenpairs of a Hermitian operator. A range of numerical examples
illustrate the scalability of the approach.
Comment: Final version. To appear in SIAM J. Scientific Computing.
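The per-iteration bottleneck mentioned above, computing rightmost eigenpairs of a Hermitian operator, can be done matrix-free given only the operator's action; a minimal sketch with an assumed placeholder operator (not the gauge-dual operator itself):

import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def rightmost_eigpair(apply_H, n, k=1):
    # k rightmost (largest algebraic) eigenpairs of a Hermitian operator,
    # accessed only through its matrix-vector product v -> H v.
    H = LinearOperator((n, n), matvec=apply_H, dtype=float)
    vals, vecs = eigsh(H, k=k, which='LA')
    return vals, vecs

# Example usage with an assumed diagonal-plus-rank-one operator:
# d = np.linspace(0.0, 1.0, 1000); u = np.random.default_rng(0).standard_normal(1000)
# vals, vecs = rightmost_eigpair(lambda v: d * v + u * (u @ v) / len(u), n=1000)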
Uniform Convergence of Gradients for Non-Convex Learning and Optimization
We investigate 1) the rate at which refined properties of the empirical
risk, in particular gradients, converge to their population counterparts in
standard non-convex learning tasks, and 2) the consequences of this convergence
for optimization. Our analysis follows the tradition of norm-based capacity
control. We propose vector-valued Rademacher complexities as a simple,
composable, and user-friendly tool to derive dimension-free uniform convergence
bounds for gradients in non-convex learning problems. As an application of our
techniques, we give a new analysis of batch gradient descent methods for
non-convex generalized linear models and non-convex robust regression, showing
how to use any algorithm that finds approximate stationary points to obtain
optimal sample complexity, even when dimension is high or possibly infinite and
multiple passes over the dataset are allowed.
Moving to non-smooth models, we show that, in contrast to the smooth case,
even for a single ReLU it is not possible to obtain dimension-independent
convergence rates for gradients in the worst case. On the positive side, it is
still possible to obtain dimension-independent rates under a new type of
distributional assumption.
Comment: To appear in Neural Information Processing Systems (NIPS) 201
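For context on the "approximate stationary point" primitive that the analysis plugs into, here is a minimal sketch of batch gradient descent on a non-convex GLM empirical risk (sigmoid link with squared loss), stopping once the gradient norm is small; the link, loss, step size, and tolerance are assumptions for illustration, not the exact setting analyzed in the paper.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glm_gd_to_stationary_point(X, y, step=0.1, tol=1e-4, max_iter=10000):
    # Batch gradient descent on L(w) = (1/2n) * sum_i (sigmoid(x_i . w) - y_i)^2,
    # run until an approximate stationary point (small gradient norm) is reached.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        p = sigmoid(X @ w)
        grad = X.T @ ((p - y) * p * (1.0 - p)) / n
        if np.linalg.norm(grad) <= tol:
            break
        w -= step * grad
    return w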
Sparse Quantile Huber Regression for Efficient and Robust Estimation
We consider new formulations and methods for sparse quantile regression in
the high-dimensional setting. Quantile regression plays an important role in
many applications, including outlier-robust exploratory analysis in gene
selection. In addition, the sparsity consideration in quantile regression
enables the exploration of the entire conditional distribution of the response
variable given the predictors and therefore yields a more comprehensive view of
the important predictors. We propose a generalized OMP algorithm for variable
selection, taking the misfit loss to be either the traditional quantile loss or
a smooth version we call quantile Huber, and compare the resulting greedy
approaches with convex sparsity-regularized formulations. We apply a recently
proposed interior point methodology to efficiently solve all convex
formulations as well as convex subproblems in the generalized OMP setting,
provide theoretical guarantees of consistent estimation, and demonstrate the
performance of our approach using empirical studies of simulated and genomic
datasets.
Comment: 9 pages.
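A minimal sketch of one common way to smooth the quantile (pinball) loss into a "quantile Huber" loss, quadratic near zero and linear with asymmetric slopes in the tails; the parameterization below is an assumption, not necessarily the exact form used in the paper.

import numpy as np

def quantile_huber(r, tau=0.5, kappa=1.0):
    # Smoothed pinball loss: quadratic on [kappa*(tau-1), kappa*tau],
    # linear with slopes tau and (tau - 1) outside that interval, so it is
    # differentiable everywhere.
    r = np.asarray(r, dtype=float)
    out = np.empty_like(r)
    upper = r >= kappa * tau
    lower = r <= kappa * (tau - 1.0)
    mid = ~(upper | lower)
    out[upper] = tau * (r[upper] - kappa * tau / 2.0)
    out[lower] = (tau - 1.0) * (r[lower] - kappa * (tau - 1.0) / 2.0)
    out[mid] = r[mid] ** 2 / (2.0 * kappa)
    return out

# As kappa -> 0 this recovers the pinball loss tau*max(r, 0) + (1 - tau)*max(-r, 0).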