Hadamard Wirtinger Flow for Sparse Phase Retrieval
We consider the problem of reconstructing an $n$-dimensional $k$-sparse
signal from a set of noiseless magnitude-only measurements. Formulating the
problem as an unregularized empirical risk minimization task, we study the
sample complexity performance of gradient descent with Hadamard
parametrization, which we call Hadamard Wirtinger flow (HWF). Provided
knowledge of the signal sparsity $k$, we prove that a single step of HWF is
able to recover the support from $k\,(x^*_{\max})^{-2}$ (modulo logarithmic term)
samples, where $x^*_{\max}$ is the largest component of the signal in magnitude.
This support recovery procedure can be used to initialize existing
reconstruction methods and yields algorithms with total runtime proportional to
the cost of reading the data and improved sample complexity, which is linear in
$k$ when the signal contains at least one large component. We numerically
investigate the performance of HWF at convergence and show that, while not
requiring any explicit form of regularization nor knowledge of $k$, HWF adapts
to the signal sparsity and reconstructs sparse signals with fewer measurements
than existing gradient-based methods.
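To make the Hadamard parametrization concrete, here is a minimal numerical sketch of the idea described above: the signal is written as an elementwise product x = u * v and plain gradient descent is run on an unregularized risk over squared (intensity) measurements. The problem sizes, initialization scale, and step size below are illustrative assumptions, not the schedule analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem setup: a k-sparse signal in R^n observed through noiseless
# intensity measurements y_i = (a_i^T x_star)^2 (illustrative sizes).
n, k, m = 100, 3, 300
x_star = np.zeros(n)
x_star[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
y = (A @ x_star) ** 2

# Hadamard parametrization x = u * v and the unregularized empirical risk
# f(u, v) = (1/4m) * sum_i ((a_i^T (u*v))^2 - y_i)^2.
def grad(u, v):
    x = u * v
    r = (A @ x) ** 2 - y                 # residuals of the squared measurements
    g_x = A.T @ (r * (A @ x)) / m        # gradient of f with respect to x
    return g_x * v, g_x * u              # chain rule through the Hadamard product

# Plain gradient descent; initialization scale and step size are illustrative.
u = 0.1 * np.ones(n)
v = 0.1 * np.ones(n)
eta = 0.01
for _ in range(6000):
    g_u, g_v = grad(u, v)
    u -= eta * g_u
    v -= eta * g_v

x_hat = u * v
# Phase retrieval only determines the signal up to a global sign.
err = min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star))
print("relative error:", err / np.linalg.norm(x_star))
```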
Performance of $\ell_1$ Regularization for Sparse Convex Optimization
Despite widespread adoption in practice, guarantees for the LASSO and Group
LASSO are strikingly lacking in settings beyond statistical problems, and these
algorithms are usually considered to be a heuristic in the context of sparse
convex optimization on deterministic inputs. We give the first recovery
guarantees for the Group LASSO for sparse convex optimization with
vector-valued features. We show that if a sufficiently large Group LASSO
regularization is applied when minimizing a strictly convex function, then
the minimizer is a sparse vector supported on vector-valued features with the
largest $\ell_2$-norm of the gradient. Thus, repeating this procedure selects
the same set of features as the Orthogonal Matching Pursuit algorithm, which
admits recovery guarantees for any function with restricted strong
convexity and smoothness via weak submodularity arguments. This answers open
questions of Tibshirani et al. and Yasuda et al. Our result is the first to
theoretically explain the empirical success of the Group LASSO for convex
functions under general input instances assuming only restricted strong
convexity and smoothness. Our result also generalizes provable guarantees for
the Sequential Attention algorithm of Yasuda et al., a feature selection
algorithm inspired by the attention mechanism.
As an application of our result, we give new results for the column subset
selection problem, which is well-studied when the loss is the Frobenius norm or
other entrywise matrix losses. We give the first result for general loss
functions for this problem that requires only restricted strong convexity and
smoothness.
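The selection rule described in this abstract can be sketched directly: at each round, pick the group (vector-valued feature) whose block of the gradient has the largest l2-norm, then re-minimize over the selected groups, which is the Orthogonal Matching Pursuit style procedure the abstract says a large enough Group LASSO penalty reproduces. Least squares below stands in for a generic strictly convex loss; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression with vector-valued features: 30 groups of 4 coordinates,
# of which 3 groups are active.
n, num_groups, group_size, k = 150, 30, 4, 3
d = num_groups * group_size
groups = [list(range(g * group_size, (g + 1) * group_size)) for g in range(num_groups)]

X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
true_groups = sorted(rng.choice(num_groups, k, replace=False).tolist())
for g in true_groups:
    beta_true[groups[g]] = rng.standard_normal(group_size)
y = X @ beta_true

def grad(beta):
    # Gradient of the illustrative strictly convex loss f(beta) = 0.5 * ||X beta - y||^2.
    return X.T @ (X @ beta - y)

# Greedy (OMP-style) group selection: largest gradient-block norm, then re-fit.
support, beta = [], np.zeros(d)
for _ in range(k):
    g_vec = grad(beta)
    scores = [np.linalg.norm(g_vec[idx]) for idx in groups]
    for g in support:
        scores[g] = -np.inf                      # never re-select a chosen group
    support.append(int(np.argmax(scores)))
    cols = [j for g in support for j in groups[g]]
    beta = np.zeros(d)
    beta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]

print("selected groups:", sorted(support))
print("true groups    :", true_groups)
```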
More is Less: Inducing Sparsity via Overparameterization
In deep learning it is common to overparameterize neural networks, that is,
to use more parameters than training samples. Quite surprisingly, training the
neural network via (stochastic) gradient descent leads to models that
generalize very well, while classical statistics would suggest overfitting. In
order to gain an understanding of this implicit bias phenomenon, we study the
special case of sparse recovery (compressed sensing) which is of interest on
its own. More precisely, in order to reconstruct a vector from underdetermined
linear measurements, we introduce a corresponding overparameterized square loss
functional, where the vector to be reconstructed is deeply factorized into
several vectors. We show that, if there exists an exact solution, vanilla
gradient flow for the overparameterized loss functional converges to a good
approximation of the solution of minimal $\ell_1$-norm. The latter is
well-known to promote sparse solutions. As a by-product, our results
significantly improve the sample complexity for compressed sensing via gradient
flow/descent on overparameterized models derived in previous works. The theory
accurately predicts the recovery rate in numerical experiments. Our proof
relies on analyzing a certain Bregman divergence of the flow. This bypasses the
obstacles caused by non-convexity and should be of independent interest.
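A minimal sketch of the overparameterized formulation described above, assuming one natural depth-3 factorization x = u*u*u - v*v*v (the specific depth, initialization scale, and step size are illustrative choices, not the paper's): vanilla gradient descent on the square loss from a small initialization tends to return a solution with small l1-norm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Underdetermined linear measurements y = A x_true of a sparse vector.
n, m, k = 100, 50, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# Overparameterized square loss f(u, v) = 0.5 * ||A (u^3 - v^3) - y||^2 with the
# vector deeply factorized; vanilla gradient descent from a small initialization.
alpha, eta, T = 0.1, 0.01, 20000
u = alpha * np.ones(n)
v = alpha * np.ones(n)
for _ in range(T):
    x = u**3 - v**3
    g_x = A.T @ (A @ x - y)        # gradient of the square loss w.r.t. x
    u -= eta * 3.0 * u**2 * g_x    # chain rule through the cubic factorization
    v += eta * 3.0 * v**2 * g_x

x_hat = u**3 - v**3
print("l1 norms (recovered vs. true):", np.linalg.norm(x_hat, 1), np.linalg.norm(x_true, 1))
print("relative l2 error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```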
Gibbs Sampling using Anti-correlation Gaussian Data Augmentation, with Applications to L1-ball-type Models
L1-ball-type priors are a recent generalization of spike-and-slab priors.
By transforming a continuous precursor distribution to the L1-ball boundary,
they induce exact zeros with positive prior and posterior probabilities. With great
flexibility in choosing the precursor and threshold distributions, we can
easily specify models under structured sparsity, such as those with dependent
probability for zeros and smoothness among the non-zeros. Motivated to
significantly accelerate the posterior computation, we propose a new data
augmentation that leads to a fast block Gibbs sampling algorithm. The latent
variable, named ``anti-correlation Gaussian'', cancels out the quadratic
exponent term in the latent Gaussian distribution, making the parameters of
interest conditionally independent so that they can be updated in a block.
Compared to existing algorithms such as the No-U-Turn sampler, the new blocked
Gibbs sampler has a very low computing cost per iteration and shows rapid
mixing of Markov chains. We establish the geometric ergodicity guarantee of the
algorithm in linear models. Further, we show useful extensions of our algorithm
for posterior estimation of general latent Gaussian models, such as those
involving multivariate truncated Gaussian or latent Gaussian process.
Keywords: Blocked Gibbs sampler; Fast Mixing of Markov Chains; Latent Gaussian Models; Soft-thresholding
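The transformation described in the first sentences of this abstract, mapping a continuous precursor to the L1 ball so that exact zeros appear with positive probability, can be sketched with the standard Euclidean projection onto the l1-ball (a data-dependent soft-thresholding). The sampler itself is not reproduced here; the precursor and radius below are illustrative, and the projection map is an assumption about the precise form of the transformation.

```python
import numpy as np

def project_l1_ball(beta, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} via soft-thresholding."""
    if np.abs(beta).sum() <= radius:
        return beta.copy()
    u = np.sort(np.abs(beta))[::-1]                    # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / np.arange(1, len(u) + 1) > 0)[0][-1]
    tau = (css[rho] - radius) / (rho + 1)              # data-dependent threshold
    return np.sign(beta) * np.maximum(np.abs(beta) - tau, 0.0)

# One prior draw: a continuous Gaussian precursor mapped onto the l1 ball.
# The map produces exact zeros with positive probability.
rng = np.random.default_rng(3)
precursor = rng.standard_normal(10)
theta = project_l1_ball(precursor, radius=2.0)
print("precursor:", np.round(precursor, 2))
print("theta    :", np.round(theta, 2))
print("number of exact zeros:", int(np.sum(theta == 0.0)))
```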
Non-negative Least Squares via Overparametrization
In many applications, solutions of numerical problems are required to be
non-negative, e.g., when retrieving pixel intensity values or physical
densities of a substance. In this context, non-negative least squares (NNLS) is
a ubiquitous tool, e.g., when seeking sparse solutions of high-dimensional
statistical problems. Despite vast efforts since the seminal work of Lawson and
Hanson in the '70s, the non-negativity assumption is still an obstacle for the
theoretical analysis and scalability of many off-the-shelf solvers. In the
different context of deep neural networks, it has recently been observed that
training overparametrized models via gradient descent leads to surprising
generalization properties and the retrieval of regularized solutions. In this
paper, we prove that, by using an overparametrized formulation, NNLS solutions
can reliably be approximated via vanilla gradient flow. We furthermore
establish stability of the method against negative perturbations of the
ground-truth. Our simulations confirm that this allows the use of vanilla
gradient descent as a novel and scalable numerical solver for NNLS. From a
conceptual point of view, our work proposes a novel approach to trading
side-constraints in optimization problems against complexity of the
optimization landscape, one that does not build upon the concept of Lagrange
multipliers.
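A minimal sketch of the approach this abstract describes, assuming one natural overparametrized formulation (the quadratic reparametrization x = w*w, which is non-negative by construction): vanilla gradient descent replaces the constrained solver, and its output is compared against SciPy's classical NNLS routine. Step size, initialization, and sizes are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)

# Overdetermined system with a non-negative ground truth (several exact zeros).
m, n = 80, 40
x_true = np.maximum(rng.standard_normal(n), 0.0)
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = A @ x_true + 0.01 * rng.standard_normal(m)

# Unconstrained gradient descent on f(w) = 0.5 * ||A (w*w) - b||^2.
w = 0.5 * np.ones(n)
eta, T = 0.02, 20000
for _ in range(T):
    r = A @ (w * w) - b
    w -= eta * 2.0 * w * (A.T @ r)     # chain rule through x = w*w

x_gd = w * w
x_ref, _ = nnls(A, b)                  # reference solution from scipy's NNLS solver
print("max abs difference to scipy NNLS:", np.max(np.abs(x_gd - x_ref)))
```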
Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data
A central goal in social science is to evaluate the causal effect of a
policy. One dominant approach is through panel data analysis in which the
behaviors of multiple units are observed over time. The information across time
and space motivates two general approaches: (i) horizontal regression (i.e.,
unconfoundedness), which exploits time series patterns, and (ii) vertical
regression (e.g., synthetic controls), which exploits cross-sectional patterns.
Conventional wisdom states that the two approaches are fundamentally different.
We establish this position to be partly false for estimation but generally true
for inference. In particular, we prove that both approaches yield identical
point estimates under several standard settings. For the same point estimate,
however, each approach quantifies uncertainty with respect to a distinct
estimand. In turn, the confidence interval developed for one estimand may have
incorrect coverage for another. This emphasizes that the source of randomness
that researchers assume has direct implications for the accuracy of inference.
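The estimation side of this claim can be illustrated on a toy panel. In the sketch below, both regressions are fit by minimum-norm least squares on the same control block, which is one illustrative instance of the "standard settings" the abstract refers to (the paper's exact conditions and its inference results are not reproduced). With this construction the two point estimates coincide algebraically, since the pseudoinverse of a transpose is the transpose of the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy panel: N units observed over T periods; unit N is treated in period T,
# so the counterfactual Y[N, T] must be imputed from the control block.
N, T, r = 20, 15, 3
U, V = rng.standard_normal((N, r)), rng.standard_normal((T, r))
Y = U @ V.T + 0.1 * rng.standard_normal((N, T))     # approximately low-rank panel

Y0 = Y[:-1, :-1]        # control units, pre-treatment periods
y_post = Y[:-1, -1]     # control units, treatment period
y_pre = Y[-1, :-1]      # treated unit, pre-treatment periods

# Horizontal ("unconfoundedness") regression: relate the treatment period to the
# pre-periods across control units, then apply the fit to the treated unit.
w_hz = np.linalg.pinv(Y0) @ y_post
est_hz = y_pre @ w_hz

# Vertical ("synthetic control") regression: weight control units to reproduce the
# treated unit's pre-period trajectory, then apply the weights in period T.
w_vt = np.linalg.pinv(Y0.T) @ y_pre
est_vt = y_post @ w_vt

print("horizontal estimate:", est_hz)
print("vertical estimate  :", est_vt)   # identical up to rounding in this construction
```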
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2: Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.
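For intuition about the tensor train (TT) format emphasized above, here is a textbook-style TT-SVD sketch (not code from the monograph): a d-way tensor is split into a train of 3-way cores by sequential reshaping and truncated SVDs, and contracting the cores recovers the tensor.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # TT-SVD: peel off one 3-way core per mode via a truncated SVD of the
    # suitably reshaped remainder; max_rank caps every TT rank.
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        mat = (np.diag(s[:r]) @ Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_full(cores):
    # Contract the train back into a dense tensor (only sensible for small examples).
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([core.shape[1] for core in cores])

# A small tensor that is a sum of three rank-1 terms, so TT rank 3 is exact.
rng = np.random.default_rng(6)
X = sum(np.einsum('i,j,k,l->ijkl', *[rng.standard_normal(s) for s in (4, 5, 6, 7)])
        for _ in range(3))
cores = tt_svd(X, max_rank=3)
print("core shapes:", [c.shape for c in cores])
print("relative error:", np.linalg.norm(tt_full(cores) - X) / np.linalg.norm(X))
```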
Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks
Understanding the implicit regularization imposed by neural network
architectures and gradient based optimization methods is a key challenge in
deep learning and AI. In this work we provide sharp results for the implicit
regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs)
in the over-parameterized regression setting and, potentially surprisingly,
link this to the phenomenon of phase transitions in generalized hardness of
approximation (GHA). GHA generalizes the phenomenon of hardness of
approximation from computer science to, among others, continuous and robust
optimization. It is well-known that the $\ell^1$-norm of the gradient flow of
DLNs with tiny initialization converges to the objective function of basis
pursuit. We improve upon these results by showing that the gradient flow of
DLNs with tiny initialization approximates minimizers of the basis pursuit
optimization problem (as opposed to just the objective function), and we obtain
new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness
of our results would imply that the GHA phenomenon would not occur for the
basis pursuit optimization problem -- which is a contradiction -- thus implying
sharpness. Moreover, we characterize which minimizer of the
basis pursuit problem is chosen by the gradient flow whenever the minimizer is
not unique. Interestingly, this depends on the depth of the DLN.
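A minimal sketch of the phenomenon this abstract studies, assuming a depth-2 diagonal linear network x = u*u - v*v (the paper considers general depth, and the initialization scale and step size here are illustrative): gradient descent from a tiny initialization is compared against a basis pursuit minimizer computed by linear programming.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)

# Underdetermined system y = A x_true with a sparse ground truth.
n, m, k = 60, 30, 4
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# Depth-2 diagonal linear network trained by plain gradient descent on
# 0.5 * ||A x - y||^2 from a tiny initialization alpha (gradient-flow surrogate).
alpha, eta, T = 1e-3, 0.05, 20000
u = alpha * np.ones(n)
v = alpha * np.ones(n)
for _ in range(T):
    g_x = A.T @ (A @ (u * u - v * v) - y)
    u -= eta * 2.0 * u * g_x
    v += eta * 2.0 * v * g_x
x_dln = u * u - v * v

# Basis pursuit minimizer: min ||x||_1 subject to A x = y, via x = p - q, p, q >= 0.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None), method="highs")
x_bp = res.x[:n] - res.x[n:]

print("l1 norms (DLN vs. basis pursuit):", np.linalg.norm(x_dln, 1), np.linalg.norm(x_bp, 1))
print("max abs difference:", np.max(np.abs(x_dln - x_bp)))
```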