
    Mixtures, envelopes, and hierarchical duality

    We develop a connection between mixture and envelope representations of objective functions that arise frequently in statistics. We refer to this connection using the term "hierarchical duality." Our results suggest an interesting and previously under-exploited relationship between marginalization and profiling, or equivalently between the Fenchel--Moreau theorem for convex functions and the Bernstein--Widder theorem for Laplace transforms. We give several different sets of conditions under which such a duality result obtains. We then extend existing work on envelope representations in several ways, including novel generalizations to variance-mean models and to multivariate Gaussian location models. This turns out to provide an elegant missing-data interpretation of the proximal gradient method, a widely used algorithm in machine learning. We show several statistical applications in which the proposed framework leads to easily implemented algorithms, including a robust version of the fused lasso, nonlinear quantile regression via trend filtering, and the binomial fused double Pareto model. Code for the examples is available on GitHub at https://github.com/jgscott/hierduals
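
    As a point of reference for the proximal gradient method discussed above, here is a minimal sketch for a lasso-type objective; the missing-data/envelope derivation from the paper is not reproduced, and the function and variable names are illustrative only.

```python
# Minimal proximal gradient sketch for min_x 0.5*||A x - b||^2 + lam*||x||_1.
# Illustrative only; the step size uses the Lipschitz constant of the smooth part.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, n_iter=500):
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / L, L = largest eigenvalue of A^T A
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                # gradient of the smooth quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x
```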

    Nonregular and Minimax Estimation of Individualized Thresholds in High Dimension with Binary Responses

    Given a large number of covariates $Z$, we consider the estimation of a high-dimensional parameter $\theta$ in an individualized linear threshold $\theta^T Z$ for a continuous variable $X$, which minimizes the disagreement between $\mathrm{sign}(X-\theta^T Z)$ and a binary response $Y$. While the problem can be formulated within the M-estimation framework, minimizing the corresponding empirical risk function is computationally intractable due to the discontinuity of the sign function. Moreover, estimating $\theta$ even in the fixed-dimensional setting is known to be a nonregular problem leading to nonstandard asymptotic theory. To tackle the computational and theoretical challenges in the estimation of the high-dimensional parameter $\theta$, we propose an empirical risk minimization approach based on a regularized smoothed loss function. The statistical and computational trade-off of the algorithm is investigated. Statistically, we show that the finite-sample error bound for estimating $\theta$ in the $\ell_2$ norm is $(s\log d/n)^{\beta/(2\beta+1)}$, where $d$ is the dimension of $\theta$, $s$ is the sparsity level, $n$ is the sample size and $\beta$ is the smoothness of the conditional density of $X$ given the response $Y$ and the covariates $Z$. The convergence rate is nonstandard and slower than that in classical Lasso problems. Furthermore, we prove that the resulting estimator is minimax rate optimal up to a logarithmic factor. Lepski's method is developed to achieve adaptation to the unknown sparsity $s$ and smoothness $\beta$. Computationally, an efficient path-following algorithm is proposed to compute the solution path. We show that this algorithm achieves a geometric rate of convergence for computing the whole path. Finally, we evaluate the finite-sample performance of the proposed estimator in simulation studies and a real data analysis.
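
    To make the role of the smoothed loss concrete, the following is a hypothetical sketch of a regularized smooth surrogate for the 0-1 disagreement loss; the bandwidth h, the logistic smoother, and the penalty weight lam are illustrative choices, not the paper's exact construction.

```python
# Smooth surrogate for the 0-1 loss 1{ y * (x - z^T theta) < 0 } plus an l1 penalty.
# Y is assumed to take values in {-1, +1}; all tuning constants are placeholders.
import numpy as np

def smoothed_risk(theta, Z, X, Y, h=0.1, lam=0.01):
    margins = Y * (X - Z @ theta)                   # positive margin = sign agreement
    surrogate = 1.0 / (1.0 + np.exp(margins / h))   # smooth stand-in for 1{margin < 0}
    return surrogate.mean() + lam * np.abs(theta).sum()
```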

    Robust machine learning by median-of-means: theory and practice

    We introduce new estimators for robust machine learning based on median-of-means (MOM) estimators of the mean of real-valued random variables. These estimators achieve optimal rates of convergence under minimal assumptions on the dataset. The dataset may also have been corrupted by outliers on which no assumption is granted. We also analyze these new estimators with standard tools from robust statistics. In particular, we revisit the concept of breakdown point. We modify the original definition by studying the number of outliers that a dataset can contain without deteriorating the estimation properties of a given estimator. This new notion of breakdown number, which takes into account the statistical performance of the estimators, is non-asymptotic in nature and adapted for machine learning purposes. We prove that the breakdown number of our estimator is of the order of (number of observations) * (rate of convergence). For instance, the breakdown number of our estimators for the problem of estimating a $d$-dimensional vector with noise variance $\sigma^2$ is $\sigma^2 d$, and it becomes $\sigma^2 s \log(d/s)$ when this vector has only $s$ non-zero components. Beyond this breakdown point, we prove that the rate of convergence achieved by our estimator is (number of outliers) divided by (number of observations). Besides these theoretical guarantees, the major improvement brought by these new estimators is that they are easily computable in practice. In fact, basically any algorithm used to approximate the standard Empirical Risk Minimizer (or its regularized versions) has a robust version approximating our estimators. As a proof of concept, we study many algorithms for the classical LASSO estimator. A byproduct of the MOM algorithms is a measure of depth of data that can be used to detect outliers. Comment: 48 pages, 6 figures.
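
    The basic building block, a median-of-means estimate of a real mean, is easy to state; the sketch below is illustrative, and the number of blocks K is a tuning parameter rather than a prescription from the paper.

```python
# Median-of-means: split the sample into K blocks, average within each block,
# and return the median of the block averages. Robust to a few outlying blocks.
import numpy as np

def median_of_means(x, K=10, seed=0):
    x = np.random.default_rng(seed).permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, K)
    return np.median([b.mean() for b in blocks])
```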

    A Dual-Dimer Method for Training Physics-Constrained Neural Networks with Minimax Architecture

    Data sparsity is a common issue in training machine learning tools such as neural networks for engineering and scientific applications, where experiments and simulations are expensive. Recently, physics-constrained neural networks (PCNNs) were developed to reduce the required amount of training data. However, the weights of the different losses from data and physical constraints are adjusted empirically in PCNNs. In this paper, a new physics-constrained neural network with a minimax architecture (PCNN-MM) is proposed so that the weights of the different losses can be adjusted systematically. Training the PCNN-MM amounts to searching for high-order saddle points of the objective function. A novel saddle-point search algorithm called the Dual-Dimer method is developed. It is demonstrated that the Dual-Dimer method is computationally more efficient than the gradient descent-ascent method for nonconvex-nonconcave functions and provides additional eigenvalue information to verify search results. A heat transfer example also shows that the convergence of PCNN-MMs is faster than that of traditional PCNNs. Comment: 34 pages, 5 figures; accepted by Neural Networks.
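
    The Dual-Dimer method itself is not reproduced here; for orientation, the following is a sketch of the gradient descent-ascent baseline mentioned in the abstract, with generic gradient callbacks and illustrative step sizes.

```python
# Plain gradient descent-ascent for a minimax objective f(x, y):
# x takes descent steps, y takes ascent steps. Step size and iteration count
# are placeholders; this is the baseline, not the Dual-Dimer saddle-point search.
import numpy as np

def gradient_descent_ascent(grad_x, grad_y, x0, y0, lr=1e-2, n_iter=1000):
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(n_iter):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - lr * gx     # descent on the minimizing player
        y = y + lr * gy     # ascent on the maximizing player
    return x, y
```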

    An Extragradient-type Algorithm for Variational Inequality on Hadamard Manifolds

    The aim of this paper is to present an extragradient method for a variational inequality associated with a point-to-set vector field on Hadamard manifolds and to study its convergence properties. To present our method, the concept of $\epsilon$-enlargement of maximal monotone vector fields is used, and its lower semicontinuity is established in order to obtain the convergence of the method in this new context.
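
    For readers unfamiliar with the extragradient idea, here is the classical Euclidean (Korpelevich-type) iteration for a single-valued monotone operator F; the paper's setting replaces the straight-line updates with exponential maps on a Hadamard manifold and works with $\epsilon$-enlargements of point-to-set fields, neither of which is shown.

```python
# Euclidean extragradient sketch for the variational inequality <F(x*), x - x*> >= 0:
# a predictor step at x, then a corrector step using F evaluated at the predictor.
# Unconstrained for simplicity; a projection onto the feasible set is omitted.
import numpy as np

def extragradient(F, x0, step=0.1, n_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        y = x - step * F(x)     # predictor (extrapolation) step
        x = x - step * F(y)     # corrector step at the predicted point
    return x
```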

    A survey of sparse representation: algorithms and applications

    Sparse representation has attracted much attention from researchers in the fields of signal processing, image processing, computer vision and pattern recognition. Sparse representation also has a good reputation in both theoretical research and practical applications. Many different algorithms have been proposed for sparse representation. The main purpose of this article is to provide a comprehensive study and an updated review of sparse representation and to supply guidance for researchers. The taxonomy of sparse representation methods can be studied from various viewpoints. For example, in terms of the different norm minimizations used in sparsity constraints, the methods can be roughly categorized into five groups: sparse representation with $l_0$-norm minimization, sparse representation with $l_p$-norm ($0 < p < 1$) minimization, sparse representation with $l_1$-norm minimization, sparse representation with $l_{2,1}$-norm minimization, and sparse representation with $l_2$-norm minimization. In this paper, a comprehensive overview of sparse representation is provided. The available sparse representation algorithms can also be empirically categorized into four groups: greedy strategy approximation, constrained optimization, proximity algorithm-based optimization, and homotopy algorithm-based sparse representation. The rationales of the different algorithms in each category are analyzed and a wide range of sparse representation applications are summarized, which could sufficiently reveal the potential nature of sparse representation theory. Specifically, an experimentally comparative study of these sparse representation algorithms is presented. The Matlab code used in this paper is available at: http://www.yongxu.org/lunwen.html. Comment: Published in IEEE Access, Vol. 3, pp. 490-530, 2015.
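
    As one concrete representative of the "greedy strategy approximation" group in this taxonomy, the following is an illustrative orthogonal matching pursuit sketch (not code from the survey or from the linked Matlab package).

```python
# Orthogonal matching pursuit: greedily pick the dictionary atom most correlated
# with the current residual, then refit the coefficients by least squares on the
# selected support. D has unit-norm columns (atoms); k is the target sparsity.
import numpy as np

def omp(D, y, k):
    support, x = [], np.zeros(D.shape[1])
    residual = y.astype(float).copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))     # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```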

    Efficient regularization with wavelet sparsity constraints in PAT

    In this paper we consider the reconstruction problem of photoacoustic tomography (PAT) with a flat observation surface. We develop a direct reconstruction method that employs regularization with wavelet sparsity constraints. To that end, we derive a wavelet-vaguelette decomposition (WVD) for the PAT forward operator and a corresponding explicit reconstruction formula in the case of exact data. In the case of noisy data, we combine the WVD reconstruction formula with soft-thresholding, which yields a spatially adaptive estimation method. We demonstrate that our method is statistically optimal for white random noise if the unknown function is assumed to lie in any Besov ball. We present generalizations of this approach and, in particular, discuss the combination of vaguelette soft-thresholding with a TV prior. We also provide an efficient implementation of the vaguelette transform that leads to fast image reconstruction algorithms supported by numerical results. Comment: 25 pages, 6 figures.
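
    The generic "transform, shrink, invert" pattern behind wavelet soft-thresholding can be sketched in 1-D with PyWavelets as below; the paper's estimator uses a wavelet-vaguelette decomposition of the PAT forward operator rather than a plain wavelet transform, and the wavelet and threshold here are arbitrary illustrative choices.

```python
# Toy 1-D wavelet soft-thresholding denoiser: decompose, shrink the detail
# coefficients, reconstruct. Illustrates the shrinkage step only, not the WVD.
import pywt

def wavelet_denoise(signal, wavelet="db4", thresh=0.1):
    coeffs = pywt.wavedec(signal, wavelet)                  # analysis
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft")    # soft-threshold detail coefficients
                  for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)                    # synthesis
```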

    Optimal rates for zero-order convex optimization: the power of two function evaluations

    We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients. Focusing on non-asymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt{d}$ in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and non-smooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple ($m \ge 2$) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors. Comment: 34 pages.
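
    The two-evaluation gradient estimate at the heart of such algorithms can be sketched as follows; the Gaussian direction and smoothing parameter h are standard illustrative choices rather than the paper's exact construction.

```python
# Two-point (two function evaluations) gradient estimate: perturb x along a random
# direction u and rescale the difference quotient by u. This estimates the gradient
# of a smoothed version of f and can be plugged into a stochastic gradient loop.
import numpy as np

def two_point_gradient(f, x, h=1e-4, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    return (f(x + h * u) - f(x)) / h * u
```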

    Learning from Comparisons and Choices

    When tracking user-specific online activities, each user's preference is revealed in the form of choices and comparisons. For example, a user's purchase history is a record of her choices, i.e., which item was chosen from a subset of offerings. A user's preferences can be observed either explicitly, as in movie ratings, or implicitly, as in viewing times of news articles. Given such individualized ordinal data in the form of comparisons and choices, we address the problem of collaboratively learning representations of the users and the items. The learned features can be used to predict a user's preference for an unseen item, for use in recommendation systems. This also allows one to compute similarities among users and items, to be used for categorization and search. Motivated by the empirical successes of the MultiNomial Logit (MNL) model in marketing and transportation, and also more recent successes in word embedding and crowdsourced image embedding, we pose this problem as learning the MNL model parameters that best explain the data. We propose a convex relaxation for learning the MNL model, and show that it is minimax optimal up to a logarithmic factor by comparing its performance to a fundamental lower bound. This characterizes the minimax sample complexity of the problem, and proves that the proposed estimator cannot be improved upon other than by a logarithmic factor. Further, the analysis identifies how the accuracy depends on the topology of sampling via the spectrum of the sampling graph. This provides a guideline for designing surveys when one can choose which items are to be compared. This is accompanied by numerical simulations on synthetic and real data sets, confirming our theoretical predictions. Comment: 77 pages, 12 figures; added new experiments and references. arXiv admin note: substantial text overlap with arXiv:1506.0794.
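
    For concreteness, the MNL choice probability underlying the model is a softmax over the offered subset, as in the sketch below; how the item scores are parameterized and regularized (the paper's convex relaxation) is not shown, and the function names are illustrative.

```python
# MNL choice model: P(choose item i from offered set S) is proportional to
# exp(score_i) over the offered items. Scores are generic item utilities here.
import numpy as np

def mnl_choice_prob(scores, offered, chosen):
    offered = list(offered)
    s = np.asarray(scores, dtype=float)[offered]
    p = np.exp(s - s.max())            # numerically stabilized softmax
    p /= p.sum()
    return p[offered.index(chosen)]
```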

    Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method

    Min-max saddle point games appear in a wide range of applications in machine learning and signal processing. Despite their wide applicability, theoretical studies are mostly limited to the special convex-concave structure. While some recent works generalized these results to special smooth non-convex cases, our understanding of non-smooth scenarios is still limited. In this work, we study a special form of non-smooth min-max games in which the objective function is (strongly) convex with respect to one of the players' decision variables. We show that a simple multi-step proximal gradient descent-ascent algorithm converges to an $\epsilon$-first-order Nash equilibrium of the min-max game with a number of gradient evaluations that is polynomial in $1/\epsilon$. We also show that our notion of stationarity is stronger than existing ones in the literature. Finally, we evaluate the performance of the proposed algorithm through an adversarial attack on a LASSO estimator.
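
    A schematic of a multi-step proximal gradient descent-ascent loop reads as follows; the inner-iteration count, step sizes, and proximal operator are placeholders, and this is a sketch of the general pattern rather than the paper's exact algorithm.

```python
# Multi-step (proximal) gradient descent-ascent: several ascent steps on the inner
# player per proximal descent step on the outer player. All constants are placeholders.
import numpy as np

def multistep_prox_gda(grad_x, grad_y, prox_x, x0, y0,
                       lr_x=1e-2, lr_y=1e-1, inner=10, n_iter=200):
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(n_iter):
        for _ in range(inner):                         # approximately solve the inner max
            y = y + lr_y * grad_y(x, y)
        x = prox_x(x - lr_x * grad_x(x, y), lr_x)      # proximal descent step, e.g. soft-thresholding
    return x, y
```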