Optimal designs for statistical analysis with Zernike polynomials
Abstract not available. Keywords: Optimal design, Zernike polynomials, image analysis, D-optimality, E-optimality
Quantum learning: optimal classification of qubit states
Pattern recognition is a central topic in Learning Theory with numerous
applications such as voice and text recognition, image analysis, and computer
diagnosis. The statistical set-up in classification is the following: we are
given an i.i.d. training set $(X_1, Y_1), \dots, (X_n, Y_n)$ where $X_i$
represents a feature and $Y_i$ is a label attached to that feature. The
underlying joint distribution of $(X, Y)$ is unknown, but we can learn about
it from the training set, and we aim at devising low-error classifiers used to
predict the label of new incoming features.
Here we solve a quantum analogue of this problem, namely the classification
of two arbitrary unknown qubit states. Given a number of `training' copies from
each of the states, we would like to `learn' about them by performing a
measurement on the training set. The outcome is then used to design measurements
for the classification of future systems with unknown labels. We find the
asymptotically optimal classification strategy and show that typically, it
performs strictly better than a plug-in strategy based on state estimation.
The figure of merit is the excess risk, which is the difference between the
probability of error and the probability of error of the optimal measurement
when the states are known, that is, the Helstrom measurement. We show that the
excess risk has rate $n^{-1}$ in the number of training copies $n$ and compute
the exact constant of the rate.
Comment: 24 pages, 4 figures
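Written as a formula (the symbols $n$, $\hat m_n$ and $P_{\mathrm{err}}$ below are our own notation, added only to make the definition above concrete):

```latex
% Excess risk of a classification strategy \hat{m}_n designed from n training copies,
% measured against the Helstrom measurement, which is optimal when the states are known.
\[
  \mathcal{E}_n \;=\; \mathbb{E}\!\left[P_{\mathrm{err}}(\hat m_n)\right]
               \;-\; P_{\mathrm{err}}^{\mathrm{Helstrom}} .
\]
```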
Semi-supervised Learning based on Distributionally Robust Optimization
We propose a novel method for semi-supervised learning (SSL) based on
data-driven distributionally robust optimization (DRO) using optimal transport
metrics. Our proposed method reduces the generalization error by using the
unlabeled data to restrict the support of the worst-case distribution in our
DRO formulation. We enable the implementation of our DRO formulation by
proposing a stochastic gradient descent algorithm that makes the training
procedure easy to implement. We demonstrate that our semi-supervised DRO
method improves generalization over natural supervised procedures and
state-of-the-art SSL estimators. Finally, we include a
discussion on the large sample behavior of the optimal uncertainty region in
the DRO formulation. Our discussion exposes important aspects such as the role
of dimension reduction in SSL.
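As a rough illustration of the idea, not the paper's actual formulation, the sketch below takes a Lagrangian-relaxed robust SGD step for logistic regression in which the adversary may relocate each labeled feature onto an unlabeled feature, paying a squared-distance transport cost; the loss, the penalty `gamma`, and the other parameter names are assumptions of ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss_grad(w, x, y):
    """Logistic loss and gradient for one example with label y in {0, 1}."""
    p = sigmoid(x @ w)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, (p - y) * x

def ssl_dro_sgd(X_lab, y_lab, X_unl, gamma=1.0, lr=0.1, n_steps=2000, seed=0):
    """Robust SGD where the worst case is restricted to the observed unlabeled points."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X_lab.shape[1])
    for _ in range(n_steps):
        i = rng.integers(len(y_lab))
        x, y = X_lab[i], y_lab[i]
        base_loss, _ = logistic_loss_grad(w, x, y)
        # Adversary: move x to the unlabeled point maximizing loss minus transport cost.
        cand_losses = np.array([logistic_loss_grad(w, xu, y)[0] for xu in X_unl])
        costs = gamma * np.sum((X_unl - x) ** 2, axis=1)
        j = int(np.argmax(cand_losses - costs))
        x_eff = X_unl[j] if cand_losses[j] - costs[j] > base_loss else x
        _, g = logistic_loss_grad(w, x_eff, y)
        w -= lr * g
    return w
```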
The importance of better models in stochastic optimization
Standard stochastic optimization methods are brittle, sensitive to stepsize
choices and other algorithmic parameters, and they exhibit instability outside
of well-behaved families of objectives. To address these challenges, we
investigate models for stochastic minimization and learning problems that
exhibit better robustness to problem families and algorithmic parameters. With
appropriately accurate models, which we call the aProx family, stochastic
methods can be made stable, provably convergent, and asymptotically optimal;
even modeling that the objective is nonnegative is sufficient for this
stability. We extend these results beyond convexity to weakly convex
objectives, which include compositions of convex losses with smooth functions
common in modern machine learning applications. We highlight the importance of
robustness and accurate modeling with a careful experimental evaluation of
convergence time and algorithm sensitivity.
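A minimal sketch of a model-based step in this spirit, assuming only that the per-sample loss is nonnegative (here, least squares): the sampled loss is replaced by its linearization truncated at zero, whose proximal step has a closed form, a gradient step that never overshoots the point where the model reaches zero. The problem setup and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def truncated_step_sgd(A, b, stepsizes, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 with truncated-model steps."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for alpha in stepsizes:
        i = rng.integers(len(b))
        r = A[i] @ x - b[i]
        loss = 0.5 * r ** 2                 # nonnegative sampled loss f_i(x)
        grad = r * A[i]                     # its gradient at x
        gnorm2 = grad @ grad
        if gnorm2 > 0:
            # Prox step for the model max{f_i(x) + <grad, y - x>, 0}:
            # a gradient step of length at most the zero-crossing of the linearization.
            x = x - min(alpha, loss / gnorm2) * grad
    return x

# Usage: large initial stepsizes remain stable under the truncated model.
A = np.random.default_rng(1).normal(size=(200, 5))
b = A @ np.ones(5)
x_hat = truncated_step_sgd(A, b, stepsizes=[10.0 / (k + 1) for k in range(5000)])
```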
SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression
This paper deals with the problem of finding the globally optimal subset of h
elements from a larger set of n elements in d space dimensions so as to
minimize a quadratic criterion, with a special emphasis on applications to
computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The
computation of the LTSE is a challenging subset selection problem involving a
nonlinear program with continuous and binary variables, linked in a highly
nonlinear fashion. The selection of a globally optimal subset using the branch
and bound (BB) algorithm is limited to problems in very low dimension,
typically d < 5, as the complexity of the problem increases exponentially with d.
We introduce a bold pruning strategy in the BB algorithm that results in a
significant reduction in computing time, at the price of a negligible loss of
accuracy. The novelty of our algorithm is that the bounds at nodes of the BB tree
come from pseudo-convexifications derived using a linearization technique with
approximate bounds for the nonlinear terms. The approximate bounds are computed
solving an auxiliary semidefinite optimization problem. We show through a
computational study that our algorithm performs well in a wide set of the most
difficult instances of the LTSE problem.
Comment: 12 pages, 3 figures, 2 tables
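For orientation, the criterion being optimized can be written as an exhaustive search over size-h subsets; this brute-force sketch is exact but exponential, so it only runs on tiny (n, d) instances and is emphatically not the SOCP-bounded branch-and-bound algorithm described above.

```python
import numpy as np
from itertools import combinations

def lts_brute_force(X, y, h):
    """Least Trimmed Squares: OLS on the size-h subset with minimal residual sum of squares."""
    n = len(y)
    best = (np.inf, None, None)
    for subset in combinations(range(n), h):
        idx = list(subset)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        rss = float(np.sum((y[idx] - X[idx] @ beta) ** 2))
        if rss < best[0]:
            best = (rss, beta, idx)
    return best  # (criterion value, coefficients, selected subset)
```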
Adaptive Reduced Rank Regression
We study the low rank regression problem $y = Mx + \epsilon$, where $x$ and $y$
are $d_1$ and $d_2$ dimensional vectors respectively. We consider the extreme
high-dimensional setting where the number of observations $n$ is less than
$d_1 + d_2$. Existing algorithms are designed for settings where $n$ is
typically as large as $\mathrm{rank}(M)\,(d_1 + d_2)$. This work provides an
efficient algorithm which only involves two SVDs, and establishes statistical
guarantees on its
performance. The algorithm decouples the problem by first estimating the
precision matrix of the features, and then solving the matrix denoising
problem. To complement the upper bound, we introduce new techniques for
establishing lower bounds on the performance of any algorithm for this problem.
Our preliminary experiments confirm that our algorithm often out-performs
existing baselines, and is always at least competitive.
Comment: 40 pages
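A hedged numpy sketch of the two-SVD recipe outlined above: a first SVD of the (regularized) feature covariance yields a whitening transform, and a second SVD denoises the whitened cross-covariance by discarding small singular values. The ridge term `eps` and the hard threshold `tau` are our own simplifications of the paper's adaptive choices.

```python
import numpy as np

def reduced_rank_fit(X, Y, tau, eps=1e-3):
    """X: (n, d1) features, Y: (n, d2) responses; returns an estimate of M in y = Mx + noise."""
    n = X.shape[0]
    # First SVD: spectral decomposition of the regularized feature covariance,
    # giving an inverse square root (a plug-in for the precision matrix).
    cov_x = X.T @ X / n + eps * np.eye(X.shape[1])
    U, s, _ = np.linalg.svd(cov_x)
    W = U @ np.diag(1.0 / np.sqrt(s)) @ U.T       # cov_x^{-1/2}
    Z = X @ W                                      # whitened features
    # Second SVD: matrix denoising of the whitened cross-covariance.
    C = Y.T @ Z / n
    U2, s2, V2t = np.linalg.svd(C, full_matrices=False)
    s2 = np.where(s2 > tau, s2, 0.0)               # keep only strong directions
    return (U2 * s2) @ V2t @ W                     # map back to original coordinates
```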
Exponential Screening and optimal rates of sparse estimation
In high-dimensional linear regression, the goal pursued here is to estimate
an unknown regression function using linear combinations of a suitable set of
covariates. One of the key assumptions for the success of any statistical
procedure in this setup is to assume that the linear combination is sparse in
some sense, for example, that it involves only few covariates. We consider a
general, not necessarily linear, regression with Gaussian noise and study a
related question, which is to find a linear combination of approximating
functions, which is at the same time sparse and has small mean squared error
(MSE). We introduce a new estimation procedure, called Exponential Screening
that shows remarkable adaptation properties. It adapts to the linear
combination that optimally balances MSE and sparsity, whether the latter is
measured in terms of the number of non-zero entries in the combination
($\ell_0$ norm) or in terms of the global weight of the combination ($\ell_1$
norm). The power of this adaptation result is illustrated by showing that
Exponential Screening solves optimally and simultaneously all the problems of
aggregation in Gaussian regression that have been discussed in the literature.
Moreover, we show that the performance of the Exponential Screening estimator
cannot be improved in a minimax sense, even if the optimal sparsity is known in
advance. The theoretical and numerical superiority of Exponential Screening
compared to state-of-the-art sparse procedures is also discussed.
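To give a feel for the aggregation step, here is a toy exponential-weights aggregate over least-squares fits on small covariate supports; the temperature 4*sigma2, the size-penalizing prior, and the brute-force enumeration are simplifications of ours and not the estimator or implementation from the paper.

```python
import numpy as np
from itertools import combinations

def exp_screening_toy(X, y, sigma2, max_size=3):
    """Exponentially weighted aggregate of least-squares fits over sparse supports."""
    n, p = X.shape
    subsets = [s for k in range(max_size + 1) for s in combinations(range(p), k)]
    betas, log_w = [], []
    for s in subsets:
        beta = np.zeros(p)
        if s:
            cols = list(s)
            beta[cols], *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        # Weight = prior favoring sparse supports  *  exp(-RSS / (4 * sigma2)).
        log_w.append(len(s) * np.log(1.0 / (2.0 * p)) - rss / (4.0 * sigma2))
        betas.append(beta)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())                # normalize in log space for stability
    w /= w.sum()
    return np.sum(w[:, None] * np.array(betas), axis=0)
```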