Sparse Regression Learning by Aggregation and Langevin Monte-Carlo
We consider the problem of regression learning for deterministic design and
independent random errors. We start by proving a sharp PAC-Bayesian type bound
for the exponentially weighted aggregate (EWA) under the expected squared
empirical loss. For a broad class of noise distributions the presented bound is
valid whenever the temperature parameter $\beta$ of the EWA is larger than or
equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable
feature of this result is that it is valid even for unbounded regression
functions, and the choice of the temperature parameter depends exclusively on
the noise level. Next, we apply this general bound to the problem of
aggregating the elements of a finite-dimensional linear space spanned by a
dictionary of functions $\phi_1, \dots, \phi_M$. We allow $M$ to be much larger
than the sample size $n$, but we assume that the true regression function can be
well approximated by a sparse linear combination of the functions $\phi_j$. Under
this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show
that it satisfies a sparsity oracle inequality with leading constant one.
Finally, we propose several Langevin Monte-Carlo algorithms to approximately
compute such an EWA when the number $M$ of aggregated functions can be large.
We discuss in some detail the convergence of these algorithms and present
numerical experiments that confirm our theoretical findings. Comment: Short version published in COLT 2009.
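To make the computational step concrete, here is a minimal unadjusted-Langevin sketch targeting an EWA-style pseudo-posterior with a Student-type heavy-tailed prior; the density, the prior scale `tau`, the step size, and the iteration counts are illustrative assumptions, not the paper's exact construction or tuning.

```python
import numpy as np

def ewa_langevin(X, y, beta, tau=1.0, step=1e-4, n_iter=5000, burn=1000, seed=0):
    """Unadjusted Langevin sketch for an EWA-style pseudo-posterior
    pi(theta) ~ exp(-||y - X theta||^2 / beta) * prod_j (tau^2 + theta_j^2)^(-2)."""
    rng = np.random.default_rng(seed)
    n, M = X.shape
    theta = np.zeros(M)
    draws = []
    for t in range(n_iter):
        # gradient of the log pseudo-posterior: data term + heavy-tailed prior term
        grad = (2.0 / beta) * X.T @ (y - X @ theta) - 4.0 * theta / (tau**2 + theta**2)
        theta = theta + step * grad + np.sqrt(2.0 * step) * rng.standard_normal(M)
        if t >= burn:
            draws.append(theta.copy())
    return np.mean(draws, axis=0)  # approximate EWA (posterior-mean) weights
```

For instance, `theta_hat = ewa_langevin(X, y, beta=4 * sigma2)` would use a temperature at the threshold discussed above, with `sigma2` a noise-variance estimate.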
A reduced-rank approach to predicting multiple binary responses through machine learning
This paper investigates the problem of simultaneously predicting multiple
binary responses by utilizing a shared set of covariates. Our approach
incorporates machine learning techniques for binary classification, without
making assumptions about the underlying observations. Instead, our focus lies
on a group of predictors, aiming to identify the one that minimizes prediction
error. Unlike previous studies that primarily address estimation error, we
directly analyze the prediction error of our method using PAC-Bayesian bounds
techniques. In this paper, we introduce a pseudo-Bayesian approach capable of
handling incomplete response data. Our strategy is efficiently implemented
using the Langevin Monte Carlo method. Through simulation studies and a
practical application using real data, we demonstrate the effectiveness of our
proposed method, producing results comparable to, and sometimes better than, the
current state of the art.
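As a rough illustration of the strategy, here is a Langevin Monte Carlo sketch for a multi-response logistic pseudo-posterior in which missing responses are masked out of the likelihood; the Gaussian prior, the absence of an explicit reduced-rank factorization, and all tuning constants are simplifying assumptions rather than the paper's method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lmc_multilabel(X, Y, mask, lam=1.0, step=1e-4, n_iter=3000, burn=500, seed=0):
    """Langevin sketch for a pseudo-posterior over a coefficient matrix B.
    Y holds binary responses in {0, 1}; mask is 1 where Y is observed, 0 where missing."""
    rng = np.random.default_rng(seed)
    p, q = X.shape[1], Y.shape[1]
    B = np.zeros((p, q))
    draws = []
    for t in range(n_iter):
        resid = mask * (Y - sigmoid(X @ B))  # missing entries contribute nothing
        grad = X.T @ resid - lam * B         # logistic score plus Gaussian prior gradient
        B = B + step * grad + np.sqrt(2.0 * step) * rng.standard_normal((p, q))
        if t >= burn:
            draws.append(B.copy())
    return np.mean(draws, axis=0)
```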
Fast rates in learning with dependent observations
In this paper we tackle the problem of fast rates in time series forecasting
from a statistical learning perspective. In a series of papers (e.g. Meir 2000,
Modha and Masry 1998, Alquier and Wintenberger 2012) it is shown that the main
tools used in learning theory with iid observations can be extended to the
prediction of time series. The main message of these papers is that, given a
family of predictors, we are able to build a new predictor that predicts the
series as well as the best predictor in the family, up to a remainder of order
$1/\sqrt{n}$. It is known that this rate cannot be improved in general. In this
paper, we show that, in the particular case of the least-squares loss, and under
a strong assumption on the time series ($\phi$-mixing), the remainder is actually
of order $1/n$. Thus, the optimal rate for iid variables, see e.g. Tsybakov
2003, and individual sequences, see \cite{lugosi}, is, for the first time,
achieved for uniformly mixing processes. We also show that our method is
optimal for aggregating sparse linear combinations of predictors.
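Schematically, the two regimes described above can be written as follows, for a finite family of $M$ predictors (a generic form; the exact complexity term and constants vary across the cited papers):

```latex
% slow regime: generic loss, weak assumptions on the series
R(\hat f) \le \inf_{f \in \mathcal{F}} R(f) + C \sqrt{\frac{\log M}{n}}
% fast regime: least-squares loss, phi-mixing observations
R(\hat f) \le \inf_{f \in \mathcal{F}} R(f) + C' \, \frac{\log M}{n}
```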
Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels
Monte Carlo algorithms often aim to draw from a distribution $\pi$ by
simulating a Markov chain with transition kernel $P$ such that $\pi$ is
invariant under $P$. However, there are many situations for which it is
impractical or impossible to draw from the transition kernel $P$. For instance,
this is the case with massive datasets, where it is prohibitively expensive to
calculate the likelihood, and is also the case for intractable likelihood models
arising from, for example, Gibbs random fields, such as those found in spatial
statistics and network analysis. A natural approach in these cases is to
replace $P$ by an approximation $\hat{P}$. Using theory from the stability of
Markov chains, we explore a variety of situations where it is possible to
quantify how 'close' the chain given by the transition kernel $\hat{P}$ is to
the chain given by $P$. We apply these results to several examples from spatial
statistics and network analysis. Comment: This version: results extended to non-uniformly ergodic Markov chains.
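A toy sketch of the idea: a Metropolis-Hastings chain in which the exact log-likelihood is replaced by a rescaled subsample estimate, so the simulated chain evolves under an approximate kernel $\hat{P}$ rather than $P$; the model, subsample size, and proposal scale are illustrative assumptions.

```python
import numpy as np

def noisy_mh(y, n_iter=5000, subsample=100, prop_sd=0.1, seed=0):
    """Metropolis-Hastings with a subsampled log-likelihood estimate.
    Toy model: y_i ~ N(theta, 1) with a flat prior on theta."""
    rng = np.random.default_rng(seed)
    n = len(y)

    def loglik_hat(theta):
        idx = rng.choice(n, size=subsample, replace=False)
        # rescaled subsample estimate of the full-data log-likelihood
        return (n / subsample) * np.sum(-0.5 * (y[idx] - theta) ** 2)

    theta, chain = 0.0, np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + prop_sd * rng.standard_normal()
        # both terms are re-estimated, so the acceptance ratio is itself noisy:
        # this is exactly what makes the transition kernel approximate
        if np.log(rng.uniform()) < loglik_hat(prop) - loglik_hat(theta):
            theta = prop
        chain[t] = theta
    return chain
```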
PAC-Bayesian bounds for sparse regression estimation with exponential weights
We consider the sparse regression model where the number of parameters $p$ is
larger than the sample size $n$. The difficulty when considering
high-dimensional problems is to propose estimators achieving a good compromise
between statistical and computational performances. The BIC estimator for
instance performs well from the statistical point of view \cite{BTW07} but can
only be computed for values of $p$ of at most a few tens. The Lasso estimator
is the solution of a convex minimization problem, hence computable for large
values of $p$. However, stringent conditions on the design are required to establish
fast rates of convergence for this estimator. Dalalyan and Tsybakov
\cite{arnak} propose a method achieving a good compromise between the
statistical and computational aspects of the problem. Their estimator can be
computed for reasonably large $p$ and satisfies nice statistical properties
under weak assumptions on the design. However, \cite{arnak} proposes sparsity
oracle inequalities in expectation for the empirical excess risk only. In this
paper, we propose an aggregation procedure similar to that of \cite{arnak} but
with improved statistical performances. Our main theoretical result is a
sparsity oracle inequality in probability for the true excess risk for a
version of the exponential weights estimator. We also propose an MCMC method to
compute our estimator for reasonably large values of $p$. Comment: 19 pages.
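For contrast with the Langevin sketch given earlier in this list, here is a gradient-free random-walk Metropolis sketch targeting the same kind of exponential-weights pseudo-posterior; the heavy-tailed prior and all tuning constants are again illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def ewa_rwmh(X, y, beta, tau=1.0, prop_sd=0.05, n_iter=10000, burn=2000, seed=0):
    """Random-walk Metropolis for log pi(theta) = -||y - X theta||^2 / beta
    - 2 * sum_j log(tau^2 + theta_j^2), up to an additive constant."""
    rng = np.random.default_rng(seed)
    M = X.shape[1]

    def log_post(theta):
        return (-np.sum((y - X @ theta) ** 2) / beta
                - 2.0 * np.sum(np.log(tau**2 + theta**2)))

    theta, lp = np.zeros(M), log_post(np.zeros(M))
    draws = []
    for t in range(n_iter):
        prop = theta + prop_sd * rng.standard_normal(M)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # symmetric proposal: ratio of targets
            theta, lp = prop, lp_prop
        if t >= burn:
            draws.append(theta.copy())
    return np.mean(draws, axis=0)
```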
PAC-Bayesian High Dimensional Bipartite Ranking
This paper is devoted to the bipartite ranking problem, a classical
statistical learning task, in a high dimensional setting. We propose a scoring
and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear
additive scoring functions, and we derive non-asymptotic risk bounds under a
sparsity assumption. In particular, oracle inequalities in probability holding
under a margin condition assess the performance of our procedure, and prove its
minimax optimality. An MCMC-flavored algorithm is proposed to implement our
method, and its behavior is illustrated on synthetic and real-life datasets.
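The quantity such risk bounds control is the pairwise misranking rate. Here is a minimal helper computing the empirical bipartite ranking risk (one minus the empirical AUC) of a given scoring function; it illustrates the target criterion, not the paper's estimator.

```python
import numpy as np

def empirical_ranking_risk(scores, labels):
    """Fraction of (positive, negative) pairs that the scores mis-order,
    counting ties as half an error; labels are in {0, 1}."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    bad = (pos[:, None] < neg[None, :]).mean()          # negative outranks positive
    ties = 0.5 * (pos[:, None] == neg[None, :]).mean()  # ties count as half
    return bad + ties
```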
Optimal learning with Q-aggregation
We consider a general supervised learning problem with strongly convex and
Lipschitz loss and study the problem of model selection aggregation. In
particular, given a finite dictionary of functions (learners) together with a
prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann.
Statist. 40 (2012) 1878-1905] for Gaussian regression with squared loss and
fixed design to this learning setup. Specifically, we prove that the
$Q$-aggregation procedure outputs an estimator that satisfies optimal oracle
inequalities both in expectation and with high probability. Our proof
techniques somewhat depart from traditional proofs: most of the standard
arguments are carried out on the Laplace transform of the empirical process to
be controlled. Comment: Published in the Annals of Statistics
(http://dx.doi.org/10.1214/13-AOS1190; http://www.imstat.org/aos/) by the
Institute of Mathematical Statistics (http://www.imstat.org).
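As an illustration of how such a procedure can be computed, here is a mirror-descent sketch minimizing a generic Q-criterion over the simplex with squared loss; the exact penalty, the uniform prior, and the tuning constants are assumptions, not the paper's algorithm.

```python
import numpy as np

def q_aggregation(F, y, nu=0.5, temp=1.0, lr=0.1, n_iter=500):
    """Minimize, over weights w in the simplex,
        Q(w) = nu * sum_j w_j R_j + (1 - nu) * R(F @ w) + (temp / n) * KL(w, uniform),
    where R is the empirical squared risk, via exponentiated-gradient steps."""
    n, M = F.shape
    R_j = np.mean((F - y[:, None]) ** 2, axis=0)  # risk of each dictionary element
    w = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        resid = F @ w - y
        grad = (nu * R_j
                + (1.0 - nu) * 2.0 * (F.T @ resid) / n
                + (temp / n) * (np.log(M * w) + 1.0))
        w *= np.exp(-lr * grad)   # multiplicative (mirror) update keeps w positive
        w /= w.sum()              # renormalize onto the simplex
    return w
```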