198 research outputs found

    Optimal Rates of Statistical Seriation

    Given a matrix, the seriation problem consists in permuting its rows in such a way that all its columns have the same shape, for example, that they are all monotone increasing. We propose a statistical approach to this problem in which the matrix of interest is observed with noise, and we study the corresponding minimax rate of estimation. Specifically, when the columns are either unimodal or monotone, we show that the least squares estimator is optimal up to logarithmic factors and adapts to matrices with a certain natural structure. Finally, we propose a computationally efficient estimator in the monotone case and study its performance both theoretically and experimentally. Our work lies at the intersection of shape-constrained estimation and recent work involving permutation learning, such as graph denoising and ranking. Comment: V2 corrects an error in Lemma A.1; v3 corrects Appendix F on unimodal regression, where the bounds now hold with polynomial rather than exponential probability.
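    As a concrete illustration of the setting (not the estimator analyzed in the paper), the sketch below generates a row-permuted matrix with monotone columns observed under Gaussian noise, re-orders the rows by a crude row-sum heuristic, and projects each column onto the monotone cone via isotonic regression. All sizes and the heuristic itself are illustrative assumptions.

```python
# Minimal sketch of noisy monotone seriation (illustration only, not the
# paper's estimator): observe a row-permuted matrix with monotone-increasing
# columns plus Gaussian noise, re-order rows by their sums, then apply
# columnwise isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, m, sigma = 200, 20, 1.0

base = np.sort(rng.normal(size=(n, m)), axis=0)   # columns monotone increasing
perm = rng.permutation(n)                         # unknown row permutation
Y = base[perm] + sigma * rng.normal(size=(n, m))  # noisy observation

order = np.argsort(Y.sum(axis=1))                 # crude permutation estimate
iso = IsotonicRegression(increasing=True)
x = np.arange(n)
A_hat = np.column_stack([iso.fit_transform(x, Y[order, j]) for j in range(m)])

# Per-entry error against the true matrix, taken in the estimated row order.
mse = np.mean((A_hat - base[perm][order]) ** 2)
print(f"per-entry MSE: {mse:.3f} (noise variance {sigma**2:.1f})")
```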

    General nonexact oracle inequalities for classes with a subexponential envelope

    We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the absence of any boundedness assumption, is that these inequalities can yield fast rates even in situations where exact oracle inequalities hold only with slower rates. We apply these results to show that procedures based on $\ell_1$ and nuclear-norm regularization satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_q$-loss function ($q \geq 2$), assuming only that the tail behavior of the input and output variables is well behaved. In particular, no RIP-type assumption or "incoherence condition" is needed to obtain fast residual terms in these setups. We also apply these results to the problems of convex aggregation and model selection. Comment: Published at http://dx.doi.org/10.1214/11-AOS965 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
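    For concreteness, the sketch below shows one instance of the kind of $\ell_1$-regularized empirical risk minimization covered by such inequalities: lasso-type least squares on a design and noise drawn from a (subexponential) Laplace distribution. The sample sizes, penalty level, and data-generating model are illustrative assumptions, not choices made in the paper.

```python
# Toy illustration of an l1-regularized empirical risk minimizer (squared
# loss, i.e. the lasso) on Laplace (subexponential) design and noise.
# All sizes and the penalty level are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s = 200, 500, 5                        # n samples, p >> n features, s-sparse truth

X = rng.laplace(size=(n, p))                 # subexponential design
beta = np.zeros(p); beta[:s] = 2.0
y = X @ beta + rng.laplace(size=n)           # subexponential noise

lam = np.sqrt(np.log(p) / n)                 # usual order of the l1 penalty level
erm = Lasso(alpha=lam).fit(X, y)             # regularized ERM

risk = np.mean((X @ erm.coef_ - X @ beta) ** 2)
print(f"empirical prediction risk: {risk:.3f}, nonzeros: {(erm.coef_ != 0).sum()}")
```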

    Sparse Regression Learning by Aggregation and Langevin Monte-Carlo

    We consider the problem of regression learning for a deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions, the bound is valid whenever the temperature parameter $\beta$ of the EWA is larger than or equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable feature of this result is that it holds even for unbounded regression functions, and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions $\phi_1, \dots, \phi_M$. We allow $M$ to be much larger than the sample size $n$, but we assume that the true regression function can be well approximated by a sparse linear combination of the functions $\phi_j$. Under this sparsity scenario, we propose an EWA with a heavy-tailed prior and show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte-Carlo algorithms to approximately compute such an EWA when the number $M$ of aggregated functions is large. We discuss the convergence of these algorithms in some detail and present numerical experiments that confirm our theoretical findings. Comment: Short version published in COLT 200
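    A minimal sketch of the kind of Langevin Monte-Carlo approximation described here: coefficient vectors are drawn from a Gibbs density proportional to $\exp(-\|y - \Phi\theta\|^2/\beta)$ times a heavy-tailed prior and then averaged. The specific prior, dictionary, step size, and run length below are assumptions for illustration, not the paper's exact algorithmic choices.

```python
# Sketch: approximate an exponentially weighted aggregate with an unadjusted
# Langevin algorithm. Iterates target a density proportional to
# exp(-||y - Phi theta||^2 / beta) * prod_j (1 + theta_j^2 / tau^2)^(-2),
# i.e. squared loss at temperature beta with a heavy-tailed prior.
import numpy as np

rng = np.random.default_rng(2)
n, M, sigma = 100, 300, 1.0
Phi = rng.normal(size=(n, M))                     # dictionary evaluated at the design points
theta_true = np.zeros(M); theta_true[:3] = 1.5
y = Phi @ theta_true + sigma * rng.normal(size=n)

beta = 4 * sigma**2                               # temperature >= 4 sigma^2, as in the bound
tau = 1.0                                         # prior scale (illustrative assumption)

def grad_log_density(theta):
    grad_fit = (2.0 / beta) * Phi.T @ (y - Phi @ theta)
    grad_prior = -4.0 * theta / (tau**2 + theta**2)
    return grad_fit + grad_prior

h, n_iter, burn = 1e-4, 20000, 5000               # step size and run length (assumptions)
theta = np.zeros(M)
avg = np.zeros(M)
for t in range(n_iter):
    theta = theta + h * grad_log_density(theta) + np.sqrt(2 * h) * rng.normal(size=M)
    if t >= burn:
        avg += theta
avg /= (n_iter - burn)                            # approximate EWA coefficient vector

print("prediction error of the averaged iterate:",
      float(np.mean((Phi @ (avg - theta_true)) ** 2)))
```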

    Model averaging: A shrinkage perspective

    Model averaging (MA), a technique for combining estimators from a set of candidate models, has attracted increasing attention in machine learning and statistics. In the existing literature there is an implicit understanding that MA can be viewed as a form of shrinkage estimation that draws the response vector towards the subspaces spanned by the candidate models. This paper explores that perspective by establishing connections between MA and shrinkage in a linear regression setting with multiple nested models. We first demonstrate that the optimal MA estimator is the best linear estimator with monotone non-increasing weights in a Gaussian sequence model. The Mallows MA, which estimates the weights by minimizing Mallows' $C_p$, is a variation of the positive-part Stein estimator. Motivated by these connections, we develop a novel MA procedure based on blockwise Stein estimation. The resulting Stein-type MA estimator is asymptotically optimal over a broad parameter space when the variance is known. Numerical results support our theoretical findings. The connections established in this paper may open up new avenues for investigating MA from different perspectives. A discussion of some topics for future research concludes the paper.
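    The sketch below illustrates Mallows model averaging over nested linear models in the spirit described above: each nested OLS fit is computed, and the averaging weights are chosen on the simplex to minimize the Mallows criterion. The data-generating process and the assumption of a known noise variance are illustrative.

```python
# Minimal sketch of Mallows model averaging over nested linear models:
# fit each nested OLS model, then choose simplex weights w minimizing
# ||y - F w||^2 + 2 sigma^2 k'w, where column j of F is the j-th nested fit
# and k_j its number of parameters.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p, sigma = 200, 12, 1.0
X = rng.normal(size=(n, p))
beta = 1.0 / (1.0 + np.arange(p)) ** 2            # slowly decaying true coefficients
y = X @ beta + sigma * rng.normal(size=n)

# Nested candidate models use the first k columns, k = 1, ..., p.
fits = np.column_stack([
    X[:, :k] @ np.linalg.lstsq(X[:, :k], y, rcond=None)[0] for k in range(1, p + 1)
])
sizes = np.arange(1, p + 1)

def mallows_cp(w):
    resid = y - fits @ w
    return resid @ resid + 2.0 * sigma**2 * (sizes @ w)

cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
w0 = np.full(p, 1.0 / p)
res = minimize(mallows_cp, w0, bounds=[(0.0, 1.0)] * p, constraints=cons)
w_hat = res.x

print("MMA weights:", np.round(w_hat, 3))
print("risk of the averaged fit:", float(np.mean((fits @ w_hat - X @ beta) ** 2)))
```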