    Convergence rates of Kernel Conjugate Gradient for random design regression

    We prove statistical rates of convergence for kernel-based least squares regression from i.i.d. data using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. Following the setting introduced in earlier related literature, we study so-called "fast convergence rates" depending on the regularity of the target regression function (measured by a source condition in terms of the kernel integral operator) and on the effective dimensionality of the data mapped into the kernel space. We obtain upper bounds, essentially matching known minimax lower bounds, for the $\mathcal{L}^2$ (prediction) norm as well as for the stronger Hilbert norm, if the true regression function belongs to the reproducing kernel Hilbert space. If the latter assumption is not fulfilled, we obtain similar convergence rates for appropriate norms, provided additional unlabeled data are available.
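
    To illustrate how early stopping acts as regularization, here is a minimal sketch of plain conjugate gradient applied to the kernel system K alpha = y, halted after a fixed number of iterations. The Gaussian kernel, its width, and the helper names (gaussian_kernel, kernel_cg, predict) are illustrative assumptions. The algorithm analyzed in the paper, which is related to Kernel Partial Least Squares, may use a different inner product or stopping rule.

```python
import math

def gaussian_kernel(x, z, width=1.0):
    """Gaussian (RBF) kernel on scalars; an illustrative choice of kernel."""
    return math.exp(-((x - z) ** 2) / (2 * width ** 2))

def kernel_cg(xs, ys, n_iter, width=1.0):
    """Run at most n_iter conjugate gradient steps on K alpha = y, starting at alpha = 0.

    The iteration count n_iter is the regularization parameter: stopping early
    keeps alpha in a low-dimensional Krylov subspace spanned by y, Ky, K^2 y, ...
    """
    n = len(xs)
    K = [[gaussian_kernel(xi, xj, width) for xj in xs] for xi in xs]

    def matvec(v):
        return [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    alpha = [0.0] * n
    r = list(ys)            # residual y - K alpha, with alpha = 0
    p = list(r)             # first search direction
    rr = dot(r, r)
    for _ in range(n_iter):
        if rr < 1e-14:      # system already solved: stop to avoid dividing by ~0
            break
        Kp = matvec(p)
        step = rr / dot(p, Kp)
        alpha = [a + step * pi for a, pi in zip(alpha, p)]
        r = [ri - step * kpi for ri, kpi in zip(r, Kp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return alpha

def predict(alpha, xs, x, width=1.0):
    """Evaluate the fitted function f(x) = sum_i alpha_i k(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x, width) for a, xi in zip(alpha, xs))

# Usage: fits become less smooth as more CG iterations are allowed.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.1, 0.9, 1.1, 0.8, 0.2]
for m in (1, 2, 5):
    coef = kernel_cg(xs, ys, n_iter=m)
    print(m, [round(predict(coef, xs, x), 3) for x in xs])
```

    A small n_iter gives a smoother, more strongly regularized fit. In exact arithmetic, running n iterations (the sample size) solves the system and interpolates the training data.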

    Two simple sufficient conditions for FDR control

    We show that the control of the false discovery rate (FDR) for a multiple testing procedure is implied by two coupled simple sufficient conditions. The first one, which we call the "self-consistency condition", concerns the algorithm itself, and the second, called the "dependency control condition", is related to the dependency assumptions on the $p$-value family. Many standard multiple testing procedures are self-consistent (e.g. step-up, step-down or step-up-down procedures), and we prove that the dependency control condition can be fulfilled when choosing correspondingly appropriate rejection functions, in three classical types of dependency: independence, positive dependency (PRDS) and unspecified dependency. As a consequence, we recover earlier results through simple and unifying proofs while extending their scope in several regards: weighted FDR, $p$-value reweighting, a new family of step-up procedures under unspecified $p$-value dependency, and adaptive step-up procedures. We give additional examples of other possible applications. This framework also allows for defining and studying FDR control for multiple testing procedures over a continuous, uncountable space of hypotheses. Comment: Published at http://dx.doi.org/10.1214/08-EJS180 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
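
    Below is a minimal sketch of a step-up procedure together with a check of self-consistency. It assumes the common form of the condition, in which a rejection set R must satisfy R ⊂ {i : p_i ≤ alpha * beta(|R|) / m} for a shape function beta. With beta(k) = k, the procedure is the Benjamini-Hochberg linear step-up procedure. The function names and the exact form of the condition are illustrative assumptions, not the paper's notation.

```python
def step_up(pvalues, alpha, beta=lambda k: k):
    """Step-up procedure with thresholds alpha * beta(k) / m.

    Finds the largest rank k whose sorted p-value is below its threshold and
    rejects the k smallest p-values. beta(k) = k gives Benjamini-Hochberg.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * beta(rank) / m:
            k_max = rank
    return set(order[:k_max])

def is_self_consistent(pvalues, rejected, alpha, beta=lambda k: k):
    """Check that every rejected p-value is <= alpha * beta(|R|) / m."""
    m = len(pvalues)
    threshold = alpha * beta(len(rejected)) / m
    return all(pvalues[i] <= threshold for i in rejected)

p = [0.01, 0.03, 0.035, 0.9]
rejected = step_up(p, alpha=0.05)
print(rejected, is_self_consistent(p, rejected, alpha=0.05))  # {0, 1, 2} True
```

    Hypothesis 1 (p = 0.03) is rejected even though it exceeds its own rank-2 threshold of 0.025. This happens because the step-up rule uses the largest passing rank (here 3, with threshold 0.0375), and the resulting set is still self-consistent.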

    Discussion of "2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii

    Discussion of "2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii [arXiv:0708.0083]. Comment: Published at http://dx.doi.org/10.1214/009053606000001037 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Hierarchical testing designs for pattern recognition

    We explore the theoretical foundations of a "twenty questions" approach to pattern recognition. The object of the analysis is the computational process itself rather than probability distributions (Bayesian inference) or decision boundaries (statistical learning). Our formulation is motivated by applications to scene interpretation in which there are a great many possible explanations for the data, one ("background") is statistically dominant, and it is imperative to restrict intensive computation to genuinely ambiguous regions. The focus here is then on pattern filtering: given a large set $\mathcal{Y}$ of possible patterns or explanations, narrow down the true one, $Y$, to a small (random) subset $\hat{Y} \subset \mathcal{Y}$ of "detected" patterns to be subjected to further, more intense processing. To this end, we consider a family of hypothesis tests for $Y \in A$ versus the nonspecific alternatives $Y \in A^c$. Each test has zero type I error, and the candidate sets $A \subset \mathcal{Y}$ are arranged in a hierarchy of nested partitions. These tests are then characterized by scope ($|A|$), power (or type II error) and algorithmic cost. We consider sequential testing strategies in which decisions are made iteratively, based on past outcomes, about which test to perform next and when to stop testing. The set $\hat{Y}$ is then taken to be the set of patterns that have not been ruled out by the tests performed. The total cost of a strategy is the sum of the "testing cost" and the "postprocessing cost" (proportional to $|\hat{Y}|$), and the corresponding optimization problem is analyzed. Comment: Published at http://dx.doi.org/10.1214/009053605000000174 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Concentration of weakly dependent Banach-valued sums and applications to statistical learning methods

    We obtain a Bernstein-type inequality for sums of Banach-valued random variables satisfying a general weak dependence assumption, under certain smoothness assumptions on the underlying Banach norm. We use this inequality to investigate, in the asymptotic regime, error upper bounds for the broad family of spectral regularization methods for reproducing kernel decision rules when trained on a sample coming from a $\tau$-mixing process. Comment: 39 pages.
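
    Spectral regularization methods are usually described by a filter function g_lambda. Along each eigendirection of the (empirical) kernel operator, the method uses g_lambda(sigma) in place of the unstable inverse 1/sigma of the eigenvalue sigma. The sketch below shows three standard filters: Tikhonov, spectral cut-off and Landweber iteration. The names, the step size and the choice of floor(1/lambda) Landweber steps are illustrative assumptions. The sketch does not model the dependent-data setting of the paper.

```python
def tikhonov(sigma, lam):
    """Tikhonov (ridge) filter: g(sigma) = 1 / (sigma + lam)."""
    return 1.0 / (sigma + lam)

def spectral_cutoff(sigma, lam):
    """Spectral cut-off: invert eigenvalues >= lam, discard the rest."""
    return 1.0 / sigma if sigma >= lam else 0.0

def landweber(sigma, lam, step=1.0):
    """Landweber iteration run for t = floor(1 / lam) steps (needs step * sigma <= 1).

    Summing the geometric series of the iteration gives
    g(sigma) = (1 - (1 - step * sigma) ** t) / sigma.
    """
    t = int(1.0 / lam)
    return (1.0 - (1.0 - step * sigma) ** t) / sigma

def filtered_coefficients(eigvals, coeffs, g, lam):
    """Apply g(sigma_j, lam) instead of 1 / sigma_j along each direction (sigma_j > 0)."""
    return [g(s, lam) * c for s, c in zip(eigvals, coeffs)]

eigvals = [1.0, 0.25, 0.01]   # hypothetical eigenvalues of a normalized kernel operator
coeffs = [2.0, 3.0, 0.5]      # hypothetical data coefficients in the eigenbasis
for g in (tikhonov, spectral_cutoff, landweber):
    print(g.__name__, filtered_coefficients(eigvals, coeffs, g, lam=0.1))
```

    Each filter keeps sigma * g_lambda(sigma) between 0 and 1, so small, noise-dominated eigenvalues are damped rather than amplified. Decreasing lambda brings all three filters closer to the exact inverse.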

    Occam's hammer: a link between randomized learning and multiple testing FDR control

    We establish a generic theoretical tool to construct probabilistic bounds for algorithms where the output is a subset of objects from an initial pool of candidates (or, more generally, a probability distribution on said pool). This general device, dubbed "Occam's hammer", acts as a meta layer when a probabilistic bound is already known on the objects of the pool taken individually, and aims at controlling the proportion of the objects in the output set not satisfying their individual bound. In this regard, it can be seen as a non-trivial generalization of the "union bound with a prior" ("Occam's razor"), a familiar tool in learning theory. We give applications of this principle to randomized classifiers (providing an interesting alternative approach to PAC-Bayes bounds) and multiple testing (where it allows one to recover exactly and extend the so-called Benjamini-Yekutieli testing procedure). Comment: 13 pages, conference communication type format.
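
    For reference, the Benjamini-Yekutieli procedure is the linear step-up procedure with its level divided by the harmonic number H_m = 1 + 1/2 + ... + 1/m. This adjustment yields FDR control under arbitrary dependence of the p-values. The minimal sketch below implements only that procedure, not the Occam's hammer argument used to derive it. The function name is an illustrative choice.

```python
def benjamini_yekutieli(pvalues, alpha):
    """Step-up procedure with thresholds alpha * k / (m * H_m).

    H_m = 1 + 1/2 + ... + 1/m is the correction that makes the linear
    step-up procedure valid under arbitrary dependence of the p-values.
    Returns the set of indices of rejected hypotheses.
    """
    m = len(pvalues)
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / (m * harmonic):
            k_max = rank   # step-up: keep the largest rank that passes
    return set(order[:k_max])

print(benjamini_yekutieli([0.001, 0.01, 0.02, 0.9], alpha=0.05))  # {0, 1}
```

    At the same level, the uncorrected linear step-up procedure would also reject the third p-value (0.02 ≤ 0.0375). This shows the price paid for validity under unspecified dependence.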