Convergence rates of Kernel Conjugate Gradient for random design regression
We prove statistical rates of convergence for kernel-based least squares
regression from i.i.d. data using a conjugate gradient algorithm, where
regularization against overfitting is obtained by early stopping. This method
is related to Kernel Partial Least Squares, a regression method that combines
supervised dimensionality reduction with least squares projection. Following
the setting introduced in earlier related literature, we study so-called "fast
convergence rates" depending on the regularity of the target regression
function (measured by a source condition in terms of the kernel integral
operator) and on the effective dimensionality of the data mapped into the
kernel space. We obtain upper bounds, essentially matching known minimax lower
bounds, for the L^2 (prediction) norm as well as for the stronger
Hilbert norm, if the true regression function belongs to the reproducing kernel
Hilbert space. If the latter assumption is not fulfilled, we obtain similar
convergence rates for appropriate norms, provided additional unlabeled data are
available.
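
As a concrete illustration of the method described above, here is a minimal
sketch (not the paper's exact algorithm): conjugate gradient is run on the
kernel system K alpha = y, and the number of iterations plays the role of the
regularization parameter. The kernel choice, data, and function names are
assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_cg(K, y, n_iters):
    """Run n_iters conjugate-gradient steps on K alpha = y, starting at 0.

    Stopping after a few iterations regularizes the kernel least-squares
    fit; the iteration count acts as the regularization parameter.
    """
    alpha = np.zeros_like(y)
    r = y.copy()              # residual y - K @ alpha
    p = r.copy()              # conjugate search direction
    for _ in range(n_iters):
        Kp = K @ p
        step = (r @ r) / (p @ Kp)
        alpha = alpha + step * p
        r_new = r - step * Kp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return alpha

# Tiny usage example; in practice the stopping iteration would be chosen by
# validation error (early stopping).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(80)
K = rbf_kernel(X, X, gamma=5.0)
alpha = kernel_cg(K, y, n_iters=10)
y_hat = K @ alpha             # fitted values at the training points
```
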
Two simple sufficient conditions for FDR control
We show that the control of the false discovery rate (FDR) for a multiple
testing procedure is implied by two coupled simple sufficient conditions. The
first one, which we call ``self-consistency condition'', concerns the algorithm
itself, and the second, called ``dependency control condition'' is related to
the dependency assumptions on the p-value family. Many standard multiple
testing procedures are self-consistent (e.g. step-up, step-down or step-up-down
procedures), and we prove that the dependency control condition can be
fulfilled when choosing correspondingly appropriate rejection functions, in
three classical types of dependency: independence, positive dependency (PRDS)
and unspecified dependency. As a consequence, we recover earlier results
through simple and unifying proofs while extending their scope in several
respects: weighted FDR, p-value reweighting, a new family of step-up procedures
under unspecified p-value dependency, and adaptive step-up procedures. We give
additional examples of other possible applications. This framework also allows
for defining and studying FDR control for multiple testing procedures over a
continuous, uncountable space of hypotheses.
Comment: Published at http://dx.doi.org/10.1214/08-EJS180 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
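
For illustration, below is a minimal sketch of the linear step-up procedure of
Benjamini and Hochberg, a standard example of a self-consistent procedure of
the kind covered by this framework; the interface and names are mine, not
taken from the paper.

```python
import numpy as np

def step_up(pvals, alpha=0.05):
    """Linear step-up (Benjamini-Hochberg): reject the k smallest p-values,
    where k is the largest index with p_(k) <= alpha * k / m."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m   # rejection function r(k)
    passing = np.nonzero(pvals[order] <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if passing.size:
        rejected[order[: passing[-1] + 1]] = True
    return rejected

# Under unspecified dependence, shrinking the thresholds by the harmonic sum
# 1 + 1/2 + ... + 1/m (Benjamini-Yekutieli) restores FDR control.
```
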
Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii
Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and
oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083].
Comment: Published at http://dx.doi.org/10.1214/009053606000001037 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Hierarchical testing designs for pattern recognition
We explore the theoretical foundations of a ``twenty questions'' approach to
pattern recognition. The object of the analysis is the computational process
itself rather than probability distributions (Bayesian inference) or decision
boundaries (statistical learning). Our formulation is motivated by applications
to scene interpretation in which there are a great many possible explanations
for the data, one (``background'') is statistically dominant, and it is
imperative to restrict intensive computation to genuinely ambiguous regions.
The focus here is then on pattern filtering: Given a large set \mathcal{Y} of
possible patterns or explanations, narrow down the true one Y to a small
(random) subset \hat Y \subset \mathcal{Y} of ``detected'' patterns to be
subjected to further, more intense, processing. To this end, we consider a
family of hypothesis tests for Y \in A versus the nonspecific alternatives
Y \in A^c. Each test has null type I error and the candidate sets
A \subset \mathcal{Y} are arranged in a hierarchy of nested partitions. These
tests are then characterized by scope (|A|), power (or type
II error) and algorithmic cost. We consider sequential testing strategies in
which decisions are made iteratively, based on past outcomes, about which test
to perform next and when to stop testing. The set \hat Y is then taken to be
the set of patterns that have not been ruled out by the tests performed. The
total cost of a strategy is the sum of the ``testing cost'' and the
``postprocessing cost'' (proportional to |\hat Y|) and the corresponding
optimization problem is analyzed.
Comment: Published at http://dx.doi.org/10.1214/009053605000000174 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
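
A minimal sketch of the coarse-to-fine idea described above, assuming a simple
tree representation of the nested partitions (the node structure and test
interface are illustrative, not the paper's formulation):

```python
def hierarchical_filter(node, test):
    """Return the detected subset \hat Y of patterns not ruled out.

    node: dict with keys "patterns" (the candidate set A) and "children"
          (a list of nodes partitioning A; empty at the leaves).
    test: function A -> bool; True means the test rejects "Y in A", ruling
          out all of A. Tests have null type I error, so the true pattern
          is never wrongly ruled out.
    """
    if test(node["patterns"]):        # A rejected: prune the whole subtree
        return set()
    if not node["children"]:          # leaf reached: keep its patterns
        return set(node["patterns"])
    detected = set()
    for child in node["children"]:    # recurse only into ambiguous nodes
        detected |= hierarchical_filter(child, test)
    return detected
```

The total cost of such a strategy is the number of tests actually performed
(testing cost) plus a term proportional to the size of the returned set
(postprocessing cost), which is the trade-off the paper analyzes.
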
Concentration of weakly dependent Banach-valued sums and applications to statistical learning methods
We obtain a Bernstein-type inequality for sums of Banach-valued random
variables satisfying a weak dependence assumption of general type and under
certain smoothness assumptions on the underlying Banach norm. We use this
inequality to investigate, in the asymptotic regime, error upper bounds for
the broad family of spectral regularization methods for reproducing kernel
decision rules, when trained on a sample coming from a mixing process.
Comment: 39 pages
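
For orientation, the classical Bernstein inequality for bounded i.i.d. real
variables, whose shape the Banach-valued, weakly dependent result above
generalizes, reads

\[
  \mathbb{P}\!\left( \Bigl|\sum_{i=1}^n \bigl(X_i - \mathbb{E}X_i\bigr)\Bigr| \ge t \right)
  \le 2 \exp\!\left( - \frac{t^2}{2 n \sigma^2 + \tfrac{2}{3} M t} \right),
  \qquad |X_i| \le M,\ \operatorname{Var}(X_i) \le \sigma^2 .
\]
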
Occam's hammer: a link between randomized learning and multiple testing FDR control
We establish a generic theoretical tool to construct probabilistic bounds for
algorithms where the output is a subset of objects from an initial pool of
candidates (or more generally, a probability distribution on said pool). This
general device, dubbed ``Occam's hammer'', acts as a meta layer when a
probabilistic bound is already known on the objects of the pool taken
individually, and aims at controlling the proportion of the objects in the set
output not satisfying their individual bound. In this regard, it can be seen as
a non-trivial generalization of the ``union bound with a prior''
(``Occam's razor''), a familiar tool in learning theory. We give applications
of this principle to randomized classifiers (providing an interesting
alternative approach to PAC-Bayes bounds) and to multiple testing (where it
allows one to retrieve exactly and extend the so-called Benjamini-Yekutieli
testing procedure).
Comment: 13 pages -- conference communication type format
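
For reference, the ``union bound with a prior'' (Occam's razor) that Occam's
hammer generalizes: with a prior \pi over the pool and E_h the event that
object h violates its individual bound, calibrated so that
\mathbb{P}(E_h) \le \delta\,\pi(h), summing gives

\[
  \mathbb{P}\Bigl( \bigcup_{h} E_h \Bigr)
  \;\le\; \sum_{h} \mathbb{P}(E_h)
  \;\le\; \delta \sum_{h} \pi(h)
  \;\le\; \delta .
\]

Occam's hammer replaces this all-or-nothing guarantee by a control of the
proportion of objects in the output set that violate their individual bound.
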