Learning from compressed observations
The problem of statistical learning is to construct a predictor of a random
variable $Y$ as a function of a related random variable $X$ on the basis of an
i.i.d. training sample from the joint distribution of $(X,Y)$. Allowable
predictors are drawn from some specified class, and the goal is to approach
asymptotically the performance (expected loss) of the best predictor in the
class. We consider the setting in which one has perfect observation of the
$X$-part of the sample, while the $Y$-part has to be communicated at some
finite bit rate. The encoding of the $Y$-values is allowed to depend on the
$X$-values. Under suitable regularity conditions on the admissible predictors,
the underlying family of probability distributions and the loss function, we
give an information-theoretic characterization of achievable predictor
performance in terms of conditional distortion-rate functions. The ideas are
illustrated on the example of nonparametric regression in Gaussian noise.
Comment: 6 pages; submitted to the 2007 IEEE Information Theory Workshop (ITW 2007).
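As background, the conditional distortion-rate function appearing in this characterization has the standard form from conditional rate-distortion theory (Gray, 1972); the paper's exact notation and assumptions may differ:

$$ D_{Y|X}(R) = \inf_{P_{\hat Y|Y,X}\,:\; I(Y;\hat Y \mid X) \le R} \mathbb{E}\big[d(Y,\hat Y)\big], $$

i.e., the least expected distortion achievable when describing $Y$ at rate $R$ bits with $X$ available as side information at both encoder and decoder.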
Complexity regularization via localized random penalties
In this article, model selection via penalized empirical loss minimization in
nonparametric classification problems is studied. Data-dependent penalties are
constructed, which are based on estimates of the complexity of a small subclass
of each model class, containing only those functions with small empirical loss.
The penalties are novel since those considered in the literature are typically
based on the entire model class. Oracle inequalities using these penalties are
established, and the advantage of the new penalties over those based on the
complexity of the whole model class is demonstrated.
Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000046
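Schematically, penalized model selection of this kind proceeds as follows (the notation here is generic, not the paper's): given model classes $\mathcal{F}_1, \mathcal{F}_2, \dots$ with empirical loss $\hat{L}_n$, one selects

$$ \hat{k} = \arg\min_{k \ge 1} \Big( \min_{f \in \mathcal{F}_k} \hat{L}_n(f) + \mathrm{pen}_n(k) \Big), \qquad \hat{f} = \arg\min_{f \in \mathcal{F}_{\hat{k}}} \hat{L}_n(f), $$

and an oracle inequality bounds the expected loss of $\hat{f}$ by the best penalized value over all classes. The novelty here is that $\mathrm{pen}_n(k)$ is computed from a localized subclass of $\mathcal{F}_k$ (functions with small empirical loss) rather than from all of $\mathcal{F}_k$.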
Estimates of the Approximation Error Using Rademacher Complexity: Learning Vector-Valued Functions
For certain families of multivariable vector-valued functions to be approximated, the accuracy of approximation schemes made up of linear combinations of computational units containing adjustable parameters is investigated. Upper bounds on the approximation error are derived that depend on the Rademacher complexities of the families. The estimates exploit possible relationships among the components of the multivariable vector-valued functions. All such components are approximated simultaneously, in such a way as to use, for a desired approximation accuracy, fewer computational units than are required by componentwise approximation. An application to multistage optimization problems is discussed.
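For reference, the empirical Rademacher complexity of a function class $\mathcal{F}$ on a sample $x_1, \dots, x_n$, the quantity driving bounds of this type, is standardly defined as (normalization conventions vary; the paper's may differ):

$$ \hat{\mathcal{R}}_n(\mathcal{F}) = \mathbb{E}_{\sigma} \Big[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \Big], $$

where $\sigma_1, \dots, \sigma_n$ are i.i.d. uniform random signs; vector-valued extensions of this quantity yield the simultaneous-approximation bounds above.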
Integer cells in convex sets
Every convex body K in R^n has a coordinate projection PK that contains at
least vol(0.1 K) cells of the integer lattice PZ^n, provided this volume is at
least one. Our proof of this counterpart of Minkowski's theorem is based on an
extension of the combinatorial density theorem of Sauer, Shelah and
Vapnik-Chervonenkis to Z^n. This leads to a new approach to sections of convex
bodies. In particular, fundamental results of asymptotic convex geometry,
such as the Volume Ratio Theorem and Milman's duality of the diameters, admit
natural versions for coordinate sections.
Comment: Historical remarks on the notion of the combinatorial dimension are added. This is the published version in Advances in Mathematics.
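For orientation, the classical Sauer-Shelah lemma over $\{0,1\}^n$ (the kind of combinatorial density theorem extended here to $Z^n$) asserts that a class $\mathcal{F} \subseteq \{0,1\}^n$ of VC dimension $d$ satisfies

$$ |\mathcal{F}| \le \sum_{i=0}^{d} \binom{n}{i}. $$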
Dimension reduction by random hyperplane tessellations
Given a subset K of the unit Euclidean sphere, we estimate the minimal number
m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense
that the fraction of the hyperplanes separating any pair x, y in K is nearly
proportional to the Euclidean distance between x and y. Random hyperplanes
prove to be almost ideal for this problem; they achieve the almost optimal
bound m = O(w(K)^2) where w(K) is the Gaussian mean width of K. Using the map
that sends x in K to the sign vector with respect to the hyperplanes, we
conclude that every bounded subset K of R^n embeds into the Hamming cube {-1,
1}^m with a small distortion in the Gromov-Hausdorff metric. Since for many
sets K one has m = m(K) << n, this yields a new discrete mechanism of dimension
reduction for sets in Euclidean spaces.
Comment: 17 pages, 3 figures, minor update.
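A minimal numerical sketch of the sign-vector map (an illustration assuming central hyperplanes with i.i.d. Gaussian normals, not a reproduction of the paper's construction): the fraction of hyperplanes separating two points on the sphere concentrates around their angular distance divided by pi.

    import numpy as np

    def sign_embed(X, m, rng):
        """Embed rows of X (unit vectors in R^n) into the Hamming cube
        {-1,+1}^m via the sign pattern of m random central hyperplanes
        with i.i.d. standard Gaussian normals."""
        G = rng.standard_normal((m, X.shape[1]))  # hyperplane normals
        return np.sign(X @ G.T)                   # shape (num_points, m)

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal((2, 100))
    x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
    S = sign_embed(np.stack([x, y]), m=50_000, rng=rng)
    frac_sep = np.mean(S[0] != S[1])        # normalized Hamming distance
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    print(frac_sep, theta / np.pi)          # nearly equal for large m

The proportionality exploited here is the classical fact that, for a standard Gaussian vector g, P(sign(g.x) != sign(g.y)) equals the angle between x and y divided by pi, which underlies the uniform-tessellation estimate.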