
    Learning from compressed observations

    The problem of statistical learning is to construct a predictor of a random variable Y as a function of a related random variable X on the basis of an i.i.d. training sample from the joint distribution of (X,Y). Allowable predictors are drawn from some specified class, and the goal is to approach asymptotically the performance (expected loss) of the best predictor in the class. We consider the setting in which one has perfect observation of the X-part of the sample, while the Y-part has to be communicated at some finite bit rate. The encoding of the Y-values is allowed to depend on the X-values. Under suitable regularity conditions on the admissible predictors, the underlying family of probability distributions, and the loss function, we give an information-theoretic characterization of achievable predictor performance in terms of conditional distortion-rate functions. The ideas are illustrated on the example of nonparametric regression in Gaussian noise. Comment: 6 pages; submitted to the 2007 IEEE Information Theory Workshop (ITW 2007).
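    As a rough formalization of the setup sketched above (the notation here is illustrative and not taken from the paper): a predictor f from the admissible class F is judged by its expected loss, and the finite-rate constraint on the Y-part enters through a conditional distortion-rate function.

    ```latex
    % Illustrative notation (not the paper's). Expected loss of a predictor f in F,
    % and the distortion-rate function of Y conditioned on X at rate R bits:
    \[
      L(f) \;=\; \mathbb{E}\bigl[\ell\bigl(Y, f(X)\bigr)\bigr], \qquad
      D_{Y\mid X}(R) \;=\;
      \inf_{P_{\hat Y\mid X,Y}\,:\; I(Y;\hat Y\mid X)\,\le\, R}
      \mathbb{E}\bigl[d(Y,\hat Y)\bigr].
    \]
    % The paper's characterization relates the best achievable expected loss under
    % the rate constraint to quantities of this conditional distortion-rate type.
    ```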

    Complexity regularization via localized random penalties

    In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, which are based on estimates of the complexity of a small subclass of each model class, containing only those functions with small empirical loss. The penalties are novel since those considered in the literature are typically based on the entire model class. Oracle inequalities using these penalties are established, and the advantage of the new penalties over those based on the complexity of the whole model class is demonstrated. Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000046
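    To make the selection recipe concrete, here is a minimal sketch of penalized model selection with a data-dependent, localized penalty. The penalty below is a placeholder (a crude Monte Carlo Rademacher-type estimate over the low-empirical-loss subclass); it illustrates the idea of localization but is not the paper's exact construction, and all names and defaults are assumptions.

    ```python
    import numpy as np

    def empirical_loss(f, X, y):
        """Average 0-1 loss of classifier f on the sample (X, y)."""
        return float(np.mean(f(X) != y))

    def localized_penalty(candidates, X, y, margin=0.05, n_draws=50, seed=0):
        """Placeholder data-dependent penalty: a Monte Carlo Rademacher-type
        estimate computed only over the subclass of candidates whose empirical
        loss is within `margin` of the best in the class (the 'localized' idea)."""
        rng = np.random.default_rng(seed)
        losses = np.array([empirical_loss(f, X, y) for f in candidates])
        local = [f for f, l in zip(candidates, losses) if l <= losses.min() + margin]
        loss_vectors = np.array([(f(X) != y).astype(float) for f in local])  # (|local|, n)
        n = len(y)
        draws = []
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=n)        # random signs
            draws.append(np.max(loss_vectors @ sigma) / n)
        return float(np.mean(draws))

    def select_model(model_classes, X, y):
        """Within each class take the empirical loss minimizer, then pick the class
        minimizing (empirical loss of its best member) + (localized penalty)."""
        best_f, best_score = None, np.inf
        for candidates in model_classes:
            losses = [empirical_loss(f, X, y) for f in candidates]
            j = int(np.argmin(losses))
            score = losses[j] + localized_penalty(candidates, X, y)
            if score < best_score:
                best_f, best_score = candidates[j], score
        return best_f, best_score
    ```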

    Estimates of the Approximation Error Using Rademacher Complexity: Learning Vector-Valued Functions

    For certain families of multivariable vector-valued functions to be approximated, the accuracy of approximation schemes made up of linear combinations of computational units containing adjustable parameters is investigated. Upper bounds on the approximation error are derived that depend on the Rademacher complexities of the families. The estimates exploit possible relationships among the components of the multivariable vector-valued functions. All such components are approximated simultaneously in such a way as to use, for a desired approximation accuracy, fewer computational units than those required by componentwise approximation. An application to multistage optimization problems is discussed.
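    Since the bounds are stated in terms of Rademacher complexities, a small reference sketch may help fix ideas: a Monte Carlo estimate of the empirical Rademacher complexity of a finite family of real-valued functions on a sample (scalar-output case only; function and parameter names are illustrative, not from the paper).

    ```python
    import numpy as np

    def empirical_rademacher(functions, X, n_draws=200, seed=0):
        """Monte Carlo estimate of the empirical Rademacher complexity
            R_hat(F; X) = E_sigma [ sup_{f in F} (1/n) * sum_i sigma_i * f(x_i) ]
        for a finite family `functions` of real-valued functions evaluated on the
        sample X, with sigma_i independent uniform random signs."""
        rng = np.random.default_rng(seed)
        values = np.array([[f(x) for x in X] for f in functions])   # shape (|F|, n)
        n = values.shape[1]
        draws = []
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=n)
            draws.append(np.max(values @ sigma) / n)
        return float(np.mean(draws))

    # Example: a tiny family of scaled coordinate projections on points in R^2.
    X = np.random.default_rng(1).standard_normal((100, 2))
    family = [lambda x, c=c, k=k: c * x[k] for c in (-1.0, 1.0) for k in (0, 1)]
    print(empirical_rademacher(family, X))
    ```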

    Integer cells in convex sets

    Every convex body K in R^n has a coordinate projection PK that contains at least vol(0.1 K) cells of the integer lattice PZ^n, provided this volume is at least one. Our proof of this counterpart of Minkowski's theorem is based on an extension of the combinatorial density theorem of Sauer, Shelah and Vapnik-Chervonenkis to Z^n. This leads to a new approach to sections of convex bodies. In particular, fundamental results of asymptotic convex geometry such as the Volume Ratio Theorem and Milman's duality of the diameters admit natural versions for coordinate sections. Comment: Historical remarks on the notion of the combinatorial dimension are added. This is the published version in Advances in Mathematics.
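    Restated compactly (a paraphrase in notation of my own choosing, not a quotation from the paper), the main claim reads:

    ```latex
    % Paraphrase of the statement above (notation mine): for a convex body K in R^n,
    % some coordinate projection P_I onto R^I captures many integer cells, namely
    \[
      \exists\, I \subseteq \{1,\dots,n\}:\qquad
      \#\bigl\{\text{unit cells of } \mathbb{Z}^{I} \text{ contained in } P_{I}K\bigr\}
      \;\ge\; \operatorname{vol}(0.1\,K),
      \qquad\text{provided } \operatorname{vol}(0.1\,K) \ge 1.
    \]
    ```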

    Dimension reduction by random hyperplane tessellations

    Given a subset K of the unit Euclidean sphere, we estimate the minimal number m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense that the fraction of the hyperplanes separating any pair x, y in K is nearly proportional to the Euclidean distance between x and y. Random hyperplanes prove to be almost ideal for this problem; they achieve the almost optimal bound m = O(w(K)^2), where w(K) is the Gaussian mean width of K. Using the map that sends x in K to the sign vector with respect to the hyperplanes, we conclude that every bounded subset K of R^n embeds into the Hamming cube {-1, 1}^m with a small distortion in the Gromov-Hausdorff metric. Since for many sets K one has m = m(K) << n, this yields a new discrete mechanism of dimension reduction for sets in Euclidean spaces. Comment: 17 pages, 3 figures, minor update
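    The embedding mechanism described here is easy to sketch in code: draw m random hyperplanes through the origin and map each point to its sign pattern, so that the normalized Hamming distance between sign patterns tracks the distance between points. The snippet below is a minimal illustration of that mechanism (parameter names and the choice m = 5000 are illustrative, not from the paper).

    ```python
    import numpy as np

    def sign_embedding(points, m, seed=0):
        """Map each row of `points` (a point in R^n) to its sign pattern in {-1,+1}^m
        with respect to m random hyperplanes through the origin, whose normals are
        i.i.d. standard Gaussian vectors."""
        rng = np.random.default_rng(seed)
        normals = rng.standard_normal((m, points.shape[1]))
        return np.sign(points @ normals.T)

    def normalized_hamming(u, v):
        """Fraction of hyperplanes separating the two points, i.e. the fraction of
        coordinates where the sign patterns disagree."""
        return float(np.mean(u != v))

    # Example: for unit vectors x, y, the expected fraction of separating random
    # hyperplanes equals angle(x, y) / pi, so the Hamming distance of the codes
    # is nearly proportional to the (geodesic) distance between x and y.
    rng = np.random.default_rng(1)
    x = rng.standard_normal(50); x /= np.linalg.norm(x)
    y = x + 0.1 * rng.standard_normal(50); y /= np.linalg.norm(y)
    codes = sign_embedding(np.stack([x, y]), m=5000)
    print(normalized_hamming(codes[0], codes[1]), np.arccos(x @ y) / np.pi)
    ```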