3,641 research outputs found

    Dimensionality-dependent generalization bounds for k-dimensional coding schemes

    Full text link
    © 2016 Massachusetts Institute of Technology. k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order $O\big((mk \ln(mkn)/n)^{\lambda_n}\big)$, where m is the dimension of the features, k is the number of columns in the linear implementation of the coding scheme, n is the sample size, $\lambda_n > 0.5$ when n is finite, and $\lambda_n = 0.5$ when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.
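    To make the setting concrete, the short numpy sketch below (an illustration, not code from the letter) computes the empirical reconstruction error of a k-dimensional coding scheme in its simplest special case, vector quantization, where each m-dimensional sample is represented by its nearest of k codewords; the quantities n, m, and k correspond to the symbols in the bound above.

        import numpy as np

        def vq_reconstruction_error(X, C):
            """Empirical reconstruction error of vector quantization.

            X : (n, m) array of n samples with m-dimensional features.
            C : (k, m) codebook whose rows are the k representative vectors.
            Returns the mean squared distance from each sample to its
            nearest codeword, i.e. the empirical risk the bound controls.
            """
            # Squared distances between every sample and every codeword: shape (n, k).
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            # Each sample is reconstructed by its nearest codeword.
            return d2.min(axis=1).mean()

        rng = np.random.default_rng(0)
        n, m, k = 500, 10, 4                # sample size, feature dimension, codebook size
        X = rng.normal(size=(n, m))
        C = rng.normal(size=(k, m))         # an arbitrary (untrained) codebook
        print(vq_reconstruction_error(X, C))

    Dictionary learning and sparse coding replace the nearest-codeword rule with a (constrained) linear combination of the k columns, but the empirical risk being bounded has the same form.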

    The complexity of algorithmic hypothesis class

    Full text link
    University of Technology Sydney, Faculty of Engineering and Information Technology. Statistical learning theory provides the mathematical and theoretical foundations for statistical learning algorithms and inspires the development of more efficient methods. It has been observed that learning algorithms may never output some hypotheses in the predefined hypothesis class. Therefore, in this thesis, we focus on statistical learning theory and study how to measure the complexity of the algorithmic hypothesis class, the subset of the predefined hypothesis class that a learning algorithm will (or is likely to) output. By designing complexity measures for the algorithmic hypothesis class, we provide new generalization bounds for k-dimensional coding schemes and multi-task learning and propose two frameworks for deriving tighter generalization bounds than the current state of the art. We take k-dimensional coding schemes, a set of unsupervised learning algorithms, and multi-task learning, a set of supervised learning algorithms, as examples to demonstrate that learning algorithm outputs may have special properties and are therefore contained in a subset of the predefined hypothesis class. By analyzing these subsets (the algorithmic hypothesis classes), we shed new light on learning problems and derive tighter generalization bounds than the current state of the art. Specifically, for k-dimensional coding schemes, we show that the induced algorithmic loss function classes are sets of Lipschitz-continuous hypotheses and that a dimensionality-dependent complexity measure helps to derive small Lipschitz constants and thus improve the generalization bounds. For multi-task learning, we prove that tasks can act as regularizers and that feature structures can contribute to a small algorithmic hypothesis class and thereby also help to improve the generalization bounds. To exploit the complexity of the algorithmic hypothesis class more precisely, taking into account the properties of the hypotheses and the feature structure, we extend algorithmic robustness and stability to complexity measures for the hypothesis class. Inspired by the idea of algorithmic robustness, we propose the complexity measure of uniform robustness. Compared to the Rademacher complexity, our measure accounts more finely for the geometric information of the data: for example, when the sample space is covered by a small number of widely separated balls of small radius, the uniform robustness can be very small while the Rademacher complexity can be very large. Moreover, based on the definition of uniform robustness, we also provide a framework for deriving generalization bounds for a very general class of learning algorithms. We exploit the algorithmic hypothesis class of stable algorithms by studying the definition of algorithmic stability. Stable learning algorithms have the property that their outputs do not change much when one training example is changed; this implies that their outputs cannot be too far apart even when the training sample is completely changed. Thus, stable learning algorithms often have small algorithmic hypothesis classes. However, because no existing measure captures the complexity of such a small algorithmic hypothesis class, we design a novel complexity measure, the algorithmic Rademacher complexity, to measure the algorithmic hypothesis class of stable learning algorithms and provide sharper error bounds than the current state of the art.
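    As a point of reference for the complexity measures discussed in this abstract, the sketch below (an illustrative example, not code from the thesis) estimates the empirical Rademacher complexity of a small finite hypothesis class by Monte Carlo: it averages, over random sign vectors, the best correlation any hypothesis in the class achieves with the signs. Restricting this supremum to the hypotheses an algorithm can actually output is precisely the role played by the algorithmic hypothesis class.

        import numpy as np

        def empirical_rademacher(H_outputs, n_draws=2000, seed=0):
            """Monte Carlo estimate of the empirical Rademacher complexity.

            H_outputs : (|H|, n) array; row j holds hypothesis j's outputs
                        on the n fixed training points (values in [-1, 1]).
            Estimates E_sigma[ sup_h (1/n) * sum_i sigma_i * h(x_i) ].
            """
            rng = np.random.default_rng(seed)
            num_h, n = H_outputs.shape
            total = 0.0
            for _ in range(n_draws):
                sigma = rng.choice([-1.0, 1.0], size=n)    # Rademacher signs
                total += (H_outputs @ sigma).max() / n     # supremum over the class
            return total / n_draws

        # A toy class of threshold classifiers on 1-D data.
        x = np.linspace(-1.0, 1.0, 50)
        thresholds = np.linspace(-1.0, 1.0, 21)
        H = np.array([np.where(x <= t, -1.0, 1.0) for t in thresholds])
        print(empirical_rademacher(H))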

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    Full text link
    While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tasks, including transfer learning. We use these tight bounds to better understand the role of model size, equivariance, and the implicit biases of optimization, for generalization in deep learning. Notably, we find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor. We also argue for data-independent bounds in explaining generalization.
    Comment: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-baye
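    To give a rough sense of the compression idea, the snippet below is a simplified sketch under assumed details (not the authors' implementation): it expresses a trained parameter vector as a point in a low-dimensional random linear subspace around an initialization, quantizes the subspace coordinates to a small set of levels, and counts the bits needed to describe the quantized model. A shorter description length of a model that still fits the data is what drives a PAC-Bayes compression bound down.

        import numpy as np

        rng = np.random.default_rng(0)

        D, d, levels = 10_000, 50, 16          # full dim, subspace dim, quantization levels
        theta0 = rng.normal(size=D)            # initialization (treated as known, not transmitted)
        theta  = theta0 + 0.1 * rng.normal(size=D)   # stand-in for trained parameters

        # Random linear subspace: theta is approximated by theta0 + P @ w, with P fixed.
        P = rng.normal(size=(D, d)) / np.sqrt(D)
        w = np.linalg.lstsq(P, theta - theta0, rcond=None)[0]   # subspace coordinates

        # Quantize the coordinates to a uniform grid of `levels` values.
        lo, hi = w.min(), w.max()
        grid = np.linspace(lo, hi, levels)
        w_q = grid[np.abs(w[:, None] - grid[None, :]).argmin(axis=1)]

        theta_q = theta0 + P @ w_q
        rel_err = np.linalg.norm(theta - theta_q) / np.linalg.norm(theta - theta0)
        bits = d * np.log2(levels) + 2 * 32    # quantized coordinates + two grid endpoints (float32)
        print(f"relative reconstruction error {rel_err:.3f}, description length ~{bits:.0f} bits")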