
    Optimal Phase Transitions in Compressed Sensing

    Compressed sensing deals with efficient recovery of analog signals from linear encodings. This paper presents a statistical study of compressed sensing by modeling the input signal as an i.i.d. process with known distribution. Three classes of encoders are considered, namely optimal nonlinear, optimal linear, and random linear encoders. Focusing on optimal decoders, we investigate the fundamental tradeoff between measurement rate and reconstruction fidelity, gauged by error probability and noise sensitivity in the absence and presence of measurement noise, respectively. The optimal phase transition threshold is determined as a functional of the input distribution and compared to suboptimal thresholds achieved by popular reconstruction algorithms. In particular, we show that Gaussian sensing matrices incur no penalty on the phase transition threshold with respect to optimal nonlinear encoding. Our results also provide a rigorous justification of previous results based on replica heuristics in the weak-noise regime. Comment: to appear in IEEE Transactions on Information Theory.
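    For a concrete feel for the random-linear-encoder setting, here is a minimal numerical sketch: an i.i.d. sparse-Gaussian input, a Gaussian sensing matrix at measurement rate delta = m/n, and recovery by ISTA, a simple l1-based algorithm of the "popular reconstruction" kind the abstract contrasts with optimal decoding. The signal model, sparsity level, regularization weight, and step size are illustrative assumptions, not the paper's.

```python
# Illustrative sketch (not the paper's optimal decoder): noiseless compressed
# sensing with a Gaussian sensing matrix and ISTA, an l1-based reconstruction
# algorithm of the kind whose phase-transition thresholds the paper compares.
import numpy as np

rng = np.random.default_rng(0)

n, delta, eps = 1000, 0.5, 0.1          # signal length, measurement rate m/n, sparsity
m = int(delta * n)

# i.i.d. sparse-Gaussian input: zero with probability 1 - eps, else standard normal.
x = rng.standard_normal(n) * (rng.random(n) < eps)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random linear encoder
y = A @ x                                      # noiseless measurements

# ISTA: iterative soft thresholding for min ||y - A z||^2 / 2 + lam * ||z||_1.
lam = 0.01                                     # illustrative regularization weight
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1 / Lipschitz constant of the gradient
z = np.zeros(n)
for _ in range(2000):
    grad = A.T @ (A @ z - y)
    z = z - step * grad
    z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("relative reconstruction error:", np.linalg.norm(z - x) / np.linalg.norm(x))
```

    Sweeping the measurement rate for a fixed sparsity and recording where the reconstruction error collapses traces a (suboptimal) phase-transition curve of the kind the paper compares with the optimal threshold.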

    Density of Spherically-Embedded Stiefel and Grassmann Codes

    The density of a code is the fraction of the coding space covered by packing balls centered around the codewords. This paper investigates the density of codes in the complex Stiefel and Grassmann manifolds equipped with the chordal distance. The choice of distance enables the treatment of the manifolds as subspaces of Euclidean hyperspheres. In this geometry, the densest packings are not necessarily equivalent to maximum-minimum-distance codes. Computing a code's density follows from computing: i) the normalized volume of a metric ball and ii) the kissing radius, the radius of the largest balls one can pack around the codewords without overlapping. First, the normalized volume of a metric ball is evaluated by asymptotic approximations. The volume of a small ball can be well-approximated by the volume of a locally-equivalent tangential ball. In order to properly normalize this approximation, the precise volumes of the manifolds induced by their spherical embedding are computed. For larger balls, a hyperspherical cap approximation is used, which is justified by a volume comparison theorem showing that the normalized volume of a ball in the Stiefel or Grassmann manifold is asymptotically equal to the normalized volume of a ball in its embedding sphere as the dimension grows to infinity. Then, bounds on the kissing radius are derived alongside corresponding bounds on the density. Unlike for spherical codes or codes in flat spaces, the kissing radius of a Grassmann or Stiefel code cannot be exactly determined from its minimum distance. It is nonetheless possible to derive bounds on density as functions of the minimum distance. Stiefel and Grassmann codes have larger density than their image spherical codes as the dimensions tend to infinity. Finally, the bounds on density lead to refinements of the standard Hamming bounds for Stiefel and Grassmann codes. Comment: Two-column version (24 pages, 6 figures, 4 tables). To appear in IEEE Transactions on Information Theory.
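    As a reading aid, a small sketch of the chordal distance the paper works with, evaluated on a randomly drawn Grassmann code; the minimum distance computed here is the quantity the paper's kissing-radius and density bounds are expressed in terms of. The dimensions and codebook size are arbitrary illustrative choices.

```python
# Illustrative sketch: the chordal distance on the complex Grassmann manifold
# and the minimum distance of a random code (not the paper's constructions).
import numpy as np

rng = np.random.default_rng(1)

def random_subspace(n, p):
    """Orthonormal basis of a randomly drawn p-dimensional subspace of C^n."""
    a = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
    q, _ = np.linalg.qr(a)
    return q

def chordal_distance(u, v):
    """d_c(U, V) = sqrt(p - ||U^H V||_F^2), valid for orthonormal bases U, V."""
    p = u.shape[1]
    return np.sqrt(max(p - np.linalg.norm(u.conj().T @ v, "fro") ** 2, 0.0))

n, p, codebook_size = 8, 2, 64          # illustrative dimensions

code = [random_subspace(n, p) for _ in range(codebook_size)]
dmin = min(chordal_distance(code[i], code[j])
           for i in range(codebook_size) for j in range(i + 1, codebook_size))
print("minimum chordal distance of the random code:", dmin)
```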

    MDL, Penalized Likelihood, and Statistical Risk

    We determine, for both countable and uncountable collections of functions, information-theoretic conditions on a penalty pen(f) such that the optimizer f̂ of the penalized log-likelihood criterion log 1/likelihood(f) + pen(f) has risk not more than the index of resolvability corresponding to the accuracy of the optimizer of the expected value of the criterion. If F is the linear span of a dictionary of functions, traditional description-length penalties are based on the number of non-zero terms (the ℓ0 norm of the coefficients). We specialize our general conclusions to show that the ℓ1 norm of the coefficients times a suitable multiplier λ is also an information-theoretically valid penalty.
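    For orientation, a schematic restatement of the two objects the abstract names, in the notation commonly used for penalized likelihood and the index of resolvability; the exact scaling, constants, and the admissibility condition on pen(f) are in the paper and are not reproduced here.

```latex
% Reading aid only: notation and normalisation may differ from the paper.
% The penalized-likelihood estimator and the index of resolvability:
\[
  \hat f \;=\; \arg\min_{f \in \mathcal F}
     \left\{ \log \frac{1}{\mathrm{likelihood}(f)} + \mathrm{pen}(f) \right\},
  \qquad
  \mathcal{R}_n(f^\star) \;=\; \min_{f \in \mathcal F}
     \left\{ D(f^\star \,\|\, f) + \frac{\mathrm{pen}(f)}{n} \right\}.
\]
% The l1 penalty discussed in the abstract, for f = sum_j beta_j h_j
% in the linear span of a dictionary {h_j}:
\[
  \mathrm{pen}_\lambda(f) \;=\; \lambda \,\lVert \beta \rVert_1 .
\]
```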

    Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds

    Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations which jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. In order to encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational bound that can tightly lower bound the data log-likelihood. We develop more flexible aggregation schemes that generalise PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational bounds and various aggregation schemes. We show that tighter variational bounds and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.
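    A minimal sketch of what a permutation-invariant aggregation of modality encodings can look like, in the Deep Sets style of pooling per-modality features before predicting posterior parameters. It is only meant to illustrate the idea of generalising PoE/MoE-style aggregation; the architecture, dimensions, and pooling choice are assumptions, not the models used in the paper.

```python
# Illustrative sketch (not the paper's exact architecture): a permutation-
# invariant aggregation of per-modality encodings into Gaussian posterior
# parameters, following the Deep Sets pattern rho(sum_i phi(h_i)).
import torch
import torch.nn as nn

class PermutationInvariantAggregator(nn.Module):
    def __init__(self, feature_dim=32, latent_dim=8):
        super().__init__()
        # phi embeds each modality's feature; rho maps the pooled embedding
        # to the mean and log-variance of the joint latent posterior.
        self.phi = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 64))
        self.rho = nn.Sequential(nn.ReLU(), nn.Linear(64, 2 * latent_dim))

    def forward(self, features):
        # features: list of per-modality encodings, each of shape (batch, feature_dim);
        # any subset of modalities can be passed, in any order.
        pooled = torch.stack([self.phi(h) for h in features], dim=0).sum(dim=0)
        mu, logvar = self.rho(pooled).chunk(2, dim=-1)
        return mu, logvar

# Toy usage: two modalities encoded to 32-d features for a batch of 4.
agg = PermutationInvariantAggregator()
h_image, h_text = torch.randn(4, 32), torch.randn(4, 32)
mu, logvar = agg([h_image, h_text])
print(mu.shape, logvar.shape)   # torch.Size([4, 8]) torch.Size([4, 8])
```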

    Small Transformers Compute Universal Metric Embeddings

    We study representations of data from an arbitrary metric space X in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by small neural networks called probabilistic transformers. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about n log(n) and width about n^2 can bi-Hölder embed any n-point dataset from X with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If X's geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications, we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs. When instead embedding into multivariate Gaussian mixtures, we show that probabilistic transformers can compute bi-Hölder embeddings with arbitrarily small distortion. Comment: 42 pages, 10 figures, 3 tables.
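    To make the target space concrete, a small sketch of a transport metric between univariate Gaussian mixtures in the spirit of Delon and Desolneux (2020): for uniform-weight mixtures with equally many components, the discrete transport problem reduces to an assignment with the closed-form W2 cost between Gaussian components. General weights would require a full optimal-transport solver; the mixtures below are toy values.

```python
# Illustrative sketch of a mixture transport metric between univariate
# Gaussian mixtures with uniform weights and equal component counts.
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_gaussian_sq(m1, s1, m2, s2):
    """Squared W2 distance between univariate Gaussians N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def mixture_w2(means1, stds1, means2, stds2):
    """Transport distance for two uniform-weight mixtures via an assignment problem."""
    k = len(means1)
    cost = np.array([[w2_gaussian_sq(means1[i], stds1[i], means2[j], stds2[j])
                      for j in range(k)] for i in range(k)])
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())   # mean = (1/k) * optimal total cost

# Toy usage: two 3-component mixtures.
print(mixture_w2([0.0, 1.0, 4.0], [0.5, 0.2, 1.0],
                 [0.1, 1.5, 3.0], [0.4, 0.3, 0.8]))
```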

    Information Theory and Machine Learning

    The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, have transferable learning results, use computational resources efficiently, converge quickly in online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.

    Coresets-Methods and History: A Theoretician's Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
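    As one concrete instance of the random-sampling techniques such surveys cover, here is a short sketch of a sensitivity-style importance-sampling coreset for the 1-means (mean) cost; the sampling probabilities, coreset size, and data are illustrative assumptions, and the weighted coreset cost is an unbiased estimate of the full cost for any fixed query center.

```python
# Illustrative sketch: a sensitivity-style importance-sampling coreset for the
# 1-means cost. Probabilities p_i ~ 1/(2n) + d(x_i, mean)^2 / (2 * sum_j d(x_j, mean)^2)
# with weights w_i = 1/(m * p_i) make the weighted cost an unbiased estimator.
import numpy as np

rng = np.random.default_rng(2)

def onemeans_coreset(x, m):
    n = len(x)
    d2 = np.sum((x - x.mean(axis=0)) ** 2, axis=1)
    p = 0.5 / n + 0.5 * d2 / d2.sum()          # sensitivity-style probabilities
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])               # importance weights
    return x[idx], weights

def cost(points, weights, center):
    return np.sum(weights * np.sum((points - center) ** 2, axis=1))

x = rng.standard_normal((10_000, 5))
core, w = onemeans_coreset(x, m=200)
c = rng.standard_normal(5)                      # an arbitrary query center
print("full cost:   ", cost(x, np.ones(len(x)), c))
print("coreset cost:", cost(core, w, c))
```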