Optimal Phase Transitions in Compressed Sensing
Compressed sensing deals with efficient recovery of analog signals from
linear encodings. This paper presents a statistical study of compressed sensing
by modeling the input signal as an i.i.d. process with known distribution.
Three classes of encoders are considered, namely optimal nonlinear, optimal
linear and random linear encoders. Focusing on optimal decoders, we investigate
the fundamental tradeoff between measurement rate and reconstruction fidelity
gauged by error probability and noise sensitivity in the absence and presence
of measurement noise, respectively. The optimal phase transition threshold is
determined as a functional of the input distribution and compared to suboptimal
thresholds achieved by popular reconstruction algorithms. In particular, we
show that Gaussian sensing matrices incur no penalty on the phase transition
threshold with respect to optimal nonlinear encoding. Our results also provide
a rigorous justification of previous results based on replica heuristics in the
weak-noise regime. Comment: to appear in IEEE Transactions on Information Theory.
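As a concrete illustration of the random-linear-encoding setting described above, the following sketch (Python/NumPy, all parameters illustrative) draws a Gaussian sensing matrix, measures a sparse signal, and reconstructs it with ISTA, a standard LASSO-style iterative solver; this stands in for the popular reconstruction algorithms the abstract compares against, not the optimal decoder it analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 80, 10          # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
y = A @ x                                       # noiseless linear encoding

# ISTA: iterative soft thresholding for min 0.5*||y - Az||^2 + lam*||z||_1
lam = 0.01
L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
z = np.zeros(n)
for _ in range(2000):
    grad = A.T @ (A @ z - y)
    z = z - grad / L
    z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold

print("relative error:", np.linalg.norm(z - x) / np.linalg.norm(x))
```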
Density of Spherically-Embedded Stiefel and Grassmann Codes
The density of a code is the fraction of the coding space covered by packing
balls centered around the codewords. This paper investigates the density of
codes in the complex Stiefel and Grassmann manifolds equipped with the chordal
distance. The choice of distance enables the treatment of the manifolds as
subspaces of Euclidean hyperspheres. In this geometry, the densest packings are
not necessarily equivalent to maximum-minimum-distance codes. Computing a
code's density follows from computing: i) the normalized volume of a metric
ball and ii) the kissing radius, the radius of the largest balls one can pack
around the codewords without overlapping. First, the normalized volume of a
metric ball is evaluated by asymptotic approximations. The volume of a small
ball can be well-approximated by the volume of a locally-equivalent tangential
ball. In order to properly normalize this approximation, the precise volumes of
the manifolds induced by their spherical embedding are computed. For larger
balls, a hyperspherical cap approximation is used, which is justified by a
volume comparison theorem showing that the normalized volume of a ball in the
Stiefel or Grassmann manifold is asymptotically equal to the normalized volume
of a ball in its embedding sphere as the dimension grows to infinity. Then,
bounds on the kissing radius are derived alongside corresponding bounds on the
density. Unlike spherical codes or codes in flat spaces, the kissing radius of a Grassmann or Stiefel code cannot be exactly determined from its minimum distance. It is nonetheless possible to derive bounds on density as functions of the minimum distance. Stiefel and Grassmann codes have larger density than their image spherical codes as the dimension tends to infinity. Finally, the
bounds on density lead to refinements of the standard Hamming bounds for
Stiefel and Grassmann codes. Comment: Two-column version (24 pages, 6 figures, 4 tables). To appear in IEEE Transactions on Information Theory.
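In schematic notation (assumed here, not quoted from the paper), the quantities the abstract works with are the kissing radius and the density of a code $\mathcal{C}$ in a manifold $\mathcal{M}$ with normalized volume measure $\mu$:

```latex
% Schematic definitions, with illustrative notation: \mathcal{C} a code in
% a manifold \mathcal{M} equipped with the chordal distance and
% normalized measure \mu, and B(c, r) the metric ball of radius r at c.
\[
  \varrho(\mathcal{C}) \;=\; \sup \{\, r : \text{the balls } B(c, r),\ c \in \mathcal{C},
  \text{ are pairwise disjoint} \,\},
\]
\[
  \Delta(\mathcal{C}) \;=\; \sum_{c \in \mathcal{C}} \mu\big(B(c, \varrho(\mathcal{C}))\big)
  \;=\; |\mathcal{C}|\, \mu\big(B(\varrho(\mathcal{C}))\big),
\]
% so that a Hamming-type bound follows from \Delta(\mathcal{C}) \le 1:
\[
  |\mathcal{C}| \;\le\; \frac{1}{\mu\big(B(\varrho(\mathcal{C}))\big)}.
\]
```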
MDL, Penalized Likelihood, and Statistical Risk
We determine, for both countable and uncountable collections of functions, information-theoretic conditions on a penalty $\mathrm{pen}(f)$ such that the optimizer $\hat{f}$ of the penalized log likelihood criterion $\log(1/\mathrm{likelihood}(f)) + \mathrm{pen}(f)$ has risk not more than the index of resolvability corresponding to the accuracy of the optimizer of the expected value of the criterion. If $F$ is the linear span of a dictionary of functions, traditional description-length penalties are based on the number of non-zero terms (the $\ell_0$ norm of the coefficients). We specialize our general conclusions to show that the $\ell_1$ norm of the coefficients times a suitable multiplier $\lambda$ is also an information-theoretically valid penalty.
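Rendered schematically (the notation below is assumed, following the standard penalized-likelihood setup of this line of work rather than the paper's exact statement), the criterion and the resolvability benchmark read:

```latex
% Penalized log likelihood estimator over a collection \mathcal{F},
% given data X^n = (X_1, \dots, X_n); notation is illustrative.
\[
  \hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}}
  \left\{ \log \frac{1}{\mathrm{likelihood}(f \mid X^n)} + \mathrm{pen}(f) \right\}.
\]
% The index of resolvability trades off approximation accuracy against
% penalty size; the paper's conclusion is a risk bound of this shape:
\[
  R_n(f^{*}) \;=\; \min_{f \in \mathcal{F}}
  \left\{ D(f^{*} \,\|\, f) + \frac{\mathrm{pen}(f)}{n} \right\},
  \qquad
  \mathbb{E}\, d(f^{*}, \hat{f}) \;\le\; R_n(f^{*}),
\]
% where D is a Kullback-Leibler-type divergence and d a compatible loss.
```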
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Devising deep latent variable models for multi-modal data has been a
long-standing theme in machine learning research. Multi-modal Variational
Autoencoders (VAEs) have been a popular generative model class that learns
latent representations which jointly explain multiple modalities. Various
objective functions for such models have been suggested, often motivated as
lower bounds on the multi-modal data log-likelihood or from
information-theoretic considerations. In order to encode latent variables from
different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts
(MoE) aggregation schemes have been routinely used and shown to yield different
trade-offs, for instance, regarding their generative quality or consistency
across multiple modalities. In this work, we consider a variational bound that
can tightly lower bound the data log-likelihood. We develop more flexible
aggregation schemes that generalise PoE or MoE approaches by combining encoded
features from different modalities based on permutation-invariant neural
networks. Our numerical experiments illustrate trade-offs for multi-modal
variational bounds and various aggregation schemes. We show that tighter
variational bounds and more flexible aggregation models can become beneficial
when one wants to approximate the true joint distribution over observed
modalities and latent variables in identifiable models.
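For concreteness, the following NumPy sketch shows the closed-form Product-of-Experts fusion of per-modality Gaussian posteriors that PoE-based multi-modal VAEs rely on; it illustrates the baseline aggregation scheme the abstract mentions, not the paper's permutation-invariant aggregation models, and all names are illustrative.

```python
import numpy as np

def poe_gaussian(mus, logvars, prior_mu=0.0, prior_logvar=0.0):
    """Product-of-Experts fusion of diagonal Gaussian experts.

    mus, logvars: arrays of shape (num_modalities, latent_dim).
    The product of Gaussians is Gaussian with precision equal to the
    sum of the experts' precisions (including a standard-normal prior).
    """
    precisions = np.exp(-np.asarray(logvars))            # 1 / sigma^2 per expert
    prior_prec = np.exp(-prior_logvar)
    total_prec = prior_prec + precisions.sum(axis=0)
    mu = (prior_prec * prior_mu
          + (precisions * np.asarray(mus)).sum(axis=0)) / total_prec
    return mu, -np.log(total_prec)                       # fused mean, log-variance

# Two modality-specific encoders produced these (illustrative) posteriors:
mus = np.array([[0.5, -1.0], [1.5, 0.0]])
logvars = np.array([[0.0, 0.0], [-1.0, 1.0]])
print(poe_gaussian(mus, logvars))
```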
Interpreting Deep Learning for cell differentiation. Supervised and Unsupervised models viewed through the lens of information and perturbation theory.
"Predicting the future isn't magic, it's artificial intelligence" Dave Waters.
In recent decades there has been unprecedented growth in the field of machine learning, particularly in deep learning models. The combination of big data and computational power has nurtured the evolution of a variety of new methods to predict and interpret future scenarios. These data-centric models can achieve exceptional performance on specific tasks, with their prediction boundaries continuously expanding towards new and more complex challenges.
However, the model complexity often translates into a lack of interpretability from a scientific perspective: it is not trivial to identify the factors involved in final outcomes.
Explainability may not always be a requirement for machine learning tasks, especially when it comes at the expense of predictive performance. But for some applications, such as biological discovery or medical diagnostics, understanding the output and determining the factors that influence decisions is essential.
In this thesis we develop both a supervised and an unsupervised approach to map from genotype to phenotype. We emphasise the importance of interpretability and feature extraction from the models by identifying relevant genes for cell differentiation. We then explore the rules and mechanisms behind the models from a theoretical perspective, using information theory to explain the learning process and applying perturbation theory to transform the results into a generalisable representation.
We start by building a supervised approach to mapping cell profiles from genotype to phenotype, using single-cell RNA-Seq data. We leverage non-linearities among gene expressions to identify cellular levels of differentiation. The ambiguity, and even absence, of labels in most biological studies motivated the development of novel unsupervised techniques, leading to a new general and biologically interpretable framework based on Variational Autoencoders.
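For reference, the objective underlying any such VAE-based framework is the evidence lower bound (ELBO); the rendering below uses generic notation ($x$ an expression profile, $z$ its latent code) and is a standard statement, not the thesis's specific model:

```latex
% Evidence lower bound (ELBO) for a VAE with encoder q_\phi and
% decoder p_\theta; x is an observed expression profile, z its latent code.
\[
  \log p_{\theta}(x) \;\ge\;
  \mathbb{E}_{q_{\phi}(z \mid x)}\big[ \log p_{\theta}(x \mid z) \big]
  \;-\; \mathrm{KL}\big( q_{\phi}(z \mid x) \,\|\, p(z) \big).
\]
```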
The application and validation of the methods have proven successful, but questions regarding the learning process and the generative nature of the results remained unanswered. We use information theory to define a new approach to interpreting training and the converged solutions of our models.
The variational and generative nature of Autoencoders provides a platform to develop general models. Their results should extrapolate and allow generalisation beyond the boundaries of the observed data. To this end, we introduce for the first time a new interpretation of the embedded generative functions through Perturbation Theory. The embedding multiplicity is addressed by transforming the distributions into a new set of generalisable functions, while characterising their energy spectrum under a particular energy landscape.
We outline the combination of theoretical and machine-learning-based methods for moving towards interpretable and generalisable models. Developing a theoretical framework to map from genotype to phenotype, we provide both supervised and unsupervised tools to operate over single-cell RNA-Seq data. We have generated a pipeline to identify relevant genes and cell types through Variational Autoencoders (VAEs), validating reconstructed gene expressions to prove the generative performance of the embeddings. The new interpretation of the information learned and extracted by the models defines a label-independent evaluation, particularly useful for unsupervised learning. Lastly, we introduce a novel transformation of the generative embeddings based on quantum and perturbation theory.
Our contributions can be, and have been, extended to new datasets, according to the nature of the tasks being explored. For instance, the combination of unsupervised learning and information theory can be applied to a variety of biological or medical data. We have trained several VAE models with additional cancer and metabolic data, showing that they extract meaningful representations of the data. The perturbation theory transformation of the embedding can also lead to future research on the generative potential of Variational Autoencoders through a physics perspective, combining statistical and quantum mechanics.
We believe that machine learning will only continue its fast expansion and growth through the development of more generalisable and more interpretable models.
"Prediction is very difficult, especially if it's about the future" Niels Boh
Small Transformers Compute Universal Metric Embeddings
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by small neural networks called \emph{probabilistic transformers}. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n \log(n)$ and width about $n^2$ can bi-H\"{o}lder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If $\mathcal{X}$'s geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications, we derive neural embedding guarantees
for datasets from Riemannian manifolds, metric trees, and certain types of
combinatorial graphs. When instead embedding into multivariate Gaussian
mixtures, we show that probabilistic transformers can compute bi-H\"{o}lder
embeddings with arbitrarily small distortion. Comment: 42 pages, 10 figures, 3 tables.
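For readers unfamiliar with the terminology, a bi-H\"{o}lder embedding between metric spaces can be stated as follows (a standard definition with assumed notation; the paper's exact exponents and constants are not reproduced here):

```latex
% A map f : (X, d_X) -> (Y, d_Y) is bi-Hölder with exponents
% 0 < alpha <= 1 <= beta and constants c, C > 0 if, for all x, x' in X,
\[
  c \, d_{X}(x, x')^{\beta} \;\le\; d_{Y}\big(f(x), f(x')\big)
  \;\le\; C \, d_{X}(x, x')^{\alpha}.
\]
% The bi-Lipschitz case is alpha = beta = 1; the ratio C/c (and the gap
% between alpha and beta) quantifies the metric distortion of the embedding.
```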
Information Theory and Machine Learning
The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, have transferable learning results, use computational resources efficiently, converge quickly in online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.
Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms
We present a technical survey on the state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
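As a reference point for the survey's subject, the standard (strong) coreset guarantee can be stated as follows; the notation is the usual one and assumed here, not quoted from the survey:

```latex
% A weighted set S is an eps-coreset of a point set P for a cost
% function cost(., Q) over queries Q (e.g. candidate cluster centers) if
\[
  (1 - \varepsilon)\,\mathrm{cost}(P, Q)
  \;\le\; \mathrm{cost}(S, Q) \;\le\;
  (1 + \varepsilon)\,\mathrm{cost}(P, Q)
  \qquad \text{for all queries } Q .
\]
% Any algorithm run on S then yields a (1 +- eps)-approximation on P,
% which is what makes coresets a design pattern for streaming: merge and
% compress coresets of stream chunks instead of storing all of P.
```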