3,207 research outputs found
Optimal Riemannian quantization with an application to air traffic analysis
The goal of optimal quantization is to find the best approximation of a
probability distribution by a discrete measure with finite support. When
dealing with empirical distributions, this boils down to finding the best
summary of the data by a smaller number of points, and automatically yields a
K-means-type clustering. In this paper, we introduce Competitive Learning
Riemannian Quantization (CLRQ), an online algorithm that computes the optimal
summary when the data does not belong to a vector space, but rather to a
Riemannian manifold. We prove its convergence and show simulated examples on
the sphere and the hyperbolic plane. We also provide an application to real
data by using CLRQ to create summaries of images of covariance matrices
estimated from air traffic images. These summaries are representative of the
air traffic complexity and yield clusterings of the airspaces into zones that
are homogeneous with respect to that criterion. They can then be compared using
discrete optimal transport and be further used as inputs of a machine learning
algorithm or as indices in a traffic database.
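As a rough illustration of the idea, and not the authors' CLRQ implementation, here is a minimal competitive-learning quantizer on the unit sphere in Python: each sample pulls its nearest center a small step along the connecting geodesic. The step schedule, initialization, and toy data are placeholder choices.

```python
import numpy as np

def sphere_log(p, x):
    """Log map on the unit sphere: tangent vector at p pointing toward x."""
    c = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(c)                      # geodesic distance from p to x
    if theta < 1e-10:
        return np.zeros_like(p)
    v = x - c * p                             # component of x orthogonal to p
    return theta * v / np.linalg.norm(v)

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow the geodesic from p along v."""
    n = np.linalg.norm(v)
    if n < 1e-10:
        return p
    return np.cos(n) * p + np.sin(n) * v / n

def competitive_quantization(samples, k, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), k, replace=False)]
    for t in range(steps):
        x = samples[rng.integers(len(samples))]
        i = np.argmax(centers @ x)            # nearest center (max cosine)
        eps = 1.0 / (1.0 + t)                 # decaying step size (placeholder)
        centers[i] = sphere_exp(centers[i], eps * sphere_log(centers[i], x))
    return centers

# toy data: two noisy clusters projected onto the sphere S^2
rng = np.random.default_rng(1)
raw = np.concatenate([rng.normal([0.0, 0.0, 3.0], 0.3, (200, 3)),
                      rng.normal([3.0, 0.0, 0.0], 0.3, (200, 3))])
data = raw / np.linalg.norm(raw, axis=1, keepdims=True)
print(competitive_quantization(data, k=2))    # one center near each cluster
```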
Poincaré Wasserstein Autoencoder
This work presents a reformulation of the recently proposed Wasserstein
autoencoder framework on a non-Euclidean manifold, the Poincaré ball model of
the hyperbolic space. By assuming the latent space to be hyperbolic, we can use
its intrinsic hierarchy to impose structure on the learned latent space
representations. We demonstrate the model in the visual domain to analyze some
of its properties and show competitive results on a graph link prediction task.
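A minimal sketch of the generic Poincaré ball machinery such a hyperbolic latent space relies on, assuming curvature -1: Euclidean encoder outputs are mapped into the ball with the exponential map at the origin, and distances are geodesic. This is standard ball arithmetic, not the paper's architecture.

```python
import numpy as np

def poincare_exp0(v, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -1)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.tanh(norm) * v / np.maximum(norm, eps)

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points inside the ball."""
    sq = np.sum((x - y) ** 2, axis=-1)
    denom = np.maximum((1.0 - np.sum(x * x, axis=-1)) *
                       (1.0 - np.sum(y * y, axis=-1)), eps)
    return np.arccosh(1.0 + 2.0 * sq / denom)

codes = np.random.default_rng(0).normal(size=(4, 2))  # mock encoder outputs
z = poincare_exp0(codes)                              # latent codes in the ball
print(np.linalg.norm(z, axis=1))                      # all norms < 1
print(poincare_distance(z[0], z[1]))
```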
A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
Hyperbolic space is a geometry that is known to be well-suited for
representation learning of data with an underlying hierarchical structure. In
this paper, we present a novel hyperbolic distribution called
"pseudo-hyperbolic Gaussian", a Gaussian-like distribution on hyperbolic
space whose density can be evaluated analytically and differentiated with
respect to the parameters. Our distribution enables gradient-based learning of
probabilistic models on hyperbolic space that could not previously be
considered. We can also sample from this hyperbolic probability distribution
without resorting to auxiliary means like rejection sampling. As applications
of our distribution, we develop a hyperbolic analogue of the variational
autoencoder and a method for probabilistic word embedding on hyperbolic space.
We demonstrate the efficacy of our distribution on various datasets including
MNIST, Atari 2600 Breakout, and WordNet.
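One standard way to realize such a wrapped Gaussian on the Lorentz (hyperboloid) model, sketched below on the assumption that the construction follows the usual tangent-space recipe: sample v ~ N(0, Sigma) in the tangent space at the origin, parallel-transport it to the mean mu, and push it through the exponential map. The formulas are textbook hyperboloid operations, not the authors' code.

```python
import numpy as np

def ldot(x, y):
    """Lorentz inner product <x, y> = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def sample_wrapped_normal(mu, cov, rng):
    n = len(mu) - 1
    v = rng.multivariate_normal(np.zeros(n), cov)
    v = np.concatenate([[0.0], v])           # tangent vector at the origin mu0
    mu0 = np.zeros(n + 1); mu0[0] = 1.0
    alpha = -ldot(mu0, mu)
    u = v + (ldot(mu, v) / (alpha + 1.0)) * (mu0 + mu)  # parallel transport
    r = np.sqrt(max(ldot(u, u), 1e-18))                 # tangent norm at mu
    return np.cosh(r) * mu + np.sinh(r) * u / r         # exp map at mu

rng = np.random.default_rng(0)
mu0 = np.array([1.0, 0.0, 0.0])
# a mean at geodesic distance 1 from the origin of the hyperboloid
mu = np.cosh(1.0) * mu0 + np.sinh(1.0) * np.array([0.0, 1.0, 0.0])
x = sample_wrapped_normal(mu, 0.1 * np.eye(2), rng)
print(x, ldot(x, x))   # lies on the hyperboloid: <x, x> = -1
```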
A VEST of the Pseudoinverse Learning Algorithm
In this paper, we briefly review the basic scheme of the pseudoinverse
learning (PIL) algorithm and present some discussions on the PIL, as well as
its variants. The PIL algorithm, first presented in 1995, is a non-iterative
learning algorithm for multi-layer neural networks that does not use gradient
descent, and it has several advantages over gradient-descent-based algorithms.
Some new viewpoints on the PIL algorithm are presented, and several common pitfalls
in practical implementation of the neural network learning task are also
addressed. In addition, we show that the so-called extreme learning machine is a
Variant crEated by Simple name alTernation (VEST) of the PIL algorithm for
single-hidden-layer feedforward neural networks.
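A minimal sketch of the non-iterative, pseudoinverse-based scheme for a single-hidden-layer network: fix the hidden layer, then obtain the output weights in closed form via the Moore-Penrose pseudoinverse. Random hidden weights are used here for brevity; this illustrates the general scheme rather than the exact 1995 PIL procedure.

```python
import numpy as np

def train_pinv(X, Y, hidden=64, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # fixed hidden weights
    H = np.tanh(X @ W)                          # hidden activations
    beta = np.linalg.pinv(H) @ Y                # closed-form output weights
    return W, beta

def predict(X, W, beta):
    return np.tanh(X @ W) @ beta

# toy regression: fit y = sin(x) without any gradient descent
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)
W, beta = train_pinv(X, Y)
print(np.mean((predict(X, W, beta) - Y) ** 2))  # small training MSE
```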
Poincaré Embeddings for Learning Hierarchical Representations
Representation learning has become an invaluable approach for learning from
symbolic data such as text and graphs. However, while complex symbolic datasets
often exhibit a latent hierarchical structure, state-of-the-art methods
typically learn embeddings in Euclidean vector spaces, which do not account for
this property. To address this, we introduce a new approach for learning
hierarchical representations of symbolic data by embedding them into hyperbolic
space -- or more precisely, into an n-dimensional Poincaré ball. Due to the
underlying hyperbolic geometry, this allows us to learn parsimonious
representations of symbolic data by simultaneously capturing hierarchy and
similarity. We introduce an efficient algorithm to learn the embeddings based
on Riemannian optimization and show experimentally that Poincaré embeddings
outperform Euclidean embeddings significantly on data with latent hierarchies,
both in terms of representation capacity and in terms of generalization
ability.
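On the Poincaré ball, the Riemannian optimization step amounts to rescaling the Euclidean gradient by the inverse of the conformal metric factor and retracting the result back into the open ball. A minimal sketch of one such update, with `euclidean_grad` standing in for the gradient of whatever ranking loss is being minimized:

```python
import numpy as np

def rsgd_step(theta, euclidean_grad, lr=0.01, eps=1e-5):
    """One Riemannian SGD update on the Poincare ball (curvature -1)."""
    scale = (1.0 - np.sum(theta ** 2)) ** 2 / 4.0   # inverse metric factor
    theta = theta - lr * scale * euclidean_grad     # Riemannian gradient step
    norm = np.linalg.norm(theta)
    if norm >= 1.0:                                 # retract into the open ball
        theta = (1.0 - eps) * theta / norm
    return theta

theta = np.array([0.3, 0.4])          # one embedding vector
grad = np.array([1.0, -2.0])          # hypothetical loss gradient
print(rsgd_step(theta, grad))
```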
Gaussian Process Neurons Learn Stochastic Activation Functions
We propose stochastic, non-parametric activation functions that are fully
learnable and individual to each neuron. Complexity and the risk of overfitting
are controlled by placing a Gaussian process prior over these functions. The
result is the Gaussian process neuron, a probabilistic unit that can be used as
the basic building block for probabilistic graphical models that resemble the
structure of neural networks. The proposed model can intrinsically handle
uncertainties in its inputs and self-estimate the confidence of its
predictions. Using variational Bayesian inference and the central limit
theorem, a fully deterministic loss function is derived, allowing it to be
trained as efficiently as a conventional neural network using mini-batch
gradient descent. The posterior distribution of activation functions is
inferred from the training data alongside the weights of the network.
The proposed model favorably compares to deep Gaussian processes, both in
model complexity and efficiency of inference. It can be directly applied to
recurrent or convolutional network structures, allowing its use in audio and
image processing tasks.
As a preliminary empirical evaluation, we present experiments on regression
and classification tasks, in which our model achieves performance comparable to
or better than a Dropout regularized neural network with a fixed activation
function. Experiments are ongoing and results will be added as they become
available.
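To make the prior concrete: below is a minimal sketch of drawing candidate activation functions from a Gaussian process with a squared-exponential kernel on a grid of pre-activation values. The paper's variational inference and central-limit-theorem machinery are not reproduced, and the kernel hyperparameters are arbitrary.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two 1-D grids."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

x = np.linspace(-3, 3, 50)                    # grid of pre-activation values
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability
rng = np.random.default_rng(0)
# three random activation functions drawn from the GP prior
activations = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(activations.shape)                      # (3, 50)
```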
Global Guarantees for Blind Demodulation with Generative Priors
We study a deep learning inspired formulation for the blind demodulation
problem, which is the task of recovering two unknown vectors from their
entrywise multiplication. We consider the case where the unknown vectors lie in
the ranges of known deep generative models. When the networks corresponding to
the generative models are expansive, the weight matrices are random, and the
dimension of the unknown vectors satisfies an appropriate bound (up to log
factors), we show that the empirical risk objective
has a favorable landscape for optimization. That is, the objective function has
a descent direction at every point outside of a small neighborhood around four
hyperbolic curves. We also characterize the local maximizers of the empirical
risk objective and hence show that no stationary points exist outside of these
neighborhoods of the four hyperbolic curves and the set of local maximizers. We
also implement a gradient descent scheme inspired by
the geometry of the landscape of the objective function. In order to converge
to a global minimizer, this gradient descent scheme exploits the fact that
exactly one of the hyperbolic curves corresponds to the global minimizer, and
thus points near this hyperbolic curve have a lower objective value than points
close to the other spurious hyperbolic curves. We show that this gradient
descent scheme can effectively remove distortions synthetically introduced to
the MNIST dataset.
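A toy sketch of the entrywise-multiplication measurement model and of the scaling ambiguity behind the hyperbolic curves mentioned above. It deliberately omits the generative priors, which are what make the problem well posed in the paper, so plain gradient descent here can only match the measurements, not identify the true pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
w, x = rng.normal(size=n), rng.normal(size=n)
y = w * x                                      # entrywise measurements

# scaling ambiguity: (c*w, x/c) gives identical measurements for any c != 0,
# tracing out a hyperbolic curve of equivalent solutions
for c in (0.5, 2.0, -3.0):
    assert np.allclose((c * w) * (x / c), y)

# plain gradient descent on the empirical risk 0.5 * ||w*x - y||^2
w_hat, x_hat = rng.normal(size=n), rng.normal(size=n)
lr = 0.01
for _ in range(5000):
    r = w_hat * x_hat - y                      # residual
    w_hat, x_hat = w_hat - lr * r * x_hat, x_hat - lr * r * w_hat
print(np.max(np.abs(w_hat * x_hat - y)))       # near zero: y is matched,
                                               # though (w_hat, x_hat) need not
                                               # equal (w, x) without priors
```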
Wrapped Distributions on Homogeneous Riemannian Manifolds
We provide a general framework for constructing probability distributions on
Riemannian manifolds, taking advantage of area-preserving maps and isometries.
Control over the distributions' properties, such as their parameters, symmetry,
and modality, yields a family of flexible distributions that are straightforward to
sample from, suitable for use within Monte Carlo algorithms and latent variable
models, such as autoencoders. As an illustration, we empirically validate our
approach by utilizing our proposed distributions within a variational
autoencoder and a latent space network model. Finally, we take advantage of the
generalized description of this framework to posit questions for future work.
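The simplest instance of this wrapping construction is the wrapped normal on the circle: push a Gaussian on the real line through the covering map theta -> theta mod 2*pi. A minimal sketch of sampling and of the winding-sum density (truncated to finitely many terms):

```python
import numpy as np

def sample_wrapped_normal_circle(mu, sigma, size, rng):
    """Sample by wrapping a Gaussian on the line onto [0, 2*pi)."""
    return np.mod(rng.normal(mu, sigma, size), 2.0 * np.pi)

def wrapped_normal_pdf(theta, mu, sigma, terms=20):
    """Density as a (truncated) sum over windings of the underlying normal."""
    k = np.arange(-terms, terms + 1)
    vals = np.exp(-0.5 * ((theta - mu + 2.0 * np.pi * k) / sigma) ** 2)
    return vals.sum() / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
samples = sample_wrapped_normal_circle(np.pi, 0.7, 10000, rng)
print(samples.min() >= 0, samples.max() < 2 * np.pi)   # samples live on S^1
print(wrapped_normal_pdf(np.pi, np.pi, 0.7))           # density at the mode
```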
First-order Methods for Geodesically Convex Optimization
Geodesic convexity generalizes the notion of (vector space) convexity to
nonlinear metric spaces. But unlike convex optimization, geodesically convex
(g-convex) optimization is much less developed. In this paper we contribute to
the understanding of g-convex optimization by developing iteration complexity
analysis for several first-order algorithms on Hadamard manifolds.
Specifically, we prove upper bounds for the global complexity of deterministic
and stochastic (sub)gradient methods for optimizing smooth and nonsmooth
g-convex functions, both with and without strong g-convexity. Our analysis also
reveals how the manifold geometry, especially the sectional curvature,
impacts convergence rates. To the best of our knowledge, our work is the first
to provide global complexity analysis for first-order algorithms for general
g-convex optimization.
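As a concrete instance of the setting, the sketch below runs Riemannian gradient descent for the Fréchet (Karcher) mean of SPD matrices under the affine-invariant metric, a classic g-convex problem on a Hadamard manifold. The fixed step size is a placeholder, not a rate-optimal choice from the paper's analysis.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def karcher_mean(mats, steps=50, lr=1.0):
    X = np.mean(mats, axis=0)                  # start at the arithmetic mean
    for _ in range(steps):
        Xs = np.real(sqrtm(X))
        Xsi = inv(Xs)
        # descent direction: average of the log maps toward the data points
        G = np.real(np.mean([logm(Xsi @ A @ Xsi) for A in mats], axis=0))
        X = Xs @ expm(lr * G) @ Xs             # geodesic (exponential map) step
    return X

rng = np.random.default_rng(0)
mats = []
for _ in range(5):
    B = rng.normal(size=(3, 3))
    mats.append(B @ B.T + 0.5 * np.eye(3))     # random SPD matrices
mean = karcher_mean(np.array(mats))
print(np.allclose(mean, mean.T, atol=1e-8))    # result stays symmetric
print(np.all(np.linalg.eigvalsh(mean) > 0))    # and positive definite
```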
Constructing the Matrix Multilayer Perceptron and its Application to the VAE
Like most learning algorithms, the multilayer perceptron (MLP) is designed
to learn a vector of parameters from data. However, in certain scenarios we are
interested in learning structured parameters (predictions) in the form of
symmetric positive definite matrices. Here, we introduce a variant of the MLP,
referred to as the matrix MLP, which specializes in learning symmetric
positive definite matrices. We also present an application of the model within
the context of the variational autoencoder (VAE). Our formulation of the VAE
extends the vanilla formulation to the cases where the recognition and the
generative networks can be from the parametric family of distributions with
dense covariance matrices. Two specific examples are discussed in more detail:
the dense covariance Gaussian and its generalization, the power exponential
distribution. Our new developments are illustrated using both synthetic and
real data.
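For intuition only: one common way to make a network head emit symmetric positive definite matrices is a Cholesky-style parameterization, sketched below. The paper's matrix MLP is a different, specialized construction; this is merely the generic baseline idea.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def vector_to_spd(v, d, eps=1e-6):
    """Map d*(d+1)/2 raw network outputs to a d x d SPD matrix."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = v                      # fill lower triangle
    L[np.diag_indices(d)] = softplus(np.diag(L))   # force a positive diagonal
    return L @ L.T + eps * np.eye(d)               # SPD by construction

raw = np.random.default_rng(0).normal(size=6)      # mock network head, d = 3
S = vector_to_spd(raw, 3)
print(np.allclose(S, S.T), np.all(np.linalg.eigvalsh(S) > 0))
```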