Exponential expressivity in deep neural networks through transient chaos
We combine Riemannian geometry with the mean field theory of high dimensional
chaos to study the nature of signal propagation in generic, deep neural
networks with random weights. Our results reveal an order-to-chaos expressivity
phase transition, with networks in the chaotic phase computing nonlinear
functions whose global curvature grows exponentially with depth but not width.
We prove this generic class of deep random functions cannot be efficiently
computed by any shallow network, going beyond prior work restricted to the
analysis of single functions. Moreover, we formalize and quantitatively
demonstrate the long conjectured idea that deep networks can disentangle highly
curved manifolds in input space into flat manifolds in hidden space. Our
theoretical analysis of the expressive power of deep networks broadly applies
to arbitrary nonlinearities, and provides a quantitative underpinning for
previously abstract notions about the geometry of deep functions.
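As a pointer to the computation behind this result: the order-to-chaos boundary can be located numerically by iterating the standard mean-field recursions for the preactivation variance q and the Jacobian gain chi. Below is a minimal sketch, assuming a tanh nonlinearity with Gaussian weights of variance sigma_w^2/N and biases of variance sigma_b^2; the function names are illustrative, not from the paper.

```python
import numpy as np

def mean_field_chi(sigma_w, sigma_b, iters=100, n_gauss=10_000):
    """Iterate the mean-field variance map for a tanh network and
    return the fixed-point variance q* and the chaos parameter chi.
    chi > 1 signals the chaotic phase; chi < 1 the ordered phase."""
    z = np.random.randn(n_gauss)   # Monte Carlo stand-in for the Gaussian integral
    q = 1.0                        # initial preactivation variance
    for _ in range(iters):         # q_{l+1} = s_w^2 E[phi(sqrt(q) z)^2] + s_b^2
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z)**2) + sigma_b**2
    # chi = s_w^2 E[phi'(sqrt(q*) z)^2], the mean squared local Jacobian gain
    chi = sigma_w**2 * np.mean(1.0 / np.cosh(np.sqrt(q) * z)**4)
    return q, chi

for sw in (0.5, 1.0, 2.0):
    q, chi = mean_field_chi(sigma_w=sw, sigma_b=0.1)
    print(f"sigma_w={sw}: q*={q:.3f}, chi={chi:.3f}",
          "(chaotic)" if chi > 1 else "(ordered)")
```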
How deep learning works -- The geometry of deep learning
Why and how deep learning works well on different tasks remains a mystery from a theoretical perspective. In this paper we draw a geometric picture of the deep learning system by finding its analogies with two existing geometric structures: the geometry of quantum computations and the geometry of diffeomorphic template matching. In this framework, we give the geometric structures of different deep learning systems, including convolutional neural networks, residual networks, recursive neural networks, recurrent neural networks and the equilibrium propagation framework. We also analyze the relationship between these geometric structures and the performance of different networks at an algorithmic level, so that the geometric framework may guide the design of the structures and algorithms of deep learning systems.
Curvature-based Comparison of Two Neural Networks
In this paper we show the similarities and differences of two deep neural networks by comparing the manifolds composed of activation vectors in each of their fully connected layers. The main contributions of this paper include 1) a new data-generating algorithm which is crucial for determining the dimension of the manifolds; 2) a systematic strategy for comparing manifolds. In particular, we take Riemann curvature and sectional curvature as part of the criteria, as they reflect the intrinsic geometric properties of manifolds. Some interesting results and phenomena are given, which help in specifying the similarities and differences between the features extracted by the two networks and in demystifying the intrinsic mechanism of deep neural networks.
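The abstract does not spell out the comparison pipeline; one ingredient it names, estimating the dimension of an activation manifold, can be sketched with a common local-PCA heuristic. The neighborhood size and variance threshold below are our assumptions, not the paper's method.

```python
import numpy as np

def local_pca_dimension(acts, k=50, var_kept=0.95):
    """Estimate the intrinsic dimension of a cloud of activation
    vectors: around random anchor points, count the principal
    components needed to retain `var_kept` of the local variance."""
    n = len(acts)
    dims = []
    for idx in np.random.choice(n, size=min(20, n), replace=False):
        d2 = np.sum((acts - acts[idx])**2, axis=1)
        nbrs = acts[np.argsort(d2)[:k]]        # k nearest neighbors
        nbrs = nbrs - nbrs.mean(axis=0)
        s = np.linalg.svd(nbrs, compute_uv=False)
        ratios = np.cumsum(s**2) / np.sum(s**2)
        dims.append(int(np.searchsorted(ratios, var_kept) + 1))
    return float(np.mean(dims))

z = np.random.randn(2000, 3)          # 3-dim latent factors
A = np.random.randn(3, 128)
acts = np.tanh(z @ A)                 # stand-in for fc-layer activations
print(local_pca_dimension(acts))      # close to 3
```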
Geometry of Deep Generative Models for Disentangled Representations
Deep generative models like variational autoencoders approximate the intrinsic geometry of high-dimensional data manifolds by learning low-dimensional latent-space variables and an embedding function. The geometric properties of these latent spaces have been studied under the lens of Riemannian geometry, via analysis of the non-linearity of the generator function. In recent developments, deep generative models have been used for learning semantically meaningful `disentangled' representations that capture task-relevant attributes while being invariant to other attributes. In this work, we explore the geometry of popular generative models for disentangled representation learning. We use several metrics to compare the properties of the latent spaces of disentangled representation models in terms of class separability and curvature of the latent space. Our results establish that the class-distinguishing features in the disentangled latent space exhibit higher curvature than in a variational autoencoder. We evaluate and compare the geometry of three such models with a variational autoencoder on two different datasets. Further, our results show that distances and interpolation in the latent space are significantly improved with Riemannian metrics derived from the curvature of the space. We expect these results will have implications for making deep networks more robust, generalizable and interpretable.
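The Riemannian analysis referenced here typically pulls the Euclidean metric of data space back through the generator, G(z) = J_g(z)^T J_g(z), where J_g is the generator Jacobian. A minimal numerical sketch with a toy decoder (a placeholder, not the paper's trained model):

```python
import numpy as np

def decoder(z):
    """Toy smooth generator g: R^2 -> R^3 (stand-in for a VAE decoder)."""
    return np.array([np.sin(z[0]), np.cos(z[1]), z[0] * z[1]])

def pullback_metric(g, z, eps=1e-5):
    """Riemannian metric G(z) = J^T J induced on latent space by g,
    with the Jacobian J estimated by central finite differences."""
    d = len(z)
    J = np.stack([(g(z + eps * e) - g(z - eps * e)) / (2 * eps)
                  for e in np.eye(d)], axis=1)   # columns are dg/dz_i
    return J.T @ J

z = np.array([0.3, -0.7])
G = pullback_metric(decoder, z)
# line element: ds^2 = dz^T G(z) dz; larger eigenvalues of G mean the
# generator stretches that latent direction more
print(np.linalg.eigvalsh(G))
```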
Understanding over-parameterized deep networks by geometrization
A complete understanding of the widely used over-parameterized deep networks is a key step for AI. In this work we try to give a geometric picture of over-parameterized deep networks using our geometrization scheme. We show that the Riemannian geometry of network complexity plays a key role in understanding the basic properties of over-parameterized deep networks, including generalization, convergence and parameter sensitivity. We also point out that deep networks share many similarities with quantum computation systems. This can be regarded as strong support for our proposal that geometrization is not only a guiding principle for physics but also a key idea for understanding deep learning systems.
Constant Curvature Graph Convolutional Networks
Interest has been rising lately towards methods representing data in
non-Euclidean spaces, e.g. hyperbolic or spherical, that provide specific
inductive biases useful for certain real-world data properties, e.g.
scale-free, hierarchical or cyclical. However, the popular graph neural
networks are currently limited in modeling data only via Euclidean geometry and
associated vector space operations. Here, we bridge this gap by proposing
mathematically grounded generalizations of graph convolutional networks (GCN)
to (products of) constant curvature spaces. We do this by i) introducing a
unified formalism that can interpolate smoothly between all geometries of
constant curvature, ii) leveraging gyro-barycentric coordinates that generalize
the classic Euclidean concept of the center of mass. Our class of models smoothly recovers its Euclidean counterparts when the curvature goes to zero from either side. Empirically, we outperform Euclidean GCNs in the tasks of node classification and distortion minimization for symbolic data exhibiting non-Euclidean behavior, according to their discrete curvature.
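The unified formalism alluded to is commonly realized in the kappa-stereographic model. As a hedged sketch (function names are ours, not the paper's), the exponential map at the origin below interpolates across spherical, Euclidean and hyperbolic geometry and reduces to the identity map as the curvature goes to zero:

```python
import numpy as np

def tan_kappa(x, kappa):
    """Curvature-dependent tangent: tan_k(x) -> x as k -> 0, bridging
    spherical (k > 0), Euclidean (k = 0) and hyperbolic (k < 0) space."""
    if kappa > 0:
        return np.tan(np.sqrt(kappa) * x) / np.sqrt(kappa)
    if kappa < 0:
        return np.tanh(np.sqrt(-kappa) * x) / np.sqrt(-kappa)
    return x

def expmap0(v, kappa):
    """Exponential map at the origin of the kappa-stereographic model."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return v
    return tan_kappa(norm, kappa) * v / norm

v = np.array([0.3, 0.4])
for k in (-1.0, -1e-6, 0.0, 1e-6, 1.0):
    print(k, expmap0(v, k))   # values vary continuously through k = 0
```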
ManifoldNet: A Deep Network Framework for Manifold-valued Data
Deep neural networks have become the main workhorse for many tasks involving learning from data in a variety of applications in Science and Engineering. Traditionally, the input to these networks lies in a vector space and the operations employed within the network are well defined on vector spaces. In
the recent past, due to technological advances in sensing, it has become
possible to acquire manifold-valued data sets either directly or indirectly.
Examples include but are not limited to data from omnidirectional cameras on
automobiles, drones etc., synthetic aperture radar imaging, diffusion magnetic
resonance imaging, elastography and conductance imaging in the Medical Imaging
domain and others. Thus, there is a need to generalize deep neural networks
to cope with input data that reside on curved manifolds where vector space
operations are not naturally admissible. In this paper, we present a novel
theoretical framework to generalize the widely popular convolutional neural
networks (CNNs) to high dimensional manifold-valued data inputs. We call these
networks, ManifoldNets.
In ManifoldNets, convolution operation on data residing on Riemannian
manifolds is achieved via a provably convergent recursive computation of the
weighted Fr\'{e}chet Mean (wFM) of the given data, where the weights make up the convolution mask to be learned. Further, we prove that the proposed wFM layer
achieves a contraction mapping and hence ManifoldNet does not need the
non-linear ReLU unit used in standard CNNs. We present experiments, using the
ManifoldNet framework, to achieve dimensionality reduction by computing the
principal linear subspaces that naturally reside on a Grassmannian. The
experimental results demonstrate the efficacy of ManifoldNets in the context of
classification and reconstruction accuracy.
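A minimal sketch of the recursive wFM idea, specialized to the unit sphere with slerp standing in for the geodesic; the paper's construction is more general and comes with a convergence proof:

```python
import numpy as np

def slerp(p, q, t):
    """Geodesic interpolation on the unit sphere from p toward q."""
    cos_a = np.clip(np.dot(p, q), -1.0, 1.0)
    a = np.arccos(cos_a)
    if a < 1e-12:
        return p
    return (np.sin((1 - t) * a) * p + np.sin(t * a) * q) / np.sin(a)

def recursive_wfm(points, weights):
    """Inductive weighted Frechet mean estimator: step along the
    geodesic toward each new sample by w_k / (cumulative weight).
    This recursion plays the role of convolution in ManifoldNets."""
    m, cum_w = points[0], weights[0]
    for x, w in zip(points[1:], weights[1:]):
        cum_w += w
        m = slerp(m, x, w / cum_w)
    return m

pts = np.random.randn(10, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # samples on S^2
w = np.abs(np.random.randn(10)); w /= w.sum()       # the learnable mask
print(recursive_wfm(pts, w))
```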
Fine-grained Optimization of Deep Neural Networks
In recent studies, several asymptotic upper bounds on generalization errors
on deep neural networks (DNNs) are theoretically derived. These bounds are
functions of several norms of weights of the DNNs, such as the Frobenius and
spectral norms, and they are computed for weights grouped according to either input or output channels of the DNNs. In this work, we conjecture that if we
can impose multiple constraints on weights of DNNs to upper bound the norms of
the weights, and train the DNNs with these weights, then we can attain
empirical generalization errors closer to the derived theoretical bounds, and
improve accuracy of the DNNs.
To this end, we pose two problems. First, we aim to obtain weights whose
different norms are all upper bounded by a constant number, e.g. 1.0. To
achieve these bounds, we propose a two-stage renormalization procedure: (i)
normalization of weights according to different norms used in the bounds, and
(ii) reparameterization of the normalized weights to set a constant and finite
upper bound of their norms. In the second problem, we consider training DNNs
with these renormalized weights. To this end, we first propose a strategy to
construct joint spaces (manifolds) of weights according to different
constraints in DNNs. Next, we propose a fine-grained SGD algorithm (FG-SGD) for
optimization on the weight manifolds to train DNNs with assurance of
convergence to minima. Experimental results show that image classification
accuracy of baseline DNNs can be boosted using FG-SGD on collections of
manifolds identified by multiple constraints.
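The first renormalization stage, bounding several norms of a weight tensor by one constant, can be illustrated by a toy rescaling that caps both the Frobenius and spectral norms at once; this is a simplification of the paper's two-stage procedure, not its implementation:

```python
import numpy as np

def renormalize(W, bound=1.0):
    """Rescale W so that max(||W||_F, ||W||_2) <= bound.
    Since the spectral norm never exceeds the Frobenius norm,
    dividing by the largest norm bounds both simultaneously."""
    fro = np.linalg.norm(W, "fro")
    spec = np.linalg.norm(W, 2)        # largest singular value
    scale = max(fro, spec) / bound
    return W / scale if scale > 1.0 else W

W = np.random.randn(64, 64)
W_hat = renormalize(W)
print(np.linalg.norm(W_hat, "fro"), np.linalg.norm(W_hat, 2))  # both <= 1
```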
Information Geometry of Orthogonal Initializations and Training
Recently mean field theory has been successfully used to analyze properties
of wide, random neural networks. It gave rise to a prescriptive theory for
initializing feed-forward neural networks with orthogonal weights, which
ensures that both the forward propagated activations and the backpropagated
gradients are near isometries and as a consequence training is orders
of magnitude faster. Despite strong empirical performance, the mechanisms by
which critical initializations confer an advantage in the optimization of deep
neural networks are poorly understood. Here we show a novel connection between
the maximum curvature of the optimization landscape (gradient smoothness) as
measured by the Fisher information matrix (FIM) and the spectral radius of the
input-output Jacobian, which partially explains why more isometric networks can
train much faster. Furthermore, given that orthogonal weights are necessary to
ensure that gradient norms are approximately preserved at initialization, we
experimentally investigate the benefits of maintaining orthogonality throughout
training, from which we conclude that manifold optimization of weights performs
well regardless of the smoothness of the gradients. Moreover, motivated by
experimental results we show that a low condition number of the FIM is not
predictive of faster learning.
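The near-isometry conferred by orthogonal weights can be seen in a deep linear toy model: orthogonal layers keep every singular value of the input-output Jacobian at 1, while Gaussian layers let them spread over orders of magnitude. A sketch, not the paper's experiment:

```python
import numpy as np

def orthogonal(n, gain=1.0):
    """Draw an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(np.random.randn(n, n))
    return gain * q * np.sign(np.diag(r))   # sign fix for uniformity

n, depth = 100, 50
J_orth, J_gauss = np.eye(n), np.eye(n)
for _ in range(depth):
    J_orth = orthogonal(n) @ J_orth
    J_gauss = (np.random.randn(n, n) / np.sqrt(n)) @ J_gauss

# extreme singular values of the depth-50 input-output Jacobians
print(np.linalg.svd(J_orth, compute_uv=False)[[0, -1]])   # ~ [1, 1]
print(np.linalg.svd(J_gauss, compute_uv=False)[[0, -1]])  # widely spread
```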
On the effect of pooling on the geometry of representations
In machine learning and neuroscience, certain computational structures and
algorithms are known to yield disentangled representations without us
understanding why, the most striking examples being perhaps convolutional
neural networks and the ventral stream of the visual cortex in humans and
primates. As for the latter, it was conjectured that representations may be
disentangled by being flattened progressively and at a local scale. An attempt
at a formalization of the role of invariance in learning representations was
made recently, being referred to as I-theory. In this framework and using the
language of differential geometry, we show that pooling over a group of
transformations of the input contracts the metric and reduces its curvature,
and provide quantitative bounds, with the aim of moving towards a theoretical understanding of how to disentangle representations.
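A toy numerical check of the contraction claim: averaging over a group of input transformations is a linear, 1-Lipschitz map, so it cannot increase distances between representations. The group and signals below are our illustrative choices, not the paper's setting:

```python
import numpy as np

def pool_over_group(x, group):
    """Average a signal over a group of input transformations,
    producing a feature invariant to that group."""
    return np.mean([g(x) for g in group], axis=0)

n = 32
# cyclic shifts by multiples of 4 form a subgroup of the shift group
group = [lambda x, s=s: np.roll(x, s) for s in range(0, n, 4)]

x, y = np.random.randn(n), np.random.randn(n)
d_before = np.linalg.norm(x - y)
d_after = np.linalg.norm(pool_over_group(x, group) - pool_over_group(y, group))
print(d_before, d_after)   # d_after <= d_before: pooling contracts the metric
```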