On the Expressive Efficiency of Sum Product Networks
Sum Product Networks (SPNs) are a recently developed class of deep generative
models which compute their associated unnormalized density functions using a
special type of arithmetic circuit. When certain sufficient conditions, called
the decomposability and completeness conditions (or "D&C" conditions), are
imposed on the structure of these circuits, marginal densities and other useful
quantities, which are typically intractable for other deep generative models,
can be computed by what amounts to a single evaluation of the network (which is
a property known as "validity"). However, the effect that the D&C conditions
have on the capabilities of D&C SPNs is not well understood.
In this work we analyze the D&C conditions, expose the various connections
that D&C SPNs have with multilinear arithmetic circuits, and consider the
question of how well they can capture various distributions as a function of
their size and depth. Among our various contributions is a result which
establishes the existence of a relatively simple distribution with fully
tractable marginal densities which cannot be efficiently captured by D&C SPNs
of any depth, but which can be efficiently captured by various other deep
generative models. We also show that with each additional layer of depth
permitted, the set of distributions which can be efficiently captured by D&C
SPNs grows in size. This kind of "depth hierarchy" property has been widely
conjectured to hold for various deep models, but has never been proven for any
of them. Some of our other contributions include a new characterization of the
D&C conditions as sufficient and necessary ones for a slightly strengthened
notion of validity, and various state-machine characterizations of the types of
computations that can be performed efficiently by D&C SPNs.
Comment: Various minor revisions and corrections throughout
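As a concrete illustration of the D&C conditions and of validity, the following toy Python sketch (hypothetical, not the authors' construction) builds a tiny decomposable and complete SPN over two binary variables: product nodes combine children with disjoint scopes, the root sum combines children with identical scopes, and both the unnormalized density and a marginal are read off in a single bottom-up pass.

    # Toy decomposable & complete SPN over two binary variables X1, X2.
    # Product nodes have children with disjoint scopes (decomposability);
    # the root sum has children with identical scopes (completeness).
    def spn_value(x1, x2, w=(0.6, 0.4)):
        # Leaf indicators; passing None marginalizes a variable (indicator -> 1).
        ind = lambda x, v: 1.0 if x is None else float(x == v)
        p1 = ind(x1, 1) * ind(x2, 1)   # product over disjoint scopes {X1}, {X2}
        p2 = ind(x1, 0) * ind(x2, 0)
        return w[0] * p1 + w[1] * p2   # complete sum node over scope {X1, X2}

    print(spn_value(1, 1))      # unnormalized density at (1, 1) -> 0.6
    print(spn_value(1, None))   # marginal over X2, same single pass -> 0.6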
Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions
The driving force behind convolutional networks, the most successful deep
learning architecture to date, is their expressive power. Despite its wide
acceptance and vast empirical evidence, formal analyses supporting this belief
are scarce. The primary notions for formally reasoning about expressiveness are
efficiency and inductive bias. Expressive efficiency refers to the ability of a
network architecture to realize functions that require an alternative
architecture to be much larger. Inductive bias refers to the prioritization of
some functions over others given prior knowledge regarding a task at hand. In
this paper we overview a series of works written by the authors that, through
an equivalence to hierarchical tensor decompositions, analyze the expressive
efficiency and inductive bias of various convolutional network architectural
features (depth, width, strides and more). The results presented shed light on
the demonstrated effectiveness of convolutional networks, and in addition,
provide new tools for network design.
Comment: Part of the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI) Special Issue on Deep Learning Theory
On the Expressive Power of Deep Learning: A Tensor Analysis
It has long been conjectured that hypothesis spaces suitable for data that is
compositional in nature, such as text or images, may be more efficiently
represented with deep hierarchical networks than with shallow ones. Despite the
vast empirical evidence supporting this belief, theoretical justifications to
date are limited. In particular, they do not account for the locality, sharing
and pooling constructs of convolutional networks, the most successful deep
learning architecture to date. In this work we derive a deep network
architecture based on arithmetic circuits that inherently employs locality,
sharing and pooling. An equivalence between the networks and hierarchical
tensor factorizations is established. We show that a shallow network
corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to
Hierarchical Tucker decomposition. Using tools from measure theory and matrix
algebra, we prove that besides a negligible set, all functions that can be
implemented by a deep network of polynomial size, require exponential size in
order to be realized (or even approximated) by a shallow network. Since
log-space computation transforms our networks into SimNets, the result applies
directly to a deep learning architecture demonstrating promising empirical
performance. The construction and theory developed in this paper shed new light
on various practices and ideas employed by the deep learning community.
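To make the shallow-versus-deep correspondence above concrete, here is a hedged NumPy sketch (illustrative sizes, not the paper's construction) contrasting a single CP (rank-1) term with a two-level hierarchical factorization of a 4-way tensor:

    import numpy as np

    d, r = 3, 2
    a, b, c, e = (np.random.rand(d) for _ in range(4))

    # Shallow / CP: one rank-1 term, the outer product of the four vectors.
    cp_term = np.einsum('i,j,k,l->ijkl', a, b, c, e)

    # Deep / hierarchical: first factor the (1,2) and (3,4) mode pairs with r
    # "channels" each, then mix the two pairs at the root.
    A = np.random.rand(d, d, r)
    B = np.random.rand(d, d, r)
    top = np.random.rand(r, r)
    ht = np.einsum('ijp,klq,pq->ijkl', A, B, top)

    print(cp_term.shape, ht.shape)   # both (3, 3, 3, 3)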
Sum-Product-Quotient Networks
We present a novel tractable generative model that extends Sum-Product
Networks (SPNs) and significantly boosts their power. We call it
Sum-Product-Quotient Networks (SPQNs), whose core concept is to incorporate
conditional distributions into the model by direct computation using quotient
nodes, e.g. P(A|B) = P(A,B)/P(B). We provide sufficient conditions
for the tractability of SPQNs that generalize and relax the decomposable and
complete tractability conditions of SPNs. These relaxed conditions give rise to
an exponential boost to the expressive efficiency of our model, i.e. we prove
that there are distributions which SPQNs can compute efficiently but require
SPNs to be of exponential size. Thus, we narrow the gap in expressivity between
tractable graphical models and other Neural Network-based generative models.
Comment: Published as a conference paper at AISTATS 2018
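As a minimal illustration of what a quotient node buys (a hedged toy, not the paper's parameterization), a conditional can be computed as the ratio of a joint and a marginal, each of which an SPN-style sub-network could supply:

    # Toy quotient node: P(Y = y | X = x) = P(x, y) / P(x).
    joint = {('a', 'b'): 0.3, ('a', 'c'): 0.1, ('d', 'b'): 0.4, ('d', 'c'): 0.2}

    def marginal_x(x):
        return sum(p for (xi, _), p in joint.items() if xi == x)

    def quotient_node(x, y):
        return joint[(x, y)] / marginal_x(x)

    print(quotient_node('a', 'b'))   # 0.3 / 0.4 = 0.75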
Convolutional Rectifier Networks as Generalized Tensor Decompositions
Convolutional rectifier networks, i.e. convolutional neural networks with
rectified linear activation and max or average pooling, are the cornerstone of
modern deep learning. However, despite their wide use and success, our
theoretical understanding of the expressive properties that drive these
networks is partial at best. On the other hand, we have a much firmer grasp of
these issues in the world of arithmetic circuits. Specifically, it is known
that convolutional arithmetic circuits possess the property of "complete depth
efficiency", meaning that besides a negligible set, all functions that can be
implemented by a deep network of polynomial size, require exponential size in
order to be implemented (or even approximated) by a shallow network. In this
paper we describe a construction based on generalized tensor decompositions,
that transforms convolutional arithmetic circuits into convolutional rectifier
networks. We then use mathematical tools available from the world of arithmetic
circuits to prove new results. First, we show that convolutional rectifier
networks are universal with max pooling but not with average pooling. Second,
and more importantly, we show that depth efficiency is weaker with
convolutional rectifier networks than it is with convolutional arithmetic
circuits. This leads us to believe that developing effective methods for
training convolutional arithmetic circuits, thereby fulfilling their expressive
potential, may give rise to a deep learning architecture that is provably
superior to convolutional rectifier networks but has so far been overlooked by
practitioners.
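The contrast between the two families above can be made concrete with a hedged toy (illustrative numbers only): a convolutional rectifier block applies ReLU activation followed by max pooling, whereas a convolutional arithmetic-circuit block applies a linear activation followed by product pooling.

    import numpy as np

    x = np.array([0.5, -1.0, 2.0, 0.3])          # one filter's responses at 4 positions
    rectifier_out = np.max(np.maximum(x, 0.0))   # ReLU activation, then max pooling
    arithmetic_out = np.prod(x)                  # linear activation, then product pooling
    print(rectifier_out, arithmetic_out)         # 2.0 vs -0.3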
Expressive power of recurrent neural networks
Deep neural networks are surprisingly efficient at solving practical tasks,
but the theory behind this phenomenon is only starting to catch up with the
practice. Numerous works show that depth is the key to this efficiency. A
certain class of deep convolutional networks -- namely those that correspond to
the Hierarchical Tucker (HT) tensor decomposition -- has been proven to have
exponentially higher expressive power than shallow networks. I.e. a shallow
network of exponential width is required to realize the same score function as
computed by the deep architecture. In this paper, we prove the expressive power
theorem (an exponential lower bound on the width of the equivalent shallow
network) for a class of recurrent neural networks -- ones that correspond to
the Tensor Train (TT) decomposition. This means that even processing an image
patch by patch with an RNN can be exponentially more efficient than a (shallow)
convolutional network with one hidden layer. Using theoretical results on the
relation between the tensor decompositions we compare expressive powers of the
HT- and TT-Networks. We also implement the recurrent TT-Networks and provide
numerical evidence of their expressivity.
Comment: Accepted as a conference paper at ICLR 2018
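For readers unfamiliar with the TT format, the following hedged NumPy sketch (illustrative shapes and rank, not the paper's networks) reconstructs a small 3-way tensor from TT cores; the left-to-right chain of cores is what an RNN-style model scans patch by patch:

    import numpy as np

    d, r = 3, 2
    G1 = np.random.rand(1, d, r)   # TT cores, contracted left to right
    G2 = np.random.rand(r, d, r)
    G3 = np.random.rand(r, d, 1)

    full = np.einsum('aib,bjc,ckd->aijkd', G1, G2, G3)[0, ..., 0]
    print(full.shape)              # (3, 3, 3): the tensor encoded by the cores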
Tucker Decomposition Network: Expressive Power and Comparison
Deep neural networks have achieved a great success in solving many machine
learning and computer vision problems. The main contribution of this paper is
to develop a deep network based on Tucker tensor decomposition, and analyze its
expressive power. It is shown that a Tucker network is more expressive than a
shallow network: in general, a shallow network requires an exponential number
of nodes to represent a Tucker network. Experimental results are also given to
compare the performance of the
proposed Tucker network with hierarchical tensor network and shallow network,
and demonstrate the usefulness of Tucker network in image classification
problems.
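As a reference point for the Tucker format the network is built on, here is a hedged NumPy sketch (illustrative sizes) reconstructing a 3-way tensor from a small core and per-mode factor matrices:

    import numpy as np

    d, r = 4, 2
    core = np.random.rand(r, r, r)
    U1, U2, U3 = (np.random.rand(d, r) for _ in range(3))

    tucker = np.einsum('pqr,ip,jq,kr->ijk', core, U1, U2, U3)
    print(tucker.shape)   # (4, 4, 4), parameterized by r**3 + 3*d*r numbers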
On the Expressive Power of Overlapping Architectures of Deep Learning
Expressive efficiency refers to the relation between two architectures A and
B, whereby any function realized by B could be replicated by A, but there
exists functions realized by A, which cannot be replicated by B unless its size
grows significantly larger. For example, it is known that deep networks are
exponentially efficient with respect to shallow networks, in the sense that a
shallow network must grow exponentially large in order to approximate the
functions represented by a deep network of polynomial size. In this work, we
extend the study of expressive efficiency to the attribute of network
connectivity and in particular to the effect of "overlaps" in the convolutional
process, i.e., when the stride of the convolution is smaller than its filter
size (receptive field). To theoretically analyze this aspect of network
design, we focus on a well-established surrogate for ConvNets called
Convolutional Arithmetic Circuits (ConvACs), and then demonstrate empirically
that our results hold for standard ConvNets as well. Specifically, our analysis
shows that having overlapping local receptive fields, and more broadly denser
connectivity, results in an exponential increase in the expressive capacity of
neural networks. Moreover, while denser connectivity can increase the
expressive capacity, we show that the most common types of modern architectures
already exhibit exponential increase in expressivity, without relying on
fully-connected layers.
Comment: Published as a conference paper at ICLR 2018
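The notion of overlap used above (stride smaller than filter size) can be seen in a hedged one-dimensional toy, with a hypothetical helper: with stride 1 a size-3 window yields overlapping receptive fields, while stride 3 yields disjoint ones.

    import numpy as np

    def patches(x, size, stride):
        return [x[i:i + size] for i in range(0, len(x) - size + 1, stride)]

    signal = np.arange(9)
    print(patches(signal, size=3, stride=1))   # 7 overlapping receptive fields
    print(patches(signal, size=3, stride=3))   # 3 non-overlapping receptive fields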
Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions
The driving force behind deep networks is their ability to compactly
represent rich classes of functions. The primary notion for formally reasoning
about this phenomenon is expressive efficiency, which refers to a situation
where one network must grow unfeasibly large in order to realize (or
approximate) functions of another. To date, expressive efficiency analyses
focused on the architectural feature of depth, showing that deep networks are
representationally superior to shallow ones. In this paper we study the
expressive efficiency brought forth by connectivity, motivated by the
observation that modern networks interconnect their layers in elaborate ways.
We focus on dilated convolutional networks, a family of deep models delivering
state of the art performance in sequence processing tasks. By introducing and
analyzing the concept of mixed tensor decompositions, we prove that
interconnecting dilated convolutional networks can lead to expressive
efficiency. In particular, we show that even a single connection between
intermediate layers can already lead to an almost quadratic gap, which in
large-scale settings typically makes the difference between a model that is
practical and one that is not. Empirical evaluation demonstrates how the
expressive efficiency of connectivity, similarly to that of depth, translates
into gains in accuracy. This leads us to believe that expressive efficiency may
serve a key role in the development of new tools for deep network design.
Comment: Published as a conference paper at ICLR 2018
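For context on the architecture family studied above, the following hedged sketch implements a plain 1-D dilated convolution (the mixed tensor decompositions themselves are not reproduced): filter taps are spaced dilation steps apart, which is how stacked layers cover long sequences cheaply.

    import numpy as np

    def dilated_conv1d(x, w, dilation):
        k, span = len(w), (len(w) - 1) * dilation + 1
        return np.array([
            sum(w[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)
        ])

    x = np.arange(10, dtype=float)
    print(dilated_conv1d(x, w=[1.0, -1.0], dilation=4))   # differences taken 4 steps apart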
The Expressive Power of Neural Networks: A View from the Width
The expressive power of neural networks is important for understanding deep
learning. Most existing works consider this problem from the view of the depth
of a network. In this paper, we study how width affects the expressiveness of
neural networks. Classical results state that depth-bounded (e.g. depth-2)
networks with suitable activation functions are universal approximators. We
show a universal approximation theorem for width-bounded ReLU networks:
width-(n+4) ReLU networks, where n is the input dimension, are universal
approximators. Moreover, except for a measure zero set, all functions cannot be
approximated by width-n ReLU networks, which exhibits a phase transition.
Several recent works demonstrate the benefits of depth by proving the
depth-efficiency of neural networks. That is, there are classes of deep
networks which cannot be realized by any shallow network whose size is no more
than an exponential bound. Here we pose the dual question on the
width-efficiency of ReLU networks: Are there wide networks that cannot be
realized by narrow networks whose size is not substantially larger? We show
that there exist classes of wide networks which cannot be realized by any
narrow network whose depth is no more than a polynomial bound. On the other
hand, we demonstrate by extensive experiments that narrow networks whose size
exceeds the polynomial bound by a constant factor can approximate wide and
shallow networks with high accuracy. Our results provide more comprehensive
evidence that depth is more effective than width for the expressiveness of ReLU
networks.
Comment: Accepted by NIPS 2017 (with some typos fixed)
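To make the width-versus-depth comparison above tangible, here is a hedged toy forward pass (illustrative widths and depths, with random untrained weights, not the paper's bounds) contrasting a narrow-but-deep ReLU network with a wide-but-shallow one on the same input:

    import numpy as np

    rng = np.random.default_rng(0)
    relu = lambda z: np.maximum(z, 0.0)

    def mlp(x, widths):
        # Random untrained ReLU layers; widths[0] must match len(x).
        for w_in, w_out in zip(widths[:-1], widths[1:]):
            x = relu(rng.standard_normal((w_out, w_in)) @ x)
        return x

    x = rng.standard_normal(8)                    # input dimension n = 8
    deep_narrow = mlp(x, [8] + [12] * 6 + [1])    # width close to n, depth 7
    wide_shallow = mlp(x, [8, 512, 1])            # one very wide hidden layer
    print(deep_narrow.shape, wide_shallow.shape)  # both (1,)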