Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions
The driving force behind deep networks is their ability to compactly
represent rich classes of functions. The primary notion for formally reasoning
about this phenomenon is expressive efficiency, which refers to a situation
where one network must grow unfeasibly large in order to realize (or
approximate) functions of another. To date, expressive efficiency analyses
focused on the architectural feature of depth, showing that deep networks are
representationally superior to shallow ones. In this paper we study the
expressive efficiency brought forth by connectivity, motivated by the
observation that modern networks interconnect their layers in elaborate ways.
We focus on dilated convolutional networks, a family of deep models delivering
state-of-the-art performance in sequence processing tasks. By introducing and
analyzing the concept of mixed tensor decompositions, we prove that
interconnecting dilated convolutional networks can lead to expressive
efficiency. In particular, we show that even a single connection between
intermediate layers can already lead to an almost quadratic gap, which in
large-scale settings typically makes the difference between a model that is
practical and one that is not. Empirical evaluation demonstrates how the
expressive efficiency of connectivity, similarly to that of depth, translates
into gains in accuracy. This leads us to believe that expressive efficiency may
serve a key role in the development of new tools for deep network design.
Comment: Published as a conference paper at ICLR 201
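
To make the notion of a dilated convolution concrete, here is a minimal sketch in plain Python. It assumes a causal 1D convolution with zero padding on the left and dilations that double per layer (a WaveNet-style choice that is an assumption here, not something the abstract states). The helper names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """Causal 1D dilated convolution; positions before the start count as zero."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation  # tap i looks i * dilation steps into the past
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input positions that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [1, 2, 3, 4]
    # With dilation 2, each output adds the current sample and the one two steps back.
    print(dilated_conv1d(signal, [1, 1], dilation=2))
    # Three layers with kernel size 2 and dilations 1, 2, 4.
    print(receptive_field(2, [1, 2, 4]))
```

Under these assumptions the receptive field grows exponentially with the number of layers while the parameter count grows only linearly, which is the usual motivation for stacking dilated layers.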
Generalized Tensor Models for Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are very successful at solving challenging
problems with sequential data. However, this observed efficiency is not yet
entirely explained by theory. It is known that a certain class of
multiplicative RNNs enjoys the property of depth efficiency: a shallow
network of exponentially large width is necessary to realize the same score
function as computed by such an RNN. Such networks, however, are not very often
applied to real life tasks. In this work, we attempt to reduce the gap between
theory and practice by extending the theoretical analysis to RNNs which employ
various nonlinearities, such as Rectified Linear Unit (ReLU), and show that
they also benefit from properties of universality and depth efficiency. Our
theoretical results are verified by a series of extensive computational
experiments.
Comment: Accepted as a conference paper at ICLR 201
Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions
The driving force behind convolutional networks, the most successful deep
learning architecture to date, is their expressive power. Despite its wide
acceptance and vast empirical evidence, formal analyses supporting this belief
are scarce. The primary notions for formally reasoning about expressiveness are
efficiency and inductive bias. Expressive efficiency refers to the ability of a
network architecture to realize functions that require an alternative
architecture to be much larger. Inductive bias refers to the prioritization of
some functions over others given prior knowledge regarding a task at hand. In
this paper we overview a series of works written by the authors, that through
an equivalence to hierarchical tensor decompositions, analyze the expressive
efficiency and inductive bias of various convolutional network architectural
features (depth, width, strides and more). The results presented shed light on
the demonstrated effectiveness of convolutional networks, and in addition,
provide new tools for network design.
Comment: Part of the Intel Collaborative Research Institute for Computational
Intelligence (ICRI-CI) Special Issue on Deep Learning Theor
Deep Compressive Autoencoder for Action Potential Compression in Large-Scale Neural Recording
Understanding the coordinated activity underlying brain computations requires
large-scale, simultaneous recordings from distributed neuronal structures at a
cellular-level resolution. One major hurdle in designing high-bandwidth,
high-precision, large-scale neural interfaces lies in the formidable data
streams that are generated by the recorder chip and need to be transferred
online to a remote computer. The data rates can require hundreds to
thousands of I/O pads on the recorder chip and power consumption on the order
of Watts for data streaming alone. We developed a deep learning-based
compression model to reduce the data rate of multichannel action potentials.
The proposed model is built upon a deep compressive autoencoder (CAE) with
discrete latent embeddings. The encoder is equipped with residual
transformations to extract representative features from spikes, which are
mapped into the latent embedding space and updated via vector quantization
(VQ). The decoder network reconstructs spike waveforms from the quantized
latent embeddings. Experimental results show that the proposed model
consistently outperforms conventional methods by achieving much higher
compression ratios (20-500x) and better or comparable reconstruction
accuracies. Testing results also indicate that CAE is robust against a diverse
range of imperfections, such as waveform variation and spike misalignment, and
has minor influence on spike sorting accuracy. Furthermore, we have estimated
the hardware cost and real-time performance of CAE and shown that it could
support thousands of recording channels simultaneously without excessive
power/heat dissipation. The proposed model can reduce the required data
transmission bandwidth in large-scale recording experiments and maintain good
signal qualities. The code of this work has been made available at
https://github.com/tong-wu-umn/spike-compression-autoencoder
Comment: 19 pages, 13 figure
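
The vector quantization step described in the abstract can be pictured with a minimal sketch: each latent vector is replaced by the index of its nearest codebook entry, and only the indices need to be transmitted. The code below is an illustration in plain Python, not the authors' implementation. The function names and all numbers (codebook contents, 64 samples per spike, 16-bit samples, 4 latent vectors, 256 codewords) are hypothetical and are not taken from the paper.

```python
import math


def quantize(z, codebook):
    """Return (index, codeword) of the codebook entry nearest to z (squared L2)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    idx = min(range(len(codebook)), key=lambda j: sqdist(z, codebook[j]))
    return idx, codebook[idx]


def compression_ratio(n_samples, bits_per_sample, n_latents, codebook_size):
    """Raw bits per spike divided by the bits needed to send the codebook indices."""
    raw_bits = n_samples * bits_per_sample
    code_bits = n_latents * math.ceil(math.log2(codebook_size))
    return raw_bits / code_bits


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy codebook (assumed)
    print(quantize([0.9, 0.1], codebook))
    # Hypothetical setting: 64 samples x 16 bits vs. 4 indices into 256 codewords.
    print(compression_ratio(64, 16, 4, 256))
```

In this toy setting the ratio depends only on how many indices are sent per spike and on the codebook size, which is why a discrete latent space lends itself to high compression ratios.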
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Mathematically characterizing the implicit regularization induced by
gradient-based optimization is a longstanding pursuit in the theory of deep
learning. A widespread hope is that a characterization based on minimization of
norms may apply, and a standard test-bed for studying this prospect is matrix
factorization (matrix completion via linear neural networks). It is an open
question whether norms can explain the implicit regularization in matrix
factorization. The current paper resolves this open question in the negative,
by proving that there exist natural matrix factorization problems on which the
implicit regularization drives all norms (and quasi-norms) towards infinity.
Our results suggest that, rather than perceiving the implicit regularization
via norms, a potentially more useful interpretation is minimization of rank. We
demonstrate empirically that this interpretation extends to a certain class of
non-linear neural networks, and hypothesize that it may be key to explaining
generalization in deep learning.
Depth Enables Long-Term Memory for Recurrent Neural Networks
A key attribute that drives the unprecedented success of modern Recurrent
Neural Networks (RNNs) on learning tasks which involve sequential data is
their ability to model intricate long-term temporal dependencies. However, a
well-established measure of RNNs' long-term memory capacity is lacking, and thus
formal understanding of the effect of depth on their ability to correlate data
throughout time is limited. Specifically, existing depth efficiency results on
convolutional networks do not suffice in order to account for the success of
deep RNNs on data of varying lengths. In order to address this, we introduce a
measure of the network's ability to support information flow across time,
referred to as the Start-End separation rank, which reflects the distance of
the function realized by the recurrent network from modeling no dependency
between the beginning and end of the input sequence. We prove that deep
recurrent networks support Start-End separation ranks which are combinatorially
higher than those supported by their shallow counterparts. Thus, we establish
that depth brings forth an overwhelming advantage in the ability of recurrent
networks to model long-term dependencies, and provide an exemplar of
quantifying this key attribute. We empirically demonstrate the discussed
phenomena on common RNNs through extensive experimental evaluation using the
optimization technique of restricting the hidden-to-hidden matrix to being
orthogonal. Finally, we employ the tool of quantum Tensor Networks to gain
additional graphic insights regarding the complexity brought forth by depth in
recurrent networks.
Comment: This document is an extension of arXiv:1710.09431 in the form of a
Master's thesi
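
As a toy analogue of the separation rank idea in the abstract above, consider a function of just two scalar inputs, a "start" value and an "end" value, sampled on a finite grid. In that simplified setting the rank of the resulting matrix gives a lower bound on how many separable terms g(start) * h(end) are needed to express the function, and rank 1 corresponds to modeling no dependency between the two. The sketch below makes this concrete; the two-variable setup, the grid, and the helper names are assumptions for illustration and not the paper's construction.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def grid_matrix(f, start_vals, end_vals):
    """Matrix whose (i, j) entry is f(start_vals[i], end_vals[j])."""
    return [[f(s, e) for e in end_vals] for s in start_vals]


if __name__ == "__main__":
    grid = [0, 1, 2]
    separable = lambda s, e: (s + 1) * (2 * e + 1)  # product form: no interaction
    entangled = lambda s, e: s * e + 1              # sum of two separable terms
    print(matrix_rank(grid_matrix(separable, grid, grid)))
    print(matrix_rank(grid_matrix(entangled, grid, grid)))
```

Exact rational arithmetic is used so that the rank is not affected by floating-point tolerance choices.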