Mutual Information Scaling and Expressive Power of Sequence Models
Sequence models assign probabilities to variable-length sequences such as
natural language texts. The ability of sequence models to capture temporal
dependence can be characterized by the temporal scaling of correlation and
mutual information. In this paper, we study the mutual information of recurrent
neural networks (RNNs) including long short-term memories and self-attention
networks such as Transformers. Through a combination of theoretical study of
linear RNNs and empirical study of nonlinear RNNs, we find their mutual
information decays exponentially in temporal distance. On the other hand,
Transformers can capture long-range mutual information more efficiently, making
them preferable in modeling sequences with slow power-law mutual information,
such as natural languages and stock prices. We discuss the connection of these
results with statistical mechanics. We also point out the non-uniformity
problem in many natural language datasets. We hope this work provides a new
perspective in understanding the expressive power of sequence models and sheds
new light on improving their architecture.
Comment: 12 + 15 pages. Comments are welcome
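The contrast between exponential and power-law decay can be illustrated with a much simpler stand-in than an RNN. The sketch below is not taken from the paper: it assumes a stationary, symmetric two-state Markov chain (a hypothetical toy model) with flip probability `flip_prob`, for which the mutual information between symbols `tau` steps apart has a closed form. The reference power law and its exponent `alpha` are arbitrary illustrative choices.

```python
import math


def binary_entropy(q):
    """Entropy in nats of a Bernoulli(q) variable."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log(q) + (1.0 - q) * math.log(1.0 - q))


def markov_mutual_information(flip_prob, tau):
    """Mutual information (nats) between x_t and x_{t+tau}.

    Toy assumption: a stationary, symmetric two-state Markov chain that
    flips its state with probability flip_prob at every step.
    """
    second_eigenvalue = 1.0 - 2.0 * flip_prob
    # Probability that the two symbols agree after tau steps.
    agree = 0.5 * (1.0 + second_eigenvalue ** tau)
    # Uniform marginals give H(x) = log 2; H(x_{t+tau} | x_t) = H_b(agree).
    return math.log(2.0) - binary_entropy(agree)


def power_law(tau, alpha=0.5):
    """Reference power-law decay; alpha is an illustrative choice."""
    return tau ** -alpha


if __name__ == "__main__":
    for tau in (1, 2, 4, 8, 16, 32):
        mi = markov_mutual_information(0.1, tau)
        print(f"tau={tau:2d}  markov_mi={mi:.3e}  power_law={power_law(tau):.3e}")
```

For large `tau` the toy chain's mutual information shrinks by roughly a factor `(1 - 2 * flip_prob) ** 2` per step, whereas the power law loses only a constant factor each time `tau` doubles.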
Conditional WaveGAN
Generative models have been used successfully for image synthesis in recent
years, but for other modalities such as audio and text, little
progress has been made. Recent works focus on generating audio from a
generative model in an unsupervised setting. We explore the possibility of
using generative models conditioned on class labels. Concatenation based
conditioning and conditional scaling were explored in this work with various
hyper-parameter tuning methods. In this paper we introduce Conditional WaveGANs
(cWaveGAN). Find our implementation at https://github.com/acheketa/cwavegan
Comment: Preprint
Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions
We propose a fully convolutional sequence-to-sequence encoder architecture
with a simple and efficient decoder. Our model improves WER on LibriSpeech
while being an order of magnitude more efficient than a strong RNN baseline.
Key to our approach is a time-depth separable convolution block which
dramatically reduces the number of parameters in the model while keeping the
receptive field large. We also give a stable and efficient beam search
inference procedure which allows us to effectively integrate a language model.
Coupled with a convolutional language model, our time-depth separable
convolution architecture improves by more than 22% relative WER over the best
previously reported sequence-to-sequence results on the noisy LibriSpeech test
set.
GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models
Modern data analysis pipelines are becoming increasingly complex due to the
presence of multi-view information sources. While graphs are effective in
modeling complex relationships, in many scenarios a single graph is rarely
sufficient to succinctly represent all interactions, and hence multi-layered
graphs have become popular. Though this leads to richer representations,
extending solutions from the single-graph case is not straightforward.
Consequently, there is a strong need for novel solutions to solve classical
problems, such as node classification, in the multi-layered case. In this
paper, we consider the problem of semi-supervised learning with multi-layered
graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for
community discovery, we argue that feature learning with random node
attributes, using graph neural networks, can be more effective. To this end, we
propose to use attention models for effective feature learning, and develop two
novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer
dependencies for building multi-layered graph embeddings. Using empirical
studies on several benchmark datasets, we evaluate the proposed approaches and
demonstrate significant performance improvements in comparison to
state-of-the-art network embedding strategies. The results also show that using
simple random features is an effective choice, even in cases where explicit
node attributes are not available.
Uncertainty in the Variational Information Bottleneck
We present a simple case study, demonstrating that Variational Information
Bottleneck (VIB) can improve a network's classification calibration as well as
its ability to detect out-of-distribution data. Without explicitly being
designed to do so, VIB gives two natural metrics for handling and quantifying
uncertainty.
Comment: 10 pages, 7 figures. Accepted to UAI 2018 - Uncertainty in Deep
Learning Workshop
Quantum Computation: Particle and Wave Aspects of Algorithms
The driving force in the pursuit of quantum computation is the exciting
possibility that quantum algorithms can be more efficient than their classical
analogues. Research on the subject has unraveled several aspects of how that
can happen. Clever quantum algorithms have been discovered in recent years,
although not systematically, and the field remains under active investigation.
Richard Feynman was one of the pioneers who foresaw the power of quantum
computers. In this issue dedicated to him, I give an introduction to how
particle and wave aspects contribute to the power of quantum computers. Shor's
and Grover's algorithms are analysed as examples.
Comment: 6 pages, Prepared for the issue of Resonance in honour of Richard
Feynman (v2) Minor changes. Published version
Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data
Surrogate modeling and uncertainty quantification tasks for PDE systems are
most often considered as supervised learning problems where input and output
data pairs are used for training. The construction of such emulators is by
definition a small data problem which poses challenges to deep learning
approaches that have been developed to operate in the big data regime. Even in
cases where such models have been shown to have good predictive capability in
high dimensions, they fail to address constraints in the data implied by the
PDE model. This paper provides a methodology that incorporates the governing
equations of the physical model in the loss/likelihood functions. The resulting
physics-constrained, deep learning models are trained without any labeled data
(e.g. employing only input data) and provide comparable predictive responses
with data-driven models while obeying the constraints of the problem at hand.
This work employs a convolutional encoder-decoder neural network approach as
well as a conditional flow-based generative model for the solution of PDEs,
surrogate model construction, and uncertainty quantification tasks. The
methodology is posed as a minimization problem of the reverse Kullback-Leibler
(KL) divergence between the model predictive density and the reference
conditional density, where the latter is defined as the Boltzmann-Gibbs
distribution at a given inverse temperature with the underlying potential
relating to the PDE system of interest. The generalization capability of these
models to out-of-distribution input is considered. Quantification and
interpretation of the predictive uncertainty are provided for a number of
problems.
Comment: 51 pages, 18 figures, submitted to Journal of Computational Physics
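The reverse KL objective described in this abstract can be written out as follows. This is a sketch in assumed notation, not the paper's own: $p_\theta(y \mid x)$ stands for the model predictive density, $V(y, x)$ for the potential tied to the PDE residual, $\beta$ for the inverse temperature, and $Z_\beta(x)$ for the normalizing constant.

```latex
\begin{align}
p_\beta(y \mid x) &= \frac{\exp\{-\beta V(y, x)\}}{Z_\beta(x)}, \\
\mathrm{KL}\left(p_\theta \,\|\, p_\beta\right)
  &= \mathbb{E}_{p_\theta(y \mid x)}\left[\log p_\theta(y \mid x)\right]
   + \beta\, \mathbb{E}_{p_\theta(y \mid x)}\left[V(y, x)\right]
   + \log Z_\beta(x).
\end{align}
```

Because $\log Z_\beta(x)$ does not depend on $\theta$, minimizing this divergence amounts to minimizing the expected potential scaled by $\beta$ minus the entropy of $p_\theta$. Both terms can be estimated from samples of the model itself, which is consistent with training on input data only.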
SOC computer simulations
The following chapter provides an overview of the techniques used to
understand Self-Organised Criticality (SOC) by performing computer simulations.
Those are of particular significance in SOC, given that its very paradigm, the BTW
(Bak-Tang-Wiesenfeld) sandpile, was introduced on the basis of a process that
is conveniently implemented as a computer program. The chapter is divided into
three sections: In the first section a number of key concepts are introduced,
followed by four brief presentations of SOC models which are most commonly
investigated or which have played an important part in the development of the
field as a whole. The second section is concerned with the basics of scaling
with particular emphasis on its role in numerical models of SOC, introducing a
number of basic tools for data analysis such as binning, moment analysis and
error estimation. The third section is devoted to numerical methods and
algorithms as applied to SOC models, addressing typical computational questions
with the particular application of SOC in mind. The present chapter is rather
technical, but hands-on at the same time, providing practical advice and even
code snippets (in C) wherever possible.
Comment: 57 pages, 5 figures, chapter 7 of e-book Self-Organized Criticality
Systems, edited by M Aschwanden, OpenAcademicPress
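To give a flavour of how conveniently the BTW sandpile maps onto a program, here is a minimal sketch. It is not taken from the chapter (whose snippets are in C). It assumes the usual two-dimensional rule: a site holding four or more grains topples, sending one grain to each of its four neighbours, with grains lost at the open boundary. The function names `relax` and `drive` are hypothetical.

```python
import random

THRESHOLD = 4  # a site with this many grains or more is unstable


def relax(grid):
    """Topple unstable sites until the grid is stable.

    Returns the avalanche size, counted as the number of topplings.
    Grains pushed over the edge of the square grid are lost.
    """
    size = len(grid)
    unstable = [(r, c) for r in range(size) for c in range(size)
                if grid[r][c] >= THRESHOLD]
    topplings = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < THRESHOLD:
            continue  # stale entry: this site was already relaxed
        grid[r][c] -= THRESHOLD
        topplings += 1
        if grid[r][c] >= THRESHOLD:
            unstable.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                grid[nr][nc] += 1
                if grid[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return topplings


def drive(grid, steps, rng):
    """Add one grain at a random site per step; record avalanche sizes."""
    size = len(grid)
    sizes = []
    for _ in range(steps):
        grid[rng.randrange(size)][rng.randrange(size)] += 1
        sizes.append(relax(grid))
    return sizes


if __name__ == "__main__":
    lattice = [[0] * 16 for _ in range(16)]
    avalanches = drive(lattice, 5000, random.Random(0))
    print("largest avalanche:", max(avalanches))
```

The recorded avalanche sizes are the raw data to which tools such as binning and moment analysis would then be applied.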
Hierarchical Annotation of Images with Two-Alternative-Forced-Choice Metric Learning
Many tasks such as retrieval and recommendations can significantly benefit
from structuring the data, commonly in a hierarchical way. Achieving this
through annotation of high-dimensional data such as images or natural text can
be highly labor intensive. We propose an approach for uncovering the
hierarchical structure of data based on efficient discriminative testing rather
than annotations of individual datapoints. Using two-alternative-forced-choice
(2AFC) testing and deep metric learning, we embed the data in a
semantic space where it can be clustered hierarchically. We
actively select triplets for the 2AFC test such that the modeling process is
highly efficient with respect to the number of tests presented to the
annotator. We empirically demonstrate the feasibility of the method by
confirming the shape bias on synthetic data and extract hierarchical structure
on the Fashion-MNIST dataset to a finer granularity than the original labels.
Comment: presented at 2019 ICML Workshop on Human in the Loop Learning (HILL
2019), Long Beach, US
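A 2AFC answer says which of two candidates the annotator judges closer to a reference item, which maps naturally onto a triplet. The sketch below shows one common way to turn such an answer into a training signal, a standard triplet margin loss on embedding vectors. This is an assumption for illustration; the abstract does not state which loss the authors use, and the names `triplet_margin_loss`, `chosen`, and `rejected` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, chosen, rejected, margin=1.0):
    """Hinge loss for one 2AFC answer.

    `chosen` is the candidate the annotator picked as closer to `anchor`;
    `rejected` is the other candidate. The loss is zero once the chosen
    item is closer than the rejected one by at least `margin`.
    """
    gap = euclidean(anchor, chosen) - euclidean(anchor, rejected)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    # Embedding agrees with the annotator: no penalty.
    print(triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [0.0, 5.0]))
    # Embedding contradicts the annotator: positive penalty.
    print(triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```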
Depth Adaptive Deep Neural Network for Semantic Segmentation
In this work, we present the depth-adaptive deep neural network using a depth
map for semantic segmentation. Typical deep neural networks receive inputs at
the predetermined locations regardless of the distance from the camera. This
fixed receptive field presents a challenge to generalize the features of
objects at various distances in neural networks. Specifically, the
predetermined receptive fields are too small at a short distance, and vice
versa. To overcome this challenge, we develop a neural network which is able to
adapt the receptive field not only for each layer but also for each neuron at
the spatial location. To adjust the receptive field, we propose the
depth-adaptive multiscale (DaM) convolution layer consisting of the adaptive
perception neuron and the in-layer multiscale neuron. The adaptive perception
neuron adjusts the receptive field at each spatial location using the
corresponding depth information. The in-layer multiscale neuron applies
receptive fields of different sizes in each feature space to learn features
at multiple scales. The proposed DaM convolution is applied to two fully
convolutional neural networks. We demonstrate the effectiveness of the proposed
neural networks on the publicly available RGB-D dataset for semantic
segmentation and the novel hand segmentation dataset for hand-object
interaction. The experimental results show that the proposed method outperforms
the state-of-the-art methods without any additional layers or
pre/post-processing.
Comment: IEEE Transactions on Multimedia, 201