Regularized Forward-Backward Decoder for Attention Models
Attention models are now among the most popular candidates for speech
recognition. So far, most studies have focused on the encoder structure or the
attention module to improve the performance of these models, while largely
ignoring the decoder. In this paper, we propose a novel regularization technique
incorporating a second decoder during the training phase. This decoder is
optimized on time-reversed target labels beforehand and supports the standard
decoder during training by adding knowledge from future context. Since it is
only added during training, we are not changing the basic structure of the
network or adding complexity during decoding. We evaluate our approach on the
smaller TEDLIUMv2 and the larger LibriSpeech datasets, achieving consistent
improvements on both. Comment: Under review for Interspeech 202
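A minimal sketch of the two ingredients described above, assuming NumPy arrays for decoder outputs: time-reversed targets for the backward decoder, and a training loss for the standard (forward) decoder. The abstract does not say how the backward decoder's knowledge is added, so the L2 agreement term between forward states and time-flipped backward states, the weight `lam`, and every function name here are hypothetical.

```python
import numpy as np

def reverse_targets(labels, eos_id):
    """Time-reverse a label sequence, keeping the end-of-sequence token last.

    The backward decoder would be trained on these reversed targets.
    """
    body = [t for t in labels if t != eos_id]
    return body[::-1] + [eos_id]

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of `targets` under per-step `probs` (T x V)."""
    steps = np.arange(len(targets))
    return float(-np.mean(np.log(probs[steps, targets] + 1e-12)))

def regularized_loss(fwd_probs, fwd_states, bwd_states, targets, lam=0.1):
    """Forward-decoder cross-entropy plus an assumed L2 agreement term.

    `bwd_states` (T x H) come from the already-trained backward decoder and are
    treated as fixed. They are flipped in time so that step t of both decoders
    refers to the same label; this assumes both sequences have length T.
    """
    ce = cross_entropy(fwd_probs, targets)
    aligned = bwd_states[::-1]
    reg = float(np.mean((fwd_states - aligned) ** 2))
    return ce + lam * reg
```

Because only the training loss changes, the forward decoder used at inference time is untouched, which matches the claim that decoding complexity does not grow.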
Bilingual-GAN: A Step Towards Parallel Text Generation
Latent-space-based GAN methods and attention-based sequence-to-sequence
models have achieved impressive results in text generation and unsupervised
machine translation, respectively. Leveraging the two domains, we propose an
adversarial latent-space-based model capable of generating parallel sentences
in two languages concurrently and translating bidirectionally. The bilingual
generation goal is achieved by sampling from the latent space that is shared
between both languages. First two denoising autoencoders are trained, with
shared encoders and back-translation to enforce a shared latent state between
the two languages. The decoder is shared for the two translation directions.
Next, a GAN is trained to generate synthetic "code" mimicking the languages'
shared latent space. This code is then fed into the decoder to generate text in
either language. We perform our experiments on Europarl and Multi30k datasets,
on the English-French language pair, and report our performance on both
supervised and unsupervised machine translation.
Going Wider: Recurrent Neural Network With Parallel Cells
Recurrent neural networks (RNNs) have been widely applied to sequence modeling.
In an RNN, the hidden state at the current step is fully connected to the hidden
state at the previous step, so the influence of less related features from the
previous step may reduce the model's learning ability. We propose a simple
technique called parallel cells (PCs) to enhance the learning ability of RNNs.
In each layer, we run multiple small RNN cells rather than one single large
cell. In this paper, we evaluate PCs on two tasks. On the language modeling task
on PTB (Penn Treebank), our model outperforms state-of-the-art models,
decreasing perplexity from 78.6 to 75.3. On the Chinese-English translation
task, our model improves the BLEU score by 0.39 points over the baseline model.
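A minimal NumPy sketch of one layer of parallel cells, assuming vanilla tanh cells (the abstract does not name the cell type): each of the `k` small cells receives the full input but only its own slice of the previous hidden state. That is equivalent to one large cell whose recurrent matrix is block-diagonal. The class and its parameters are illustrative, not the authors' code.

```python
import numpy as np

class ParallelCellsRNN:
    """One RNN layer built from `k` small tanh cells instead of one large cell.

    Each cell sees the full input x_t but only its own slice of the previous
    hidden state, so the layer's recurrent weights are block-diagonal.
    """

    def __init__(self, input_size, hidden_size, k, seed=0):
        assert hidden_size % k == 0, "hidden_size must be divisible by k"
        rng = np.random.default_rng(seed)
        self.k, self.h = k, hidden_size // k
        scale = 1.0 / np.sqrt(self.h)
        self.W_x = [rng.normal(0, scale, (self.h, input_size)) for _ in range(k)]
        self.W_h = [rng.normal(0, scale, (self.h, self.h)) for _ in range(k)]
        self.b = [np.zeros(self.h) for _ in range(k)]

    def step(self, x, h_prev):
        """Advance all cells by one time step; h_prev has shape (k * h,)."""
        chunks = np.split(h_prev, self.k)
        new = [np.tanh(Wx @ x + Wh @ hc + b)
               for Wx, Wh, b, hc in zip(self.W_x, self.W_h, self.b, chunks)]
        return np.concatenate(new)

    def run(self, xs):
        """Run the layer over a sequence xs of shape (T, input_size)."""
        h = np.zeros(self.k * self.h)
        outputs = []
        for x in xs:
            h = self.step(x, h)
            outputs.append(h)
        return np.stack(outputs)

    def recurrent_matrix(self):
        """Equivalent full recurrent matrix: block-diagonal with k blocks."""
        n = self.k * self.h
        W = np.zeros((n, n))
        for i, Wh in enumerate(self.W_h):
            s = i * self.h
            W[s:s + self.h, s:s + self.h] = Wh
        return W
```

With a hidden size of n split into k cells, the recurrent weights shrink from n² to n²/k parameters, which is one concrete reading of "fewer connections to less related features".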
Unsupervised Feature Learning of Human Actions as Trajectories in Pose Embedding Manifold
An unsupervised human action modeling framework can provide useful
pose-sequence representations, which can be used in a variety of pose
analysis applications. In this work we propose a novel temporal pose-sequence
modeling framework, which can efficiently embed the dynamics of 3D
human-skeleton joints into a continuous latent space. In contrast to the
end-to-end frameworks explored in previous work, we disentangle the task of individual
pose representation learning from the task of learning actions as a trajectory
in pose embedding space. To realize a continuous pose embedding
manifold with improved reconstructions, we propose an unsupervised manifold
learning procedure named Encoder GAN (EnGAN). Further, we use the pose
embeddings generated by EnGAN to model human actions using a bidirectional RNN
auto-encoder architecture, PoseRNN. We introduce a first-order gradient loss to
explicitly enforce temporal regularity in the predicted motion sequence. A
hierarchical feature fusion technique is also investigated for simultaneous
modeling of local skeleton joints along with global pose variations. We
demonstrate state-of-the-art transferability of the learned representation,
compared with other supervised and unsupervised motion embeddings, for
fine-grained action recognition on the SBU interaction dataset. Further, we
show the qualitative strengths of the proposed framework by visualizing
skeleton pose reconstructions and interpolations in pose-embedding space, and
low-dimensional principal component projections of the reconstructed pose
trajectories. Comment: Accepted at WACV 201
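The first-order gradient loss can be sketched as a penalty on the mismatch between the frame-to-frame changes of the predicted and ground-truth pose sequences. The squared-error form and the function name are assumptions; the abstract only states that the loss acts on first-order temporal gradients.

```python
import numpy as np

def first_order_gradient_loss(pred, target):
    """Penalize mismatch between frame-to-frame motion of `pred` and `target`.

    pred, target: arrays of shape (T, D), e.g. T frames of D joint coordinates.
    Uses squared error on first-order temporal differences (an assumed choice).
    """
    d_pred = np.diff(pred, axis=0)    # x_{t+1} - x_t for the prediction
    d_true = np.diff(target, axis=0)  # same differences for the ground truth
    return float(np.mean((d_pred - d_true) ** 2))
```

In training, this term would typically be added with some weight to a per-frame reconstruction loss: it is blind to a constant offset, which the reconstruction term covers, but it penalizes jitter that a per-frame loss can miss.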
Transfer-Entropy-Regularized Markov Decision Processes
We consider the framework of transfer-entropy-regularized Markov Decision
Process (TERMDP) in which the weighted sum of the classical state-dependent
cost and the transfer entropy from the state random process to the control
random process is minimized. Although TERMDPs are generally formulated as
nonconvex optimization problems, we derive an analytical necessary optimality
condition expressed as a finite set of nonlinear equations, based on which an
iterative forward-backward computational procedure similar to the
Arimoto-Blahut algorithm is proposed. It is shown that every limit point of the
sequence generated by the proposed algorithm is a stationary point of the
TERMDP. Applications of TERMDPs are discussed in the context of networked
control systems theory and non-equilibrium thermodynamics. The proposed
algorithm is applied to an information-constrained maze navigation problem,
whereby we study how the price of information qualitatively alters the optimal
decision policies.
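For readers unfamiliar with the Arimoto-Blahut algorithm that the TERMDP procedure is compared to, here is the classical iteration for a single-stage rate-distortion problem. It alternates between the optimal channel for a fixed output marginal and the marginal induced by that channel. This is not the TERMDP algorithm itself, which operates over a controlled process with transfer entropy. The function name and the parameter `beta`, which weights distortion against information, belong only to this sketch.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, beta, iters=200):
    """Classical Blahut-Arimoto iteration for the rate-distortion trade-off.

    p_x:  source distribution, shape (n_x,)
    dist: distortion matrix d(x, y), shape (n_x, n_y)
    beta: weight on distortion (larger beta -> lower distortion, higher rate)
    Returns the channel p(y|x), the output marginal q(y), rate (nats), distortion.
    """
    n_y = dist.shape[1]
    q_y = np.full(n_y, 1.0 / n_y)
    for _ in range(iters):
        # Channel update: p(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        logits = np.log(np.maximum(q_y, 1e-300))[None, :] - beta * dist
        logits -= logits.max(axis=1, keepdims=True)
        p_y_x = np.exp(logits)
        p_y_x /= p_y_x.sum(axis=1, keepdims=True)
        # Marginal update: q(y) = sum_x p(x) p(y|x).
        q_y = p_x @ p_y_x
    joint = p_x[:, None] * p_y_x
    mask = joint > 0  # skip zero-probability terms in the mutual information
    log_ratio = np.log(p_y_x[mask] / np.broadcast_to(q_y, p_y_x.shape)[mask])
    rate = float(np.sum(joint[mask] * log_ratio))
    distortion = float(np.sum(joint * dist))
    return p_y_x, q_y, rate, distortion
```

For a uniform binary source with Hamming distortion, the iteration settles on a binary symmetric channel with crossover probability 1 / (1 + e^beta), which gives an easy sanity check.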
A Regularized Framework for Sparse and Structured Neural Attention
Modern neural networks are often augmented with an attention mechanism, which
tells the network where to focus within the input. We propose in this paper a
new framework for sparse and structured attention, building upon a smoothed max
operator. We show that the gradient of this operator defines a mapping from
real values to probabilities, suitable as an attention mechanism. Our framework
includes softmax and a slight generalization of the recently proposed sparsemax
as special cases. Moreover, we show how our framework can incorporate
modern structured penalties, resulting in more interpretable attention
mechanisms that focus on entire segments or groups of an input. We derive
efficient algorithms to compute the forward and backward passes of our
attention mechanisms, enabling their use in a neural network trained with
backpropagation. To showcase their potential as a drop-in replacement for
existing ones, we evaluate our attention mechanisms on three large-scale tasks:
textual entailment, machine translation, and sentence summarization. Our
attention mechanisms improve interpretability without sacrificing performance;
notably, on textual entailment and summarization, we outperform the standard
attention mechanisms based on softmax and sparsemax. Comment: In proceedings of NeurIPS 2017; added errat
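The abstract names softmax and sparsemax as special cases of the framework. As a point of reference, here is a NumPy sketch of both mappings; sparsemax is computed as the Euclidean projection of the scores onto the probability simplex using the standard sort-and-threshold method. It covers only these unstructured special cases, not the smoothed max operator or the structured penalties the paper builds on.

```python
import numpy as np

def softmax(z):
    """Dense mapping: every entry receives a strictly positive probability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex.

    Unlike softmax, it can assign exactly zero probability to low scores.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum        # sorted entries that stay nonzero
    k_max = k[support][-1]                     # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max      # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)
```

On the scores `[0.5, 1.0, 1.2]`, softmax gives every position a nonzero weight, while sparsemax returns `[0, 0.4, 0.6]`, dropping the lowest-scoring position entirely.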
Latent Variable Algorithms for Multimodal Learning and Sensor Fusion
Multimodal learning has lacked principled ways of combining information
from different modalities and learning a low-dimensional manifold of meaningful
representations. We study multimodal learning and sensor fusion from a latent
variable perspective. We first present a regularized recurrent attention filter
for sensor fusion. This algorithm can dynamically combine information from
different types of sensors in a sequential decision-making task. Each sensor is
paired with a modular neural network to maximize the utility of its own
information. A gating modular neural network dynamically generates a set of
mixing weights for the outputs of the sensor networks by balancing the utility
of all sensors' information. We design a co-learning mechanism to encourage
co-adaptation and independent learning of each sensor at the same time, and
propose a regularization-based co-learning method. In the second part, we focus
on recovering the manifold of latent representations. We propose a co-learning
approach using a probabilistic graphical model, the multimodal variational RNN
(MVRNN), which imposes a structural prior on the generative model, and derive a
variational lower bound for its objective function. In the third part, we
extend the siamese structure to sensor fusion for robust acoustic event
detection. Experiments investigating the extracted latent representations will
be carried out in the following months. Our experiments
show that the recurrent attention filter can dynamically combine different
sensor inputs according to the information carried in the inputs. We expect
MVRNN to identify latent representations that are useful for many downstream
tasks such as speech synthesis, activity recognition, and control and planning.
Both algorithms are general frameworks that can be applied to other tasks
where different types of sensors are jointly used for decision making.
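The gating step of the recurrent attention filter can be sketched as a softmax over sensors whose weights mix the sensor modules' outputs. The single linear gating layer, the shapes, and the function names are assumptions made for illustration; the recurrence and the co-learning regularizer are omitted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def gated_fusion(sensor_outputs, gate_weights, gate_bias):
    """Mix per-sensor feature vectors with weights produced by a gating network.

    sensor_outputs: shape (S, D), one D-dim output per sensor module
    gate_weights:   shape (S, S * D), a single linear gating layer (assumed)
    gate_bias:      shape (S,)
    Returns the fused (D,) vector and the S mixing weights.
    """
    gate_input = sensor_outputs.reshape(-1)             # gate sees every sensor
    mix = softmax(gate_weights @ gate_input + gate_bias)
    fused = mix @ sensor_outputs                        # weighted sum over sensors
    return fused, mix
```

With an uninformative gate (all-zero weights and bias), the fused vector is simply the mean of the sensor outputs; a gate that strongly prefers one sensor passes that sensor's output through almost unchanged.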
On the Diachronic Stability of Irregularity in Inflectional Morphology
Many languages' inflectional morphological systems are replete with
irregulars, i.e., words that do not seem to follow standard inflectional rules.
In this work, we quantitatively investigate the conditions under which
irregulars can survive in a language over the course of time. Using recurrent
neural networks to simulate language learners, we test the diachronic relation
between word frequency and irregularity. Comment: accepted to NAACL 2018; withdrawn in order to add more thorough
experiments (coming in next version)
Position-based Content Attention for Time Series Forecasting with Sequence-to-sequence RNNs
We propose an extended attention model for sequence-to-sequence
recurrent neural networks (RNNs) designed to capture (pseudo-)periods in time
series. This extended attention model can be deployed on top of any RNN and is
shown to yield state-of-the-art performance for time series forecasting on
several univariate and multivariate time series.
RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images
Semantic segmentation in high resolution remote sensing images is a
fundamental and challenging task. Convolutional neural networks (CNNs), such as
fully convolutional network (FCN) and SegNet, have shown outstanding
performance in many segmentation tasks. One key pillar of these successes is
mining useful information from features in convolutional layers for producing
high resolution segmentation maps. For example, FCN nonlinearly combines
high-level features extracted from the last convolutional layers, whereas SegNet
utilizes a deconvolutional network which takes as input only coarse, high-level
feature maps of the last convolutional layer. However, how to better fuse
multi-level convolutional feature maps for semantic segmentation of remote
sensing images is underexplored. In this work, we propose a novel bidirectional
network called recurrent network in fully convolutional network (RiFCN), which
is end-to-end trainable. It has a forward stream and a backward stream. The
former is a classification CNN architecture for feature extraction, which takes
an input image and produces multi-level convolutional feature maps from shallow
to deep, while in the latter, to achieve accurate boundary inference and
semantic segmentation, boundary-aware high resolution feature maps in shallower
layers and high-level but low-resolution features are recursively embedded into
the learning framework (from deep to shallow) to generate a fused feature
representation that draws a holistic picture of not only high-level semantic
information but also low-level fine-grained details. Experimental results on
two widely-used high resolution remote sensing data sets for semantic
segmentation tasks, ISPRS Potsdam and Inria Aerial Image Labeling Data Set,
demonstrate competitive performance obtained by the proposed methodology
compared with other approaches studied.
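The deep-to-shallow recursion in the backward stream can be sketched on single-channel feature maps: a state starts at the deepest, lowest-resolution map and, at each shallower level, is upsampled and fused with that level's map. Nearest-neighbour upsampling, the scalar weights `w_state` and `w_feat`, and the tanh fusion are hypothetical simplifications of the learned recurrent units in RiFCN.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of an (H, W) map to (2H, 2W)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def backward_stream(features, w_state=0.5, w_feat=0.5):
    """Recursively fuse multi-level feature maps from deep to shallow.

    features: list of (H_l, W_l) maps ordered shallow -> deep, where each
              level halves the resolution of the previous one.
    The state starts at the deepest map; at each shallower level it is
    upsampled and combined with that level's map (hypothetical tanh fusion).
    """
    state = features[-1]
    for feat in reversed(features[:-1]):
        state = np.tanh(w_state * upsample2x(state) + w_feat * feat)
    return state  # same resolution as the shallowest map
```

The output keeps the resolution of the shallowest, boundary-rich map while carrying information propagated up from the deepest, most semantic one, which is the fusion the abstract describes.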