Generating Sentences Using a Dynamic Canvas
We introduce the Attentive Unsupervised Text (W)riter (AUTR), a word-level
generative model for natural language. It uses a recurrent neural network
with a dynamic attention and canvas memory mechanism to iteratively construct
sentences. By viewing the state of the memory at intermediate stages and where
the model is placing its attention, we gain insight into how it constructs
sentences. We demonstrate that AUTR learns a meaningful latent representation
for each sentence, and achieves competitive log-likelihood lower bounds whilst
being computationally efficient. It is effective at generating and
reconstructing sentences, as well as imputing missing words.
Comment: AAAI 201
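The abstract describes the architecture only at a high level; the toy sketch below illustrates the general idea of iteratively writing to an attention-addressed canvas. Dimensions, update rules, and variable names are illustrative assumptions, not the paper's model.

```python
# Hypothetical sketch of a canvas-and-attention writing loop in the spirit of
# AUTR; sizes and update rules are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

T, L, D = 5, 8, 16          # writing steps, canvas slots (word positions), hidden size
canvas = np.zeros((L, D))   # one slot per output word position

h = np.zeros(D)             # recurrent state
W_h = rng.normal(scale=0.1, size=(D, D))
W_attn = rng.normal(scale=0.1, size=(D, L))
W_write = rng.normal(scale=0.1, size=(D, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for t in range(T):
    h = np.tanh(W_h @ h + rng.normal(scale=0.1, size=D))  # toy recurrence (no input here)
    attn = softmax(h @ W_attn)          # where to write on the canvas this step
    write = np.tanh(h @ W_write)        # what to write
    canvas += np.outer(attn, write)     # attention-weighted update of every slot

# In a full model, each canvas slot would then be decoded into a distribution
# over words, e.g. softmax(canvas @ embedding_matrix.T).
print(canvas.shape)  # (8, 16)
```

Inspecting `canvas` and `attn` at intermediate steps is what the abstract refers to as viewing the memory state and the model's attention while it constructs a sentence.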
Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar
A usage-based Construction Grammar (CxG) posits that slot-constraints
generalize from common exemplar constructions. But what is the best model of
constraint generalization? This paper evaluates competing frequency-based and
association-based models across eight languages using a metric derived from the
Minimum Description Length paradigm. The experiments show that
association-based models produce better generalizations across all languages by
a significant margin.
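As a rough illustration of an MDL-style comparison (the paper's actual metric and figures are not reproduced here), the total description length can be taken as the cost of encoding the grammar plus the cost of encoding the corpus given that grammar, with the smaller total indicating the better generalization.

```python
# Illustrative MDL-style comparison; function names and numbers are assumptions,
# not the paper's metric.
import math

def description_length(grammar_size_bits, construction_probs):
    """grammar_size_bits: encoding cost of the construction inventory.
    construction_probs: probability assigned to each construction token
    observed in the corpus under that grammar."""
    data_bits = -sum(math.log2(p) for p in construction_probs)
    return grammar_size_bits + data_bits

# Toy comparison of a frequency-based and an association-based grammar.
freq_dl  = description_length(grammar_size_bits=1200.0,
                              construction_probs=[0.02, 0.05, 0.01, 0.03])
assoc_dl = description_length(grammar_size_bits=1350.0,
                              construction_probs=[0.04, 0.08, 0.03, 0.06])
print("frequency-based DL:", round(freq_dl, 1))
print("association-based DL:", round(assoc_dl, 1))
```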
Improved training of end-to-end attention models for speech recognition
Sequence-to-sequence attention-based models on subword units allow simple
open-vocabulary end-to-end speech recognition. In this work, we show that such
models can achieve competitive results on the Switchboard 300h and LibriSpeech
1000h tasks. In particular, we report the state-of-the-art word error rates
(WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets
of LibriSpeech. We introduce a new pretraining scheme by starting with a high
time reduction factor and lowering it during training, which is crucial both
for convergence and final performance. In some experiments, we also use an
auxiliary CTC loss function to aid convergence. In addition, we train long
short-term memory (LSTM) language models on subword units. By shallow fusion,
we report up to 27% relative improvements in WER over the attention baseline
without a language model.
Comment: submitted to Interspeech 201
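Shallow fusion, as commonly described, ranks beam-search candidates by the attention decoder's log-probability plus a weighted log-probability from the external language model. The sketch below is a minimal illustration; the weight value and names are assumptions, not the paper's settings.

```python
# Minimal sketch of shallow fusion during beam search; values are illustrative.
import math

LM_WEIGHT = 0.3  # fusion weight; tuned on a development set in practice

def fused_score(attention_logp, lm_logp, lm_weight=LM_WEIGHT):
    """Score used to rank candidate subword extensions of a beam hypothesis."""
    return attention_logp + lm_weight * lm_logp

# Toy example: two candidate subword continuations for one hypothesis.
candidates = {
    "_spe": {"attn": math.log(0.40), "lm": math.log(0.10)},
    "_spo": {"attn": math.log(0.35), "lm": math.log(0.30)},
}
best = max(candidates, key=lambda c: fused_score(candidates[c]["attn"],
                                                 candidates[c]["lm"]))
print("selected subword:", best)
```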
Sampling from Stochastic Finite Automata with Applications to CTC Decoding
Stochastic finite automata arise naturally in many language and speech
processing tasks. They include stochastic acceptors, which represent certain
probability distributions over random strings. We consider the problem of
efficient sampling: drawing random string variates from the probability
distribution represented by stochastic automata and by transformations of those automata.
We show that path-sampling is effective and can be efficient if the
epsilon-graph of a finite automaton is acyclic. We provide an algorithm that
ensures this by conflating epsilon-cycles within strongly connected components.
Sampling is also effective in the presence of non-injective transformations of
strings. We illustrate this in the context of decoding for Connectionist
Temporal Classification (CTC), where the predictive probabilities yield
auxiliary sequences which are transformed into shorter labeling strings. We can
sample efficiently from the transformed labeling distribution and use this in
two different strategies for finding the most probable CTC labeling
- …
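One way to picture sampling-based CTC decoding, consistent with the abstract but with all details assumed rather than taken from the paper: draw frame-level paths from the per-frame predictive distribution, collapse each path with the usual CTC many-to-one map (merge repeats, drop blanks), and tally the resulting labelings.

```python
# Hedged sketch of sampling-based CTC decoding; the per-frame probabilities,
# sample count, and selection rule are illustrative, not the paper's algorithm.
import collections
import numpy as np

rng = np.random.default_rng(0)
BLANK = 0
labels = ["-", "a", "b"]               # index 0 is the CTC blank

# Toy per-frame predictive probabilities, shape (frames, symbols).
probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.1, 0.2, 0.7]])

def collapse(path):
    """CTC many-to-one map: merge adjacent repeats, then remove blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)

counts = collections.Counter()
for _ in range(2000):
    path = [rng.choice(len(labels), p=frame) for frame in probs]
    counts[collapse(path)] += 1

labeling, _ = counts.most_common(1)[0]
print("estimated most probable labeling:", "".join(labels[i] for i in labeling))
```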