On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition
Recurrent neural networks have been the dominant models for many speech and
language processing tasks. However, we understand little about the behavior and
the class of functions recurrent networks can realize. Moreover, the heuristics
used during training complicate the analyses. In this paper, we study recurrent
networks' ability to learn long-term dependency in the context of speech
recognition. We consider two decoding approaches, online and batch decoding,
and show the classes of functions to which the decoding approaches correspond.
We then draw a connection between batch decoding and a popular training
approach for recurrent networks, truncated backpropagation through time.
Changing the decoding approach restricts the amount of past history recurrent
networks can use for prediction, allowing us to analyze their ability to
remember. Empirically, we utilize long-term dependency in subphonetic states,
phonemes, and words, and show how the design decisions, such as the decoding
approach, lookahead, context frames, and consecutive prediction, characterize
the behavior of recurrent networks. Finally, we draw a connection between
Markov processes and vanishing gradients. These results have implications for
studying the long-term dependency in speech data and how these properties are
learned by recurrent networks.
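As a rough illustration of truncated backpropagation through time, the sketch below computes the gradient of a loss with respect to the recurrent weight while backpropagating through only the last k time steps. It assumes a toy scalar tanh recurrent network with a squared-error loss on the final state; the names `forward`, `loss`, and `truncated_bptt_grad` are hypothetical and are not taken from the paper, whose models and training setup are not described in this abstract.

```python
import math


def forward(xs, w, u, h0=0.0):
    """Run a scalar tanh RNN; returns all hidden states with hs[0] = h0."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(xs, y, w, u):
    """Squared error between the final hidden state and the target y."""
    return 0.5 * (forward(xs, w, u)[-1] - y) ** 2


def truncated_bptt_grad(xs, y, w, u, k):
    """Gradient of the loss w.r.t. w, using at most the last k time steps.

    Illustrative assumption: a single loss at the final step. Steps earlier
    than T - k are treated as constants, so their contribution is dropped.
    """
    hs = forward(xs, w, u)
    T = len(xs)
    delta = hs[T] - y  # dL/dh_T
    grad_w = 0.0
    for t in range(T, max(T - k, 0), -1):
        dpre = delta * (1.0 - hs[t] ** 2)  # backprop through tanh at step t
        grad_w += dpre * hs[t - 1]         # direct dependence of step t on w
        delta = dpre * w                   # pass the gradient on to h_{t-1}
    return grad_w


xs = [0.5, -0.3, 0.8, 0.1]
full = truncated_bptt_grad(xs, y=0.2, w=0.9, u=0.7, k=len(xs))
short = truncated_bptt_grad(xs, y=0.2, w=0.9, u=0.7, k=1)
print(full, short)
```

With k equal to the sequence length the result is the full gradient; with a smaller k the network receives no learning signal from inputs further back than k steps, which is the sense in which truncation limits the usable history.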
On The Inductive Bias of Words in Acoustics-to-Word Models
Acoustics-to-word models are end-to-end speech recognizers that use words as
targets without relying on pronunciation dictionaries or graphemes. These
models are notoriously difficult to train due to the lack of linguistic
knowledge. It is also unclear how the amount of training data impacts the
optimization and generalization of such models. In this work, we study the
optimization and generalization of acoustics-to-word models under different
amounts of training data. In addition, we study three types of inductive bias,
leveraging a pronunciation dictionary, word boundary annotations, and
constraints on word durations. We find that constraining word durations leads
to the most improvement. Finally, we analyze the word embedding space learned
by the model, and find that the space has a structure dominated by the
pronunciation of words. This suggests that the contexts of words, instead of
their phonetic structure, should be the future focus of inductive bias in
acoustics-to-word models.
AI-enabled Prediction of eSports Player Performance Using the Data from Heterogeneous Sensors
The emerging eSports field lacks tools for ensuring high-quality
analytics and training in Pro and amateur eSports teams. We report on an
Artificial Intelligence (AI) enabled solution for predicting eSports player
in-game performance using exclusively data from sensors. To this end,
we collected physiological, environmental, and game chair data from Pro
and amateur players. Player performance is assessed from the game logs in a
multiplayer game at each moment in time using a recurrent neural network. We
find that an attention mechanism improves the generalization of the
network and also provides straightforward feature importance. The best
model achieves a ROC AUC score of 0.73. The model can predict the performance
of a particular player even though that player's data are not used in the
training set. The proposed solution has a number of promising applications for
Pro eSports teams and can also serve as a learning tool for amateur players.
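For readers unfamiliar with the reported metric, the sketch below shows one common way to compute ROC AUC: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the labels and scores are made up for illustration; they are not data or code from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, chosen only to illustrate the calculation.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a score of 0.73 indicates that the model ranks a positive example above a negative one in roughly 73% of pairs.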