14,548 research outputs found
Multilingual training of deep neural networks
We investigate multilingual modeling in the context of a deep neural network (DNN) – hidden Markov model (HMM) hy-brid, where the DNN outputs are used as the HMM state like-lihoods. By viewing neural networks as a cascade of fea-ture extractors followed by a logistic regression classifier, we hypothesise that the hidden layers, which act as feature ex-tractors, will be transferable between languages. As a corol-lary, we propose that training the hidden layers on multiple languages makes them more suitable for such cross-lingual transfer. We experimentally confirm these hypotheses on the GlobalPhone corpus using seven languages from three dif-ferent language families: Germanic, Romance, and Slavic. The experiments demonstrate substantial improvements over a monolingual DNN-HMM hybrid baseline, and hint at av-enues of further exploration. Index Terms — Speech recognition, deep learning, neural networks, multilingual modelin
From feature to paradigm: deep learning in machine translation
In the last years, deep learning algorithms have highly revolutionized several areas including speech, image and natural language processing. The specific field of Machine Translation (MT) has not remained invariant. Integration of deep learning in MT varies from re-modeling existing features into standard statistical systems to the development of a new architecture. Among the different neural networks, research works use feed- forward neural networks, recurrent neural networks and the encoder-decoder schema. These architectures are able to tackle challenges as having low-resources or morphology variations. This manuscript focuses on describing how these neural networks have been integrated to enhance different aspects and models from statistical MT, including language modeling, word alignment, translation, reordering, and rescoring. Then, we report the new neural MT approach together with a description of the foundational related works and recent approaches on using subword, characters and training with multilingual languages, among others. Finally, we include an analysis of the corresponding challenges and future work in using deep learning in MTPostprint (author's final draft
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advance in deep learning shows that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.Comment: 13 pages, APSIPA 201
Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model
Multilingual models for Automatic Speech Recognition (ASR) are attractive as
they have been shown to benefit from more training data, and better lend
themselves to adaptation to under-resourced languages. However, initialisation
from monolingual context-dependent models leads to an explosion of
context-dependent states. Connectionist Temporal Classification (CTC) is a
potential solution to this as it performs well with monophone labels.
We investigate multilingual CTC in the context of adaptation and
regularisation techniques that have been shown to be beneficial in more
conventional contexts. The multilingual model is trained to model a universal
International Phonetic Alphabet (IPA)-based phone set using the CTC loss
function. Learning Hidden Unit Contribution (LHUC) is investigated to perform
language adaptive training. In addition, dropout during cross-lingual
adaptation is also studied and tested in order to mitigate the overfitting
problem.
Experiments show that the performance of the universal phoneme-based CTC
system can be improved by applying LHUC and it is extensible to new phonemes
during cross-lingual adaptation. Updating all the parameters shows consistent
improvement on limited data. Applying dropout during adaptation can further
improve the system and achieve competitive performance with Deep Neural Network
/ Hidden Markov Model (DNN/HMM) systems on limited data
On Multilingual Training of Neural Dependency Parsers
We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the art methods, while having a simple structure.Comment: preprint accepted into the TSD201
Multilingual Adaptation of RNN Based ASR Systems
In this work, we focus on multilingual systems based on recurrent neural
networks (RNNs), trained using the Connectionist Temporal Classification (CTC)
loss function. Using a multilingual set of acoustic units poses difficulties.
To address this issue, we proposed Language Feature Vectors (LFVs) to train
language adaptive multilingual systems. Language adaptation, in contrast to
speaker adaptation, needs to be applied not only on the feature level, but also
to deeper layers of the network. In this work, we therefore extended our
previous approach by introducing a novel technique which we call "modulation".
Based on this method, we modulated the hidden layers of RNNs using LFVs. We
evaluated this approach in both full and low resource conditions, as well as
for grapheme and phone based systems. Lower error rates throughout the
different conditions could be achieved by the use of the modulation.Comment: 5 pages, 1 figure, to appear in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP 2018
- …