Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. These
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks.
Comment: PhD Thesis Unitn, 201
Named Entity Recognition Only from Word Embeddings
Deep neural network models have helped named entity (NE) recognition achieve
amazing performance without handcrafting features. However, existing systems
require large amounts of human annotated training data. Efforts have been made
to replace human annotations with external knowledge (e.g., NE dictionary,
part-of-speech tags), but obtaining such effective resources is itself a
challenge. In this work, we propose a fully unsupervised NE recognition model
which only needs to take informative clues from pre-trained word embeddings. We
first apply a Gaussian Hidden Markov Model and a Deep Autoencoding Gaussian Mixture
Model to word embeddings for entity span detection and type prediction, and
then further design an instance selector based on reinforcement learning to
distinguish positive sentences from noisy sentences and refine these
coarse-grained annotations through neural networks. Extensive experiments on
CoNLL benchmark datasets demonstrate that our proposed light NE recognition
model achieves remarkable performance without using any annotated lexicon or
corpus.
Comment: Accepted by EMNLP202