5,820 research outputs found
Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
We propose a novel approach to semi-supervised automatic speech recognition
(ASR). We first exploit a large amount of unlabeled audio data via
representation learning, where we reconstruct a temporal slice of filterbank
features from past and future context frames. The resulting deep contextualized
acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end
ASR system using a smaller amount of labeled audio data. In our experiments, we
show that systems trained on DeCoAR consistently outperform ones trained on
conventional filterbank features, giving 42% and 19% relative improvement over
the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our
approach can drastically reduce the amount of labeled data required;
unsupervised training on LibriSpeech then supervision with 100 hours of labeled
data achieves performance on par with training on all 960 hours directly.
Pre-trained models and code will be released online.Comment: Accepted to ICASSP 2020 (oral
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision
We present a novel method for aligning a sequence of instructions to a video
of someone carrying out a task. In particular, we focus on the cooking domain,
where the instructions correspond to the recipe. Our technique relies on an HMM
to align the recipe steps to the (automatically generated) speech transcript.
We then refine this alignment using a state-of-the-art visual food detector,
based on a deep convolutional neural network. We show that our technique
outperforms simpler techniques based on keyword spotting. It also enables
interesting applications, such as automatically illustrating recipes with
keyframes, and searching within a video for events of interest.Comment: To appear in NAACL 201
- …