Multi-Graph Decoding for Code-Switching ASR
In the FAME! project, a code-switching (CS) automatic speech recognition
(ASR) system for Frisian-Dutch speech is developed that can accurately
transcribe the local broadcaster's bilingual archives with CS speech. This
archive contains recordings with monolingual Frisian and Dutch speech segments
as well as Frisian-Dutch CS speech, hence the recognition performance on
monolingual segments is also vital for accurate transcriptions. In this work,
we propose a multi-graph decoding and rescoring strategy using bilingual and
monolingual graphs together with a unified acoustic model for CS ASR. The
proposed decoding scheme gives the freedom to design and employ alternative
search spaces for each (monolingual or bilingual) recognition task and enables
the effective use of monolingual resources of the high-resourced mixed language
in low-resourced CS scenarios. In our scenario, Dutch is the high-resourced and
Frisian is the low-resourced language. We therefore use additional monolingual
Dutch text resources to improve the Dutch language model (LM) and compare the
performance of single- and multi-graph CS ASR systems on Dutch segments using
larger Dutch LMs. The ASR results show that the proposed approach outperforms
baseline single-graph CS ASR systems, providing better performance on the
monolingual Dutch segments without any accuracy loss on monolingual Frisian and
code-mixed segments.
Comment: Accepted for publication at Interspeech 201
ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages
We consider multilingual bottleneck features (BNFs) for nearly zero-resource
keyword spotting. This forms part of a United Nations effort using keyword
spotting to support humanitarian relief programmes in parts of Africa where
languages are severely under-resourced. We use 1920 isolated keywords (40
types, 34 minutes) as exemplars for dynamic time warping (DTW) template
matching, which is performed on a much larger body of untranscribed speech.
These DTW costs are used as targets for a convolutional neural network (CNN)
keyword spotter, giving a much faster system than direct DTW. Here we consider
how available data from well-resourced languages can improve this CNN-DTW
approach. We show that multilingual BNFs trained on ten languages improve the
area under the ROC curve of a CNN-DTW system by 10.9% absolute relative to the
MFCC baseline. By combining low-resource DTW-based supervision with information
from well-resourced languages, CNN-DTW is a competitive option for low-resource
keyword spotting.
Comment: 5 pages, 3 figures, 3 tables, 1 equation; accepted at SLTU 201
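The DTW template matching described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_cost` and `best_window_cost`, the Euclidean frame distance, the normalisation by path-length bound, and the fixed template-length sliding window are all assumptions made for illustration. In the setting the abstract describes, a cost of this kind, computed between each keyword exemplar and untranscribed speech, would serve as the training target for the CNN.

```python
import math


def dtw_cost(template, segment):
    """Length-normalised DTW alignment cost between two feature sequences.

    Each sequence is a list of equal-dimension feature vectors (e.g. MFCC
    or bottleneck frames). Euclidean frame distance is an assumption.
    """
    n, m = len(template), len(segment)
    # acc[i][j] holds the cheapest cumulative cost of aligning the first
    # i template frames with the first j segment frames.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j],      # stay on segment frame
                                acc[i][j - 1],      # stay on template frame
                                acc[i - 1][j - 1])  # advance both
    # Normalise by n + m (an assumed choice) so that costs from templates
    # of different lengths are roughly comparable.
    return acc[n][m] / (n + m)


def best_window_cost(template, utterance, hop=1):
    """Slide a template-length window over the utterance; return the lowest cost."""
    width = len(template)
    if len(utterance) <= width:
        return dtw_cost(template, utterance)
    return min(
        dtw_cost(template, utterance[start:start + width])
        for start in range(0, len(utterance) - width + 1, hop)
    )


if __name__ == "__main__":
    # Toy one-dimensional "features"; real frames would be MFCC or BNF vectors.
    keyword = [(0.0,), (1.0,), (2.0,)]
    speech = [(5.0,), (5.0,), (0.0,), (1.0,), (2.0,), (5.0,)]
    print(best_window_cost(keyword, speech))
```

The nested loops make each comparison quadratic in sequence length, and the search repeats this for every window and every exemplar, which is consistent with the abstract's remark that a CNN trained on these costs is much faster than direct DTW at test time.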