Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong baselines and can
also reveal code-switching.
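
To illustrate what "revealing code-switching" can mean in practice, here is a minimal sketch that assumes a model has already produced one language label per word. The function name `switch_points` and the label format are hypothetical and are not taken from the paper; the sketch only shows how per-word predictions expose switch positions.

```python
def switch_points(word_langs):
    """Return each index i where word i has a different language label
    than word i-1 (assumed input: one language label per word)."""
    return [
        i
        for i in range(1, len(word_langs))
        if word_langs[i] != word_langs[i - 1]
    ]


# Hypothetical per-word labels for a five-word message.
labels = ["en", "en", "es", "es", "en"]
print(switch_points(labels))  # [2, 4]
```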
Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
Code-switching speech recognition (CSSR) transcribes speech that switches
between multiple languages or dialects within a single sentence. The main
challenge in this task is that different languages often have similar
pronunciations, making it difficult for models to distinguish between them. In
this paper, we propose a method for solving the CSSR task from the perspective
of language-specific acoustic boundary learning. We introduce language-specific
weight estimators (LSWE) to model acoustic boundary learning in different
languages separately. Additionally, a non-autoregressive (NAR) decoder and a
language change detection (LCD) module are employed to assist in training.
Evaluated on the SEAME corpus, our method achieves state-of-the-art mixed
error rates (MER) of 16.29% and 22.81% on the test_man and test_sge sets,
respectively. We also demonstrate the effectiveness of our method on a
9000-hour in-house meeting code-switching dataset, where it achieves a
relative MER reduction of 7.9%.
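
The abstract reports mixed error rate (MER) and a relative reduction without defining either. The sketch below assumes the common Mandarin-English convention of scoring Mandarin per character and English per word; the tokenizer, function names, and example strings are illustrative assumptions, not the authors' scoring script.

```python
import re

# Assumption: one token per CJK character, one token per English word.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[a-z0-9']+")


def tokenize_mixed(text):
    """Split mixed text into Mandarin characters and English words."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def mixed_error_rate(refs, hyps):
    """Total token edits divided by total reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize_mixed(ref)
        errors += edit_distance(ref_tokens, tokenize_mixed(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


def relative_reduction(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline


# One substituted English word out of five reference tokens.
print(mixed_error_rate(["我想 play 游戏"], ["我想 pay 游戏"]))  # 0.2
```

A relative reduction is measured against the baseline rather than in absolute points: with a hypothetical baseline MER of 20.0% (not a figure from the paper), a 7.9% relative reduction would correspond to roughly 18.4%.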
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
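
The "effective control over the switching rate" described in the abstract above comes from biasing each state toward self-transition. Below is a minimal sketch of that prior under a finite (weak-limit) truncation to `num_states` states, using only the Python standard library. The parameter names `gamma`, `alpha`, and `kappa` follow common notation for this model but are assumptions here, and the sketch covers only the transition prior, not the paper's sampler for the state sequence.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw from a Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sample_sticky_transitions(num_states, gamma, alpha, kappa, rng):
    """Sample global weights beta and one transition row per state.

    Row j is drawn from Dirichlet(alpha * beta + kappa * e_j): the extra
    mass kappa on entry j favors staying in state j, which slows switching.
    """
    beta = sample_dirichlet([gamma / num_states] * num_states, rng)
    rows = []
    for j in range(num_states):
        # Floor guards against a zero concentration from float underflow.
        conc = [max(alpha * b, 1e-12) for b in beta]
        conc[j] += kappa
        rows.append(sample_dirichlet(conc, rng))
    return beta, rows


rng = random.Random(0)
beta, rows = sample_sticky_transitions(
    num_states=5, gamma=2.0, alpha=1.0, kappa=50.0, rng=rng
)
for j, row in enumerate(rows):
    print(j, round(row[j], 3))  # self-transition probability of state j
```

Setting `kappa` to 0 recovers the non-sticky prior, in which each row is centered on `beta` with no preference for self-transition.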
Multi-Graph Decoding for Code-Switching ASR
In the FAME! Project, a code-switching (CS) automatic speech recognition
(ASR) system for Frisian-Dutch speech is developed that can accurately
transcribe the local broadcaster's bilingual archives with CS speech. This
archive contains recordings with monolingual Frisian and Dutch speech segments
as well as Frisian-Dutch CS speech, hence the recognition performance on
monolingual segments is also vital for accurate transcriptions. In this work,
we propose a multi-graph decoding and rescoring strategy using bilingual and
monolingual graphs together with a unified acoustic model for CS ASR. The
proposed decoding scheme gives the freedom to design and employ alternative
search spaces for each (monolingual or bilingual) recognition task and enables
the effective use of monolingual resources of the high-resourced mixed language
in low-resourced CS scenarios. In our scenario, Dutch is the high-resourced and
Frisian is the low-resourced language. We therefore use additional monolingual
Dutch text resources to improve the Dutch language model (LM) and compare the
performance of single- and multi-graph CS ASR systems on Dutch segments using
larger Dutch LMs. The ASR results show that the proposed approach outperforms
baseline single-graph CS ASR systems, providing better performance on the
monolingual Dutch segments without any accuracy loss on monolingual Frisian and
code-mixed segments.
Comment: Accepted for publication at Interspeech 201
Towards Zero-Shot Code-Switched Speech Recognition
In this work, we seek to build effective code-switched (CS) automatic speech
recognition (ASR) systems under the zero-shot setting where no transcribed CS
speech data is available for training. Previously proposed frameworks which
conditionally factorize the bilingual task into its constituent monolingual
parts are a promising starting point for leveraging monolingual data
efficiently. However, these methods require the monolingual modules to perform
language segmentation. That is, each monolingual module has to simultaneously
detect CS points and transcribe speech segments of one language while ignoring
those of other languages -- not a trivial task. We propose to simplify each
monolingual module by allowing it to transcribe all speech segments
indiscriminately with a monolingual script (i.e. transliteration). This simple
modification passes the responsibility of CS point detection to subsequent
bilingual modules which determine the final output by considering multiple
monolingual transliterations along with external language model information. We
apply this transliteration-based approach in an end-to-end differentiable
neural network and demonstrate its efficacy for zero-shot CS ASR on
Mandarin-English SEAME test sets.
Comment: 5 page