5,716 research outputs found
Integration of Language Identification into a Recognition System for Spoken Conversations Containing Code-Switches
ABSTRACT This paper describes the integration of language identification (LID) into a multilingual automatic speech recognition (ASR) system for spoken conversations containing code-switches between Mandarin and English. We apply a multistream approach to combine at frame level the acoustic model score and the language information, where the latter is provided by an LID component. Furthermore, we advance this multistream approach by a new method called "Language Lookahead", in which the language information of subsequent frames is used to improve accuracy. Both methods are evaluated using a set of controlled LID results with varying frame accuracies. Our results show that both approaches improve the ASR performance by at least 4% relative if the LID achieves a minimum frame accuracy of 85%
Multi-Graph Decoding for Code-Switching ASR
In the FAME! Project, a code-switching (CS) automatic speech recognition
(ASR) system for Frisian-Dutch speech is developed that can accurately
transcribe the local broadcaster's bilingual archives with CS speech. This
archive contains recordings with monolingual Frisian and Dutch speech segments
as well as Frisian-Dutch CS speech, hence the recognition performance on
monolingual segments is also vital for accurate transcriptions. In this work,
we propose a multi-graph decoding and rescoring strategy using bilingual and
monolingual graphs together with a unified acoustic model for CS ASR. The
proposed decoding scheme gives the freedom to design and employ alternative
search spaces for each (monolingual or bilingual) recognition task and enables
the effective use of monolingual resources of the high-resourced mixed language
in low-resourced CS scenarios. In our scenario, Dutch is the high-resourced and
Frisian is the low-resourced language. We therefore use additional monolingual
Dutch text resources to improve the Dutch language model (LM) and compare the
performance of single- and multi-graph CS ASR systems on Dutch segments using
larger Dutch LMs. The ASR results show that the proposed approach outperforms
baseline single-graph CS ASR systems, providing better performance on the
monolingual Dutch segments without any accuracy loss on monolingual Frisian and
code-mixed segments.Comment: Accepted for publication at Interspeech 201
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Integration of Phonotactic Features for Language Identification on Code-Switched Speech
Abstract: In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With the one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based bigram language models (LM) are integrated into speech decoding to eliminate possible phone mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic information of mixed-language speech based on recognized phone sequences. As the back-end decision is taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The speech corpus was tested on Sepedi and English languages that are often mixed. Our system is evaluated by measuring both the ASR performance and the LID performance separately. The systems have obtained a promising ASR accuracy with data-driven phone merging approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy
Integrating Language Identification to improve Multilingual Speech Recognition
The process of determining the language of a speech utterance is called Language Identification (LID). This task can be very challenging as it has to take into account various language-specific aspects, such as phonetic, phonotactic, vocabulary and grammar-related cues. In multilingual speech recognition we try to find the most likely word sequence that corresponds to an utterance where the language is not known a priori. This is a considerably harder task compared to monolingual speech recognition and it is common to use LID to estimate the current language. In this project we present two general approaches for LID and describe how to integrate them into multilingual speech recognizers. The first approach uses hierarchical multilayer perceptrons to estimate language posterior probabilities given the acoustics in combination with hidden Markov models. The second approach evaluates the output of a multilingual speech recognizer to determine the spoken language. The research is applied to the MediaParl speech corpus that was recorded at the Parliament of the canton of Valais, where people switch from Swiss French to Swiss German or vice versa. Our experiments show that, on that particular data set, LID can be used to significantly improve the performance of multilingual speech recognizers. We will also point out that ASR dependent LID approaches yield the best performance due to higher-level cues and that our systems perform much worse on non-native dat
- …