166 research outputs found

Heterophonic speech recognition using composite phones

Author: CJ Leggetter
DL Hinton
F Jelinek
GE Dahl
H Soltau
JP Olive
K Kirchhoff
L Lamel
M Abushariaha
T Demeechai
Y El-Imam
Publication venue: 'Springer Science and Business Media LLC'

Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

Author: Vu Ngoc Thang
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. On the application side, the thesis also includes research on non-native and code-switching speech.

KITopen

Story Segmentation of Broadcast News in English, Mandarin and Arabic

Author: Hirschberg Julia Bell
Rosenberg Andrew
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

In this paper, we present results from a Broadcast News story segmentation system developed for the SRI NIGHTINGALE system operating on English, Arabic and Mandarin news shows to provide input to subsequent question-answering processes. Using a rule-induction algorithm with automatically extracted acoustic and lexical features, we report success rates that are competitive with state-of-the-art systems on each input language. We further demonstrate that features useful for English and Mandarin are not discriminative for Arabic.

Columbia University Academic Commons

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

Author: Gales MJF
Knill KM
Ragni A
Vasilakes J
Wu C
Publication venue: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication date: 19/06/2017

Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions active depending on the input unit may help the network discriminate better and, as a consequence, yield lower error rates. This paper investigates stimulated training for automatic speech recognition of a number of languages representing different families, alphabets, phone sets and vocabulary sizes. In particular, it looks at ensembles of stimulated networks to ensure that improved generalisation will withstand system combination effects. To assess stimulated training beyond 1-best transcription accuracy, this paper looks at keyword search as a proxy for assessing the quality of lattices. Experiments are conducted on IARPA Babel program languages, including the surprise language of the OpenKWS 2016 competition.

Crossref

Apollo (Cambridge)

White Rose Research Online

Morphological, syntactic and diacritics rules for automatic diacritization of Arabic sentences

Author: Chennoufi Amine
Mazroui Azzeddine
Publication venue: The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University.
Publication date: 01/04/2017

The diacritical marks of the Arabic language are characters other than letters and are, in most cases, absent from Arabic texts. This paper presents a hybrid system for automatic diacritization of Arabic sentences that combines linguistic rules and statistical treatments. The approach has four stages. The first stage consists of a morphological analysis using the second version of the morphological analyzer Alkhalil Morpho Sys. Morphosyntactic outputs from this step are used in the second stage to eliminate invalid word transitions according to syntactic rules. The third stage uses a discrete hidden Markov model and the Viterbi algorithm to determine the most probable diacritized sentence. Transitions unseen in the training corpus are handled with smoothing techniques. Finally, the last stage deals with words not analyzed by the Alkhalil analyzer, for which we use statistical treatments based on letters. The word error rate of our system is around 2.58% if we ignore the diacritic of the last letter of the word and around 6.28% when this diacritic is taken into account.

Elsevier - Publisher Connector

Directory of Open Access Journals
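
The third stage described in the abstract above (a discrete hidden Markov model decoded with the Viterbi algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that hidden states are candidate diacritized forms and that observations are undiacritized words. The transliterated word forms, the probabilities, and the names `UNDIACRITIZED`, `START`, `TRANS`, `UNSEEN`, `log_trans`, `log_emit` and `viterbi` are all hypothetical, and the constant floor for unseen transitions is only a crude stand-in for the smoothing techniques the abstract mentions.

```python
import math

NEG_INF = float("-inf")

# Hypothetical toy data: each candidate diacritized form (hidden state) maps to
# the undiacritized spelling it is written as (the only observation it can emit).
UNDIACRITIZED = {
    "kataba": "ktb",
    "kutub": "ktb",
    "al-waladu": "alwld",
    "al-walada": "alwld",
}
STATES = list(UNDIACRITIZED)

# Made-up probabilities standing in for estimates from a training corpus.
START = {"kataba": 0.4, "kutub": 0.4, "al-waladu": 0.1, "al-walada": 0.1}
TRANS = {
    ("kataba", "al-waladu"): 0.6,
    ("kataba", "al-walada"): 0.2,
    ("kutub", "al-waladu"): 0.1,
    ("kutub", "al-walada"): 0.3,
}
UNSEEN = 1e-6  # assumed floor for unseen transitions, not a real smoothing method


def log_trans(prev, cur):
    """Log-probability of moving from one diacritized form to the next."""
    return math.log(TRANS.get((prev, cur), UNSEEN))


def log_emit(state, word):
    """A diacritized form can only emit its own undiacritized spelling."""
    return 0.0 if UNDIACRITIZED[state] == word else NEG_INF


def viterbi(words):
    """Return the most probable sequence of diacritized forms for `words`."""
    if not words:
        return []
    # best[t][s]: best log-probability of any path ending in state s at step t.
    best = [{s: math.log(START[s]) + log_emit(s, words[0]) for s in STATES}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans(p, s)) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit(s, words[t])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


decoded = viterbi(["ktb", "alwld"])
print(decoded)  # ['kataba', 'al-waladu']
```

Working in log-probabilities avoids numerical underflow on longer sentences. In a system like the one described, the candidate set per word would come from the morphological analysis and syntactic filtering of the earlier stages rather than from a fixed table.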

CMU GALE speech-to-text system

Author: Florian Metze
Qin Jin
Roger Hsiao
Tanja Schultz
Udhyakumar Nallasamy
Publication date: 01/01/2010

This paper describes the latest Speech-to-Text system developed for the Global Autonomous Language Exploitation ("GALE") domain by Carnegie Mellon University (CMU). This system uses discriminative training, bottleneck features and other techniques that were not used in previous versions of our system, and is trained on 1150 hours of data from a variety of Arabic speech sources. In this paper, we show how different lexica, pre-processing, and system combination techniques can be used to improve the final output, and provide analysis of the improvements achieved by the individual techniques.

CiteSeerX