94,946 research outputs found
Dynamic Extension of ASR Lexicon Using Wikipedia Data
International audienceDespite recent progress in developing Large Vocabulary Continuous Speech Recognition Systems (LVCSR), these systems suffer from Out-Of-Vocabulary words (OOV). In many cases, the OOV words are Proper Nouns (PNs). The correct recognition of PNs is essential for broadcast news, audio indexing, etc. In this article, we address the problem of OOV PN retrieval in the framework of broadcast news LVCSR. We focused on dynamic (document dependent) extension of LVCSR lexicon. To retrieve relevant OOV PNs, we propose to use a very large multipurpose text corpus: Wikipedia. This corpus contains a huge number of PNs. These PNs are grouped in semantically similar classes using word embedding. We use a two-step approach: first, we select OOV PN pertinent classes with a multi-class Deep Neural Network (DNN). Secondly, we rank the OOVs of the selected classes. The experiments on French broadcast news show that the Bi-GRU model outperforms other studied models. Speech recognition experiments demonstrate the effectiveness of the proposed methodology
A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting
recognition. Most of multi- lingual systems rests on specialized models that
are trained on a single language and one of them is selected at test time.
While some recognition systems are based on a unified optical model, dealing
with a unified language model remains a major issue, as traditional language
models are generally trained on corpora composed of large word lexicons per
language. Here, we bring a solution by con- sidering language models based on
sub-lexical units, called multigrams. Dealing with multigrams strongly reduces
the lexicon size and thus decreases the language model complexity. This makes
pos- sible the design of an end-to-end unified multilingual recognition system
where both a single optical model and a single language model are trained on
all the languages. We discuss the impact of the language unification on each
model and show that our system reaches state-of-the-art methods perfor- mance
with a strong reduction of the complexity.Comment: preprin
Development of a speech recognition system for Spanish broadcast news
This paper reports on the development process of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported
Sequence to Sequence Learning with Neural Networks
Deep Neural Networks (DNNs) are powerful models that have achieved excellent
performance on difficult learning tasks. Although DNNs work well whenever large
labeled training sets are available, they cannot be used to map sequences to
sequences. In this paper, we present a general end-to-end approach to sequence
learning that makes minimal assumptions on the sequence structure. Our method
uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to
a vector of a fixed dimensionality, and then another deep LSTM to decode the
target sequence from the vector. Our main result is that on an English to
French translation task from the WMT'14 dataset, the translations produced by
the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's
BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did
not have difficulty on long sentences. For comparison, a phrase-based SMT
system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM
to rerank the 1000 hypotheses produced by the aforementioned SMT system, its
BLEU score increases to 36.5, which is close to the previous best result on
this task. The LSTM also learned sensible phrase and sentence representations
that are sensitive to word order and are relatively invariant to the active and
the passive voice. Finally, we found that reversing the order of the words in
all source sentences (but not target sentences) improved the LSTM's performance
markedly, because doing so introduced many short term dependencies between the
source and the target sentence which made the optimization problem easier.Comment: 9 page
Infants segment words from songs - an EEG study
Childrenâs songs are omnipresent and highly attractive stimuli in infantsâ input. Previous work suggests that infants process linguisticâphonetic information from simplified sung melodies. The present study investigated whether infants learn words from ecologically valid childrenâs songs. Testing 40 Dutch-learning 10-month-olds in a familiarization-then-test electroencephalography (EEG) paradigm, this study asked whether infants can segment repeated target words embedded in songs during familiarization and subsequently recognize those words in continuous speech in the test phase. To replicate previous speech work and compare segmentation across modalities, infants participated in both song and speech sessions. Results showed a positive event-related potential (ERP) familiarity effect to the final compared to the first target occurrences during both song and speech familiarization. No evidence was found for word recognition in the test phase following either song or speech. Comparisons across the stimuli of the present and a comparable previous study suggested that acoustic prominence and speech rate may have contributed to the polarity of the ERP familiarity effect and its absence in the test phase. Overall, the present study provides evidence that 10-month-old infants can segment words embedded in songs, and it raises questions about the acoustic and other factors that enable or hinder infant word segmentation from songs and speech
Towards an Automatic Dictation System for Translators: the TransTalk Project
Professional translators often dictate their translations orally and have
them typed afterwards. The TransTalk project aims at automating the second part
of this process. Its originality as a dictation system lies in the fact that
both the acoustic signal produced by the translator and the source text under
translation are made available to the system. Probable translations of the
source text can be predicted and these predictions used to help the speech
recognition system in its lexical choices. We present the results of the first
prototype, which show a marked improvement in the performance of the speech
recognition task when translation predictions are taken into account.Comment: Published in proceedings of the International Conference on Spoken
Language Processing (ICSLP) 94. 4 pages, uuencoded compressed latex source
with 4 postscript figure
- âŠ