Search CORE

123 research outputs found

Automatsko raspoznavanje hrvatskoga govora velikoga vokabulara

Author: Ivo Ipšić
Miran Pobar
Sanda Martinčić-Ipšić
Publication venue: KoREMA - Croatian Society for Communications, Computing, Electronics, Measurement and Control
Publication date: 01/01/2011

This paper presents the acoustic and language modeling procedures used to develop a Croatian large vocabulary automatic speech recognition (LVASR) system. The proposed acoustic model is based on context-dependent triphone hidden Markov models and Croatian phonetic rules. Different acoustic and language models, developed using a large collection of Croatian speech, are discussed and compared. The paper compares feature vector computation procedures and proposes the feature vectors and acoustic modeling approach that achieve the lowest word error rates for Croatian speech. In addition, Croatian language modeling procedures are evaluated and adopted for speaker-independent spontaneous speech recognition. The experiments show that context-dependent acoustic modeling based on Croatian phonetic rules, combined with a parameter tying procedure, can be used for efficient Croatian large vocabulary speech recognition with word error rates below 5%.
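The abstract reports its results as word error rates. As a minimal sketch of how that metric is commonly computed, assuming the usual definition (word-level edit distance divided by the number of reference words), the following may help. The function name and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion against 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.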

HRČAK - Portal of Croatian Scientific and Professional Journals

Croatian Speech Recognition

Author: Ivo Ipšić
Sanda Martinčić-Ipšić
Publication venue: IntechOpen
Publication date: 16/08/2010

IntechOpen

Speech Recognition for Agglutinative Languages

Author: Thangarajan R.
Publication venue: IntechOpen
Publication date: 28/11/2012

IntechOpen

Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages

Author: Ebru Arısoy
Haşim Sak
Janne Pylkkönen
Mikko Kurimo
Murat Saraçlar
Tanel Alumäe
Teemu Hirsimäki
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Speech Recognition System of Slovenian Broadcast News

Author: Sepesy Maučec Mirjam
Žgank Andrej
Publication venue: IntechOpen
Publication date: 13/06/2011

IntechOpen

Digital library of University of Maribor

Advances in unlimited-vocabulary speech recognition for morphologically rich languages

Author: Hirsimäki Teemu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009

Automatic speech recognition systems are devices or computer programs that convert human speech into text or perform actions based on what is said to the system. Typical applications include dictation, automatic transcription of large audio or video databases, speech-controlled user interfaces, and automated telephone services. If the recognition system is not limited to a certain topic and vocabulary, covering the words of the target language as well as possible while maintaining high recognition accuracy becomes an issue. The conventional way to model the target language, especially in English recognition systems, is to limit the recognition to the most common words of the language. A vocabulary of 60 000 words is usually enough to cover the language adequately for arbitrary topics. In morphologically rich languages such as Finnish, Estonian and Turkish, on the other hand, long words can be formed by inflecting and compounding, which makes it difficult to cover the language adequately with vocabulary-based approaches. This thesis deals with methods that can be used to build efficient speech recognition systems for morphologically rich languages. Before the statistical n-gram language models are trained on a large text corpus, the words in the corpus are automatically segmented into smaller fragments, referred to as morphs. The morphs are then used as the modelling units of the n-gram models instead of whole words. This makes it possible to train the model on the whole text corpus without limiting the vocabulary, and enables the model to create even unseen words by joining morphs together. Since the segmentation algorithm is unsupervised and data-driven, it can be readily used for many languages. Speech recognition experiments are carried out on various Finnish recognition tasks, and some of the experiments are repeated on an Estonian task. The morph-based language models are shown to reduce recognition errors compared to word-based models.
It seems to be important, however, that the n-gram models are allowed to use long morph contexts, especially if the morphs used by the model are short. This can be achieved by using growing and pruning algorithms to train variable-length n-gram models. The thesis also presents data structures that can be used for representing the variable-length n-gram models efficiently in recognition systems. An analysis of the recognition errors made by Finnish recognition systems shows that speaker adaptive training and discriminative training methods help to reduce errors in different situations. The errors are also analysed according to word frequencies and manually defined error classes.
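The idea of using morphs rather than whole words as n-gram units can be illustrated with a small sketch. This is only an illustration under stated assumptions: the segmentation table below is hand-written and hypothetical, whereas the thesis learns segmentations with an unsupervised algorithm that is not reproduced here. The boundary marker `<w>` and all function names are likewise assumptions made for the example.

```python
from collections import Counter

# Hypothetical, hand-written segmentations of three Finnish word forms.
# The thesis derives such segmentations automatically from data.
SEGMENTATION = {
    "talossa": ["talo", "ssa"],
    "talosta": ["talo", "sta"],
    "autossa": ["auto", "ssa"],
}
BOUNDARY = "<w>"  # assumed word-boundary token


def to_morphs(sentence):
    """Replace each word by its morphs, marking word boundaries."""
    tokens = [BOUNDARY]
    for word in sentence.split():
        tokens.extend(SEGMENTATION.get(word, [word]))
        tokens.append(BOUNDARY)
    return tokens


def bigram_counts(corpus):
    """Count morph bigrams, the raw statistics of a 2-gram model."""
    counts = Counter()
    for sentence in corpus:
        tokens = to_morphs(sentence)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def covered(units, vocabulary):
    """True if every unit is present in the vocabulary."""
    return all(unit in vocabulary for unit in units)


corpus = ["talossa", "talosta", "autossa"]
word_vocab = {word for sentence in corpus for word in sentence.split()}
morph_vocab = {t for s in corpus for t in to_morphs(s) if t != BOUNDARY}
counts = bigram_counts(corpus)

# "autosta" never occurs in the corpus as a word, but both of its
# morphs do, so a morph-based model can still represent it.
unseen_word_is_covered = covered(["auto", "sta"], morph_vocab)
```

In this toy corpus the bigram ("auto", "sta") itself is unseen, so a practical model would still need smoothing or back-off to assign it a probability.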

Aaltodoc Publication Archive

The Zero Resource Speech Challenge 2019: TTS without T

Author: Algayres Robin
Benjumea Juan
Bernard Mathieu
Besacier Laurent
Black Alan
Cao Xuan-Nga
Dugrain Charlotte
Dunbar Ewan
Dupoux Emmanuel
Karadayi Julien
Miskic Lucie
Ondel Lucas
Sakti Sakriani
Publication venue: HAL CCSD
Publication date: 15/09/2019

We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose of synthesizing novel utterances from novel speakers, similar to the target speaker's voice. We describe the metrics used for evaluation, a baseline system consisting of unsupervised subword unit discovery plus a standard TTS system, and a topline TTS using gold phoneme transcriptions. We present an overview of the 19 submitted systems from 10 teams and discuss the main results.

Acta Cybernetica : Volume 19. Number 4.

Author
Publication venue
Publication date: 01/01/2010

University of Szeged

Towards Universal Speech Recognition

Author: Schultz Tanja
Topkara Umut
Waibel Alex
Wang Zhirong
Publication venue
Publication date: 12/06/2008

KITopen