Automatic Speech Segmentation Based on HMM
This contribution deals with automatic phoneme segmentation using hidden Markov models (HMMs). Automating speech segmentation is important for applications that must process large amounts of data, where manual segmentation is out of the question. In this paper we focus on the automatic segmentation of recordings that will be used to create a database of triphone synthesis units. For speech synthesis, the quality of the speech units is crucial, so the highest possible segmentation accuracy is needed. In this work, different kinds of HMMs with various parameters have been trained, and their usefulness for automatic segmentation is discussed. The paper concludes with segmentation accuracy tests for all models.
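As a rough illustration of how HMM-based segmentation can place phoneme boundaries, the sketch below runs a Viterbi forced alignment over a known phoneme sequence. It is not the authors' system: it assumes one state per phoneme, ignores transition probabilities, and takes precomputed per-frame log-likelihoods as input. The function names `forced_align` and `boundaries` are hypothetical.

```python
def forced_align(loglik):
    """Viterbi forced alignment for a left-to-right phoneme sequence.

    loglik[t][k] is the log-likelihood of frame t under the k-th phoneme
    of the known transcription (assumption: one state per phoneme and
    uniform transition probabilities, so only emissions are scored).
    Returns the phoneme index assigned to each frame.
    """
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per phoneme")
    NEG = float("-inf")
    score = [[NEG] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    score[0][0] = loglik[0][0]          # the path must start in the first phoneme
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else NEG
            if move > stay:
                score[t][k] = move + loglik[t][k]
                back[t][k] = k - 1
            else:
                score[t][k] = stay + loglik[t][k]
                back[t][k] = k
    path = [K - 1]                       # the path must end in the last phoneme
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path):
    """Frame indices at which a new phoneme begins."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

Multiplying each boundary index by the frame shift gives boundary times; a real system would use multi-state models and trained transition probabilities.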
Determination of Formant Features in Czech and Slovak for GMM Emotional Speech Classifier
The paper aims to determine formant features (FF) that describe vocal tract characteristics. It comprises an analysis of the first three formant positions together with their bandwidths and the formant tilts, followed by a statistical evaluation and comparison of the FF. The experiment used speech material in the form of sentences spoken by male and female speakers expressing four emotional states (joy, sadness, anger, and a neutral state) in Czech and Slovak. The statistical distribution of the analyzed formant frequencies and formant tilts shows good differentiation between neutral and emotional styles for both voices. In contrast, the values of the formant 3-dB bandwidths show no correlation with the type of speaking style or the type of voice. These spectral parameters, together with the values of other speech characteristics, were used in the feature vector for a Gaussian mixture model (GMM) emotional speech style classifier that is currently under development. The overall mean classification error rate is about 18%, and the best error rate obtained is 5% for the sadness style of the female voice. These values are acceptable at this first stage of development of the GMM classifier, which is intended for evaluating synthetic speech quality after voice conversion and emotional speech style transformation.
A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion
A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes, and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The phoneme accuracy of the best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99%.
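The leftmost longest-match idea behind the grapheme segmentation step can be sketched in a few lines. This is a plain loop, not the finite-state transducer implementation the abstract describes, and the grapheme inventory `GRAPHEMES` is a small hypothetical list chosen for illustration, not the paper's rule set.

```python
# Hypothetical multi-letter grapheme inventory, for illustration only.
GRAPHEMES = {"sch", "ch", "ng", "ij", "oe", "eu", "ui", "aa", "ee", "oo", "uu", "ie"}
MAX_LEN = max(len(g) for g in GRAPHEMES)


def segment(word):
    """Split a word into graphemes by leftmost longest match.

    At each position, try the longest candidate first and fall back to a
    single letter when no multi-letter grapheme matches.
    """
    out = []
    i = 0
    while i < len(word):
        for n in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if n == 1 or chunk in GRAPHEMES:
                out.append(chunk)
                i += n
                break
    return out
```

With this inventory, `segment("school")` yields `["sch", "oo", "l"]`: the three-letter match "sch" wins over "ch" because matching proceeds from the left and prefers the longest candidate.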
Lattice score based data cleaning for phrase-based statistical machine translation
Statistical machine translation relies heavily on parallel corpora to train its models for translation tasks. While more and more bilingual corpora are readily available, the quality of the sentence pairs should be taken into consideration. This paper presents a novel lattice score-based data cleaning method to select proper sentence pairs from the ones extracted from a bilingual corpus by the sentence alignment methods. The proposed method is carried out as follows: firstly, an initial phrase-based model is trained on the full sentence-aligned corpus; then for each of the sentence pairs in the corpus, word alignments are used to create anchor pairs and source-side lattices; thirdly, based on the translation model, target-side phrase networks are expanded on the lattices and Viterbi searching is used to find approximated decoding results; finally, BLEU score thresholds are used to filter out the low-score sentence pairs for the data cleaning purpose. Our experiments on the FBIS corpus showed improvements of BLEU score from 23.78 to 24.02 in Chinese-English
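The final filtering step described above can be sketched as follows. This covers only the thresholding, under several assumptions: `decode` is a hypothetical stand-in for the lattice-based approximate decoder, the sentence-level BLEU uses a simple add-one smoothing for n-gram orders above one (the abstract does not say which variant the authors use), and the threshold value is arbitrary.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing for orders above 1."""
    if not hyp:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = sum(h.values())
        if n == 1:
            if match == 0:
                return 0.0
            log_p += math.log(match / total)
        else:
            log_p += math.log((match + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_p / max_n)


def clean(pairs, decode, threshold):
    """Keep (source, target) pairs whose decoded source scores at least
    `threshold` BLEU against the target side."""
    kept = []
    for src, tgt in pairs:
        hyp = decode(src)  # hypothetical decoder: source string -> target string
        if sentence_bleu(hyp.split(), tgt.split()) >= threshold:
            kept.append((src, tgt))
    return kept
```

The filtered list would then be used to retrain the translation model; choosing the threshold is an empirical matter.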
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Automatsko raspoznavanje hrvatskoga govora velikoga vokabulara (Croatian Large Vocabulary Automatic Speech Recognition)
This paper presents the procedures used to develop a Croatian large vocabulary automatic speech recognition (LVASR) system. The proposed acoustic model is based on context-dependent triphone hidden Markov models and Croatian phonetic rules. Different acoustic and language models, developed using a large collection of Croatian speech, are discussed and compared. The paper proposes the feature vectors and acoustic modeling procedures that achieve the lowest word error rates for Croatian speech. In addition, Croatian language modeling procedures are evaluated and adopted for speaker-independent spontaneous speech recognition. The experiments and results presented show that the proposed approach, which uses context-dependent acoustic modeling based on Croatian phonetic rules and a parameter tying procedure, can be used for efficient Croatian large vocabulary speech recognition with word error rates below 5%.
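For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below follows that standard definition; it is not taken from the paper, and the function name `word_error_rate` is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

A rate below 5% therefore means fewer than one substitution, insertion, or deletion per twenty reference words on average.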