Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
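The combination the survey describes — speech processing feeding an IR back end — can be sketched minimally as an inverted index built over ASR transcript hypotheses. This is an illustrative assumption, not a method from the survey; all names (`build_index`, `search`, `transcripts`) are hypothetical.

```python
from collections import defaultdict

def build_index(transcripts):
    """transcripts: {doc_id: 'asr output text'} -> {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return doc_ids whose transcript contains every query term."""
    term_sets = [index.get(t.lower(), set()) for t in query.split()]
    return set.intersection(*term_sets) if term_sets else set()
```

In a real SCR system the transcripts would carry recognition errors, which is why the field also indexes lattices or subword units rather than single 1-best strings.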
Automatsko raspoznavanje hrvatskoga govora velikoga vokabulara (Large-Vocabulary Automatic Speech Recognition of Croatian)
This paper presents procedures used for the development of a Croatian large vocabulary automatic speech recognition system (LVASR). The proposed acoustic model is based on context-dependent triphone hidden Markov models and Croatian phonetic rules. Different acoustic and language models, developed using a large collection of Croatian speech, are discussed and compared. The paper proposes the feature vectors and acoustic modeling procedures with which the lowest word error rates for Croatian speech are achieved. In addition, Croatian language modeling procedures are evaluated and adapted for speaker-independent spontaneous speech recognition. The presented experiments and results show that the proposed approach to automatic speech recognition, using context-dependent acoustic modeling based on Croatian phonetic rules and a parameter tying procedure, can be used for efficient Croatian large vocabulary speech recognition with word error rates below 5%.
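The parameter tying the abstract mentions can be pictured as mapping context-dependent triphones that behave similarly onto shared models, falling back to the context-independent monophone when no rule applies. The sketch below is a hypothetical illustration of the idea, not the paper's implementation; the `l-c+r` triphone notation and the `tying_rules` table are assumptions.

```python
def tie_triphones(triphones, tying_rules):
    """Map each triphone 'l-c+r' to a tied model name.

    tying_rules: {(left, center, right): tied_model_name}; the entry
    (None, center, None) acts as a per-phone default.  Without any rule,
    the triphone backs off to its context-independent center phone.
    """
    tied = {}
    for tri in triphones:
        left, rest = tri.split("-")
        center, right = rest.split("+")
        tied[tri] = tying_rules.get(
            (left, center, right),
            tying_rules.get((None, center, None), center),
        )
    return tied
```

Real systems derive such rules with phonetic decision trees over questions like "is the left context a nasal?", which is what keeps the number of distinct HMM state sets tractable for a large vocabulary.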
Speech Enhancement Guided by Contextual Articulatory Information
Previous studies have confirmed the effectiveness of leveraging articulatory information to attain improved speech enhancement (SE) performance. By augmenting the original acoustic features with place/manner-of-articulation features, the SE process can be guided to consider the articulatory properties of the input speech when performing enhancement. Hence, we believe that the contextual information of articulatory attributes contains useful cues and can further benefit SE in different languages. In this study, we propose an SE system that improves its performance by optimizing the contextual articulatory information in the enhanced speech for both English and Mandarin. We optimize this contextual articulatory information by jointly training the SE model with an end-to-end automatic speech recognition (E2E ASR) model that predicts sequences of broad phone classes (BPC) instead of word sequences. Two training strategies are developed to train the SE system with the BPC-based ASR: a multitask-learning strategy and a deep-feature training strategy. Experimental results on the TIMIT and TMHINT datasets confirm that the contextual articulatory information enables an SE system to achieve better results than a traditional acoustic model (AM). Moreover, in contrast to an SE system trained with a monophone-based ASR, the BPC-based ASR (providing contextual articulatory information) improves SE performance more effectively under different signal-to-noise ratios (SNRs).

Comment: Will be submitted to TASL
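The multitask-learning strategy described above can be sketched as a weighted sum of an SE reconstruction term and a BPC-prediction term. This is a minimal sketch under stated assumptions — the loss forms, the weight `alpha`, and all function names are illustrative, not taken from the paper.

```python
def multitask_loss(enhanced, clean, bpc_logprobs, bpc_targets, alpha=0.3):
    """Combined loss: SE reconstruction + alpha * BPC classification.

    enhanced, clean : flat lists of spectral values (same length)
    bpc_logprobs    : per-frame log-probability vectors over BPC classes
    bpc_targets     : per-frame target BPC indices
    """
    # SE term: mean squared error between enhanced and clean features
    se_loss = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    # ASR term: negative log-likelihood of the broad-phone-class targets
    bpc_loss = -sum(lp[t] for lp, t in zip(bpc_logprobs, bpc_targets)) / len(bpc_targets)
    return se_loss + alpha * bpc_loss
```

The deep-feature alternative the abstract mentions would instead match intermediate ASR representations of enhanced and clean speech rather than adding a classification loss.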
Speech Communication
Contains reports on eight research projects.

C.J. LeBel Fellowship
Systems Development Foundation
National Institutes of Health (Grant 5 T32 NS07040)
National Institutes of Health (Grant 5 R01 NS04332)
National Science Foundation (Grant 1ST 80-17599)
U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)