Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages

Author: Ebru Arı
Haş
Janne Pylkkö
Mikko Kurimo
Murat Saraç
Tanel Alumä
Teemu Hirsimä
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

Multimedia information technology and the annotation of video

Author: Jong F.M.G. de
Smeulders A.
Worring M.
Publication venue: Stichting Archiefpublicaties
Publication date: 01/01/2006

The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, an overload of data will cause a lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

University of Twente Research Information

Challenges in speech processing of Slavic languages (case studies in speech recognition of Czech and

Author: Jan Nouza
Jan Silovsky
Jindrich Zdansky
Petr Cerva
Publication date: 01/01/2010

Slavic languages pose a big challenge for researchers dealing with speech technology. They exhibit a large degree of inflection, namely declension of nouns, pronouns and adjectives, and conjugation of verbs. This has a large impact on the size of lexical inventories in these languages, and significantly complicates the design of text-to-speech and, in particular, speech-to-text systems. In the paper, we demonstrate some of the typical features of the Slavic languages and show how they can be handled in the development of practical speech processing systems. We present the solutions we applied in the design of voice dictation and broadcast speech transcription systems developed for Czech. Furthermore, we demonstrate how these systems can be converted to another similar Slavic language, in our case Slovak. All the presented systems operate in real time with very large vocabularies (350K words in Czech, 170K words in Slovak) and some of them have already been deployed in practice.

CiteSeerX

Konuşma Tanıma için İnsan-makine Karşılaştırması (Human-Machine Comparison for Speech Recognition)

Author: Ayşe Gürel
Levent M. Arslan
Publication venue: BÜTEK Boğaziçi Eğitim Turizm Teknopark Uygulama ve Dan. Hiz. San. Tic. A.Ş.
Publication date: 01/07/2008

Speech/voice recognition by machines has been a topic of interest since the 1950s. Research that initially adopted dynamic programming methodologies now mostly uses the hidden Markov model as the method for speech recognition. Nevertheless, even the most advanced speech recognition system makes, depending on the context, 2-20 times more errors than humans. Although the basic principles behind human speech recognition are not completely understood, there are some theories that attempt to explain the biological mechanisms of speech recognition. This paper aims to provide a review of these theories as well as a brief history of developments in automatic speech recognition technology. Furthermore, the paper discusses some recent studies on Turkish speech recognition. The paper concludes with a comparison between human and machine speech recognition performance.

Directory of Open Access Journals

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

The George-Anne

Author: Georgia Southern University
Publication venue: Digital Commons@Georgia Southern
Publication date: 28/03/2003

Georgia Southern University: Digital Commons@Georgia Southern

Speech Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

The George-Anne

Author: Georgia Southern University
Publication venue: Digital Commons@Georgia Southern
Publication date: 04/04/2005

Georgia Southern University: Digital Commons@Georgia Southern