937 research outputs found

    Multimedia information technology and the annotation of video

    Get PDF
    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning

    Challenges in speech processing of Slavic languages (case studies in speech recognition of Czech and

    Get PDF
    Abstract. Slavic languages pose a big challenge for researchers dealing with speech technology. They exhibit a large degree of inflection, namely declension of nouns, pronouns and adjectives, and conjugation of verbs. This has a large impact on the size of lexical inventories in these languages, and significantly complicates the design of text-to-speech and, in particular, speech-to-text systems. In the paper, we demonstrate some of the typical features of the Slavic languages and show how they can be handled in the development of practical speech processing systems. We present our solutions we applied in the design of voice dictation and broadcast speech transcription systems developed for Czech. Furthermore, we demonstrate how these systems can be converted to another similar Slavic language, in our case Slovak. All the presented systems operate in real time with very large vocabularies (350K words in Czech, 170K words in Slovak) and some of them have been already deployed in practice

    Konuşma Tanıma için İnsan-makine Karşılaştırması

    Get PDF
    Speech/voice recognition by machines has been a topic of interest since 1950s. Research that initially adopted dynamic programming methodologies now mostly uses the hidden Markov model as the method for speech recognition. Nevertheless, even the most advanced speech recognition system makes, depending on the context, 2-20 times more errors than humans. Although the basic principles behind human speech recognition have not been completely understood, there are some theories that attempt to explain biological mechanisms for speech recognition. This paper aims to provide a review of these theories as well as a brief history of developments in automatic speech recognition technology. Furthermore, the paper discusses some recent studies on Turkish speech recognition. The paper concludes with a comparison between human and machine speech recognition performance

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    The George-Anne

    Get PDF

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    The George-Anne

    Get PDF
    corecore