425 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Open-vocabulary spoken utterance retrieval using confusion networks

    Get PDF
    This paper presents a novel approach to open-vocabulary spoken utterance retrieval using confusion networks. If out-of-vocabulary (OOV) words are present in queries and the corpus, word-based indexing will not be sufficient. For this problem, we apply phone confusion networks and combine them with word confusion networks. With this approach, we can generate a more compact index table that enables robust keyword matching compared with typical lattice-based methods. In the retrieval experiments with speech recordings in MIT lecture corpus, our method using phone confusion networks outperformed lattice-based methods especially for OOV queries

    Subword-based approaches for spoken document retrieval

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (p. 181-187).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for [\em a priori], or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language; and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four research issues are addressed. First, what are suitable subword units and how well can they perform? Second, how can these units be reliably extracted from the speech signal? Third, what is the behavior of the subword units when there are speech recognition errors and how well do they perform? And fourth, how can the indexing and retrieval methods be modified to take into account the fact that the speech recognition output will be errorful?(cont.) We first explore a range of subword units ofvarying complexity derived from error-free phonetic transcriptions and measure their ability to effectively index and retrieve speech messages. We find that many subword units capture enough information to perform effective retrieval and that it is possible to achieve performance comparable to that of text-based word units. Next, we develop a phonetic speech recognizer and process the spoken document collection to generate phonetic transcriptions. We then measure the ability of subword units derived from these transcriptions to perform spoken document retrieval and examine the effects of recognition errors on retrieval performance. Retrieval performance degrades for all subword units (to 60% of the clean reference), but remains reasonable for some subword units even without the use of any error compensation techniques. We then investigate a number of robust methods that take into account the characteristics of the recognition errors and try to compensate for them in an effort to improve spoken document retrieval performance when there are speech recognition errors. We study the methods individually and explore the effects of combining them. Using these robust methods improves retrieval performance by 23%. We also propose a novel approach to SDR where the speech recognition and information retrieval components are more tightly integrated.(cont.) This is accomplished by developing new recognizer and retrieval models where the interface between the two components is better matched and the goals of the two components are consistent with each other and with the overall goal of the combined system. Using this new integrated approach improves retrieval performance by 28%. ...by Kenney Ng.Ph.D

    Holistic Vocabulary Independent Spoken Term Detection

    Get PDF
    Within this thesis, we aim at designing a loosely coupled holistic system for Spoken Term Detection (STD) on heterogeneous German broadcast data in selected application scenarios. Starting from STD on the 1-best output of a word-based speech recognizer, we study the performance of several subword units for vocabulary independent STD on a linguistically and acoustically challenging German corpus. We explore the typical error sources in subword STD, and find that they differ from the error sources in word-based speech search. We select, extend and combine a set of state-of-the-art methods for error compensation in STD in order to explicitly merge the corresponding STD error spaces through anchor-based approximate lattice retrieval. Novel methods for STD result verification are proposed in order to increase retrieval precision by exploiting external knowledge at search time. Error-compensating methods for STD typically suffer from high response times on large scale databases, and we propose scalable approaches suitable for large corpora. Highest STD accuracy is obtained by combining anchor-based approximate retrieval from both syllable lattice ASR and syllabified word ASR into a hybrid STD system, and pruning the result list using external knowledge with hybrid contextual and anti-query verification.Die vorliegende Arbeit beschreibt ein lose gekoppeltes, ganzheitliches System zur Sprachsuche auf heterogenenen deutschen Sprachdaten in unterschiedlichen Anwendungsszenarien. Ausgehend von einer wortbasierten Sprachsuche auf dem Transkript eines aktuellen Wort-Erkenners werden zunächst unterschiedliche Subwort-Einheiten für die vokabularunabhängige Sprachsuche auf deutschen Daten untersucht. Auf dieser Basis werden die typischen Fehlerquellen in der Subwort-basierten Sprachsuche analysiert. Diese Fehlerquellen unterscheiden sich vom Fall der klassichen Suche im Worttranskript und müssen explizit adressiert werden. Die explizite Kompensation der unterschiedlichen Fehlerquellen erfolgt durch einen neuartigen hybriden Ansatz zur effizienten Ankerbasierten unscharfen Wortgraph-Suche. Darüber hinaus werden neuartige Methoden zur Verifikation von Suchergebnissen vorgestellt, die zur Suchzeit verfügbares externes Wissen einbeziehen. Alle vorgestellten Verfahren werden auf einem umfangreichen Satz von deutschen Fernsehdaten mit Fokus auf ausgewählte, repräsentative Einsatzszenarien evaluiert. Da Methoden zur Fehlerkompensation in der Sprachsuchforschung typischerweise zu hohen Laufzeiten bei der Suche in großen Archiven führen, werden insbesondere auch Szenarien mit sehr großen Datenmengen betrachtet. Die höchste Suchleistung für Archive mittlerer Größe wird durch eine unscharfe und Anker-basierte Suche auf einem hybriden Index aus Silben-Wortgraphen und silbifizierter Wort-Erkennung erreicht, bei der die Suchergebnisse mit hybrider Verifikation bereinigt werden

    Cross-language Information Retrieval

    Full text link
    Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find. When the documents to be searched are in a language not known by the searcher, neither assumption is true. In such cases, Cross-Language Information Retrieval (CLIR) is needed. This chapter reviews the state of the art for CLIR and outlines some open research questions.Comment: 49 pages, 0 figure
    • …
    corecore