126 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF

    Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''

    Get PDF

    Overview of the NTCIR-12 SpokenQuery&Doc-2 task

    Get PDF
    This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc-2) task at the NTCIR-12 Workshop. This task included spoken query driven spoken content retrieval (SQ-SCR) and a spoken query driven spoken term detection (SQ-STD) as the two subtasks. The paper describes details of each sub-task, the data used, the creation of the speech recognition systems used to create the transcripts, the design of the retrieval test collections, the metrics used to evaluate the sub-tasks and a summary of the results of submissions by the task participants

    Unsupervised pattern discovery in speech : applications to word acquisition and speaker segmentation

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2007.Includes bibliographical references (p. 167-176).We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a pre-specified inventory of lexical units (i.e. phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream.(cont.) We demonstrate two applications of our pattern discovery procedure. First, we propose and evaluate two methods for automatically identifying sound clusters generated through pattern discovery. Our results show that high identification accuracy can be achieved for single word clusters using a constrained isolated word recognizer. Second, we apply acoustic pattern matching to the problem of speaker segmentation by attempting to find word-level speech patterns that are repeated by the same speaker. When used to segment a ten hour corpus of multi-speaker lectures, we found that our approach is able to generate segmentations that correlate well to independently generated human segmentations.by Alex Seungryong Park.Ph.D

    Audio Information Retrieval using Sinusoidal Modeling based Features

    Get PDF
    In this paper, we propose an approach for searching a speech query word in an audio data base. In this approach, we explore Sinusoidal modeling based features such as Amplitude, Frequency and Phase of the speech signal. Three independent systems are built using these features. Further, the Majority voting logic is used to arrive at a conclusion to locate (time stamp) the query word in the reference utterances. The studies are performed on the TIMIT database. The results show that, Sinusoidal based features can be used for speech processing in place of conventional approaches
    • …
    corecore