
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
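
    As a minimal illustration of the ASR-plus-IR combination described above (a sketch, not drawn from the survey itself), the snippet below treats hypothetical one-best ASR transcripts as text documents and ranks them against a query with TF-IDF and cosine similarity; all clip names and transcript contents are placeholders.

        import math
        from collections import Counter

        # Hypothetical ASR output: clip id -> one-best transcript text.
        transcripts = {
            "clip1": "the minister discussed the new budget proposal",
            "clip2": "highlights from the championship football match",
            "clip3": "interview about the budget and public spending",
        }

        tokenized = {d: t.split() for d, t in transcripts.items()}
        n_docs = len(tokenized)
        # Document frequency and smoothed inverse document frequency.
        df = Counter(term for toks in tokenized.values() for term in set(toks))
        idf = {term: math.log(1 + n_docs / c) for term, c in df.items()}

        def vector(tokens):
            """TF-IDF vector for a token list; unseen terms get zero weight."""
            tf = Counter(tokens)
            return {t: (c / len(tokens)) * idf.get(t, 0.0) for t, c in tf.items()}

        def cosine(u, v):
            dot = sum(w * v.get(t, 0.0) for t, w in u.items())
            norm = lambda x: math.sqrt(sum(w * w for w in x.values())) or 1.0
            return dot / (norm(u) * norm(v))

        doc_vecs = {d: vector(toks) for d, toks in tokenized.items()}
        q = vector("budget spending".split())
        # Rank clips by similarity of their transcript to the text query.
        for d in sorted(doc_vecs, key=lambda d: -cosine(q, doc_vecs[d])):
            print(d, round(cosine(q, doc_vecs[d]), 3))

    In practice SCR systems must additionally cope with recognition errors and missing vocabulary, which is where the survey's component technologies come in; the sketch only shows the text-IR backbone.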

    Beyond English text: Multilingual and multimedia information retrieval


    Language-based multimedia information retrieval

    This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material through the use of human language technologies. Thus, in contrast to image- or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE built on subtitles or captions as the primary language key for disclosing video fragments, OLIVE makes use of speech recognition to automatically derive transcriptions of the soundtracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.
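
    A minimal sketch of the time-coded retrieval idea: assuming hypothetical per-word timecodes from a recognizer, an inverted index can map query terms directly to playable video fragments. Video ids and word timings below are illustrative, not taken from OLIVE.

        # Hypothetical time-coded ASR output: (word, start_seconds) per video.
        recognized = {
            "news_0312": [("election", 12.4), ("results", 13.0), ("announced", 13.6)],
            "news_0417": [("storm", 4.2), ("damage", 5.1), ("election", 88.9)],
        }

        # Inverted index from word to (video, timecode), so a text query
        # resolves directly to jump-in points in the video material.
        index = {}
        for video, words in recognized.items():
            for word, start in words:
                index.setdefault(word, []).append((video, start))

        for video, start in index.get("election", []):
            print(f"{video}: play fragment from {start:.1f}s")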

    Exploration of audiovisual heritage using audio indexing technology

    This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, as well as the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the requirements for successful tuning and improvement of the available tools for indexing the heterogeneous A/V collections of the cultural heritage domain are reviewed. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.
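
    For background: a standard way to adapt a language model to a historical domain is linear interpolation of an in-domain model with a general background model (the paper's specific adaptation method is not detailed in this abstract; the model names and $\lambda$ below are illustrative):

        P_{\text{adapted}}(w \mid h) = \lambda\, P_{\text{historical}}(w \mid h) + (1 - \lambda)\, P_{\text{background}}(w \mid h), \qquad 0 \le \lambda \le 1

    where $h$ is the word history and $\lambda$ is typically tuned on held-out in-domain text, such as transcripts or documents contemporary with the collection.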

    Automated speech and audio analysis for semantic access to multimedia

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques are presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting, and speaker classification. The applicability of these techniques is discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain are illustrated by the description of two complementary demonstrators for browsing broadcast news archives.

    Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech

    Searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what can be expected from searching such transcriptions and different sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manual and automatically derived metadata fields. Using this collection, we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to set its parameters appropriately using a suitable training set.
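
    For reference, one common formulation of BM25F first combines per-field term frequencies using field weights and per-field length normalization, and only then applies BM25's term-frequency saturation (this is the standard textbook form; the exact parameterization used in these experiments is not given in the abstract):

        \tilde{tf}(t, d) = \sum_{f} w_f \cdot \frac{tf(t, d, f)}{1 - b_f + b_f \cdot l_{d,f} / \bar{l}_f},
        \qquad
        \mathrm{score}(d, q) = \sum_{t \in q} \frac{\tilde{tf}(t, d)}{k_1 + \tilde{tf}(t, d)} \cdot \mathrm{idf}(t)

    Here $tf(t, d, f)$ is the frequency of term $t$ in field $f$ of document $d$ (e.g. a transcript field or a metadata field), $l_{d,f}$ and $\bar{l}_f$ are the field length and its average, $w_f$ is the field weight, and $b_f$ and $k_1$ are the parameters that, as the abstract notes, must be tuned on a suitable training set.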

    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.

    InfoLink: analysis of Dutch broadcast news and cross-media browsing

    In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition, and available metadata are used to index and link items. The concept is visualized using SMIL scripts for presenting the streaming broadcast news video together with the information links.