834 research outputs found

Access to recorded interviews: A research agenda

Author: Heeren W.F.L.
Jong F.M.G. de
Oard D.W.
Ordelman R.J.F.
Publication venue: ACM
Publication date: 01/01/2008

Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

University of Twente Research Information

Searching Spontaneous Conversational Speech

Author: Jong Franciska de
Oard Douglas W.
Ordelman Roeland
Raaijmakers Stephan
Publication venue: ACM SIGIR
Publication date: 01/01/2007

The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam. The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of this workshop and summarizes the discussions.

University of Twente Research Information

Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech

Author: Jones Gareth J.F.
Lam-Adesina Adenike M.
Newman Eamonn
Zhang Ke
Publication venue: Centre for Telematics and Information Technology, Enschede, The Netherlands
Publication date: 01/07/2007

Searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what retrieval effectiveness can be expected from searching such transcriptions and the different sources of related metadata. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manually and automatically derived metadata fields. Using this collection, we investigate the comparative search effectiveness of the individual fields, comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple merging of the individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can improve search accuracy, but that its parameters currently need to be tuned on a suitable training set.
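
The weighted field combination mentioned in this abstract can be illustrated with a minimal Python sketch of the simple BM25F formulation: per-field term frequencies are length-normalized, multiplied by a field weight, summed into one pseudo-frequency, and only then saturated. The field names, weights, `k1`, `b` and the IDF variant below are assumptions chosen for illustration; they are not the settings used in the paper, and a single `b` is shared by all fields to keep the sketch short.

```python
import math


def bm25f_score(query_terms, doc, field_weights, avg_field_len,
                doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document with a simple BM25F formulation.

    doc maps a field name to its list of tokens. All parameter values
    are illustrative assumptions, not values taken from the paper.
    """
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for field, weight in field_weights.items():
            tokens = doc.get(field, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Length normalization is applied per field, before weighting.
            norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += weight * tf / norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        # Non-negative IDF variant (an assumption; other variants exist).
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturation is applied once, to the combined pseudo-frequency.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Hypothetical two-field documents: an ASR transcript and manual keywords.
weights = {"asr": 1.0, "keywords": 3.0}
avg_len = {"asr": 3.0, "keywords": 1.0}
doc_a = {"asr": ["camp", "train", "food"], "keywords": ["liberation"]}
doc_b = {"asr": ["liberation", "train", "food"], "keywords": ["camp"]}
stats = {"liberation": 2}

score_a = bm25f_score(["liberation"], doc_a, weights, avg_len, stats, 10)
score_b = bm25f_score(["liberation"], doc_b, weights, avg_len, stats, 10)
print(score_a > score_b)  # the keyword-field match is weighted more heavily
```

By contrast, simple field merging would concatenate all fields into one token list before scoring, so a match in any field would count equally. The field weights are the kind of parameter the abstract says must be tuned on training data.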

Irish Universities

DCU Online Research Access Service

Overview of the CLEF-2005 cross-language speech retrieval track

Author: Huang Xiaoli
Jones Gareth J.F.
Oard Douglas W.
Soergel Dagobert
White Ryen W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

The task for the CLEF-2005 cross-language speech retrieval track was to identify topically coherent segments of English interviews in a known-boundary condition. Seven teams participated, performing both monolingual and cross-language searches of ASR transcripts, automatically generated metadata, and manually generated metadata. Results indicate that monolingual search technology is sufficiently accurate to be useful for some purposes (the best mean average precision was 0.18) and that cross-language searching yielded results typical of those seen in other applications (with the best systems approximating monolingual mean average precision).
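
For readers unfamiliar with the measure quoted in this abstract, the following Python sketch shows the standard computation of mean average precision over a set of topics. The segment identifiers and relevance judgments are invented for illustration, and the sketch assumes each ranked list contains no duplicate identifiers; it does not reproduce the track's official evaluation tooling.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic.

    Precision is taken at the rank of each relevant item retrieved and
    divided by the total number of relevant items, so relevant items
    that are never retrieved lower the score.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item_id in enumerate(ranked_ids, start=1):
        if item_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two topics (segment IDs are made up).
runs = [
    (["s1", "s2", "s3", "s4"], {"s1", "s3"}),
    (["s5", "s6"], {"s6", "s9"}),
]
print(round(mean_average_precision(runs), 4))
```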

DCU Online Research Access Service

Exploration of audiovisual heritage using audio indexing technology

Author: Heeren Willemijn
Jong Franciska de
Ordelman Roeland
Publication date: 01/01/2006

This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, as well as the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the paper reviews the requirements for successfully tuning and improving the available tools for indexing heterogeneous A/V collections from the cultural heritage domain. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.

University of Twente Research Information

NLP and the Humanities: The Revival of an Old Liaison

Author: Jong F.M.G. de
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2009

This paper presents an overview of some emerging trends in the application of NLP in the domain of the so-called Digital Humanities and discusses the role and nature of metadata, the annotation layer that is so characteristic of documents that play a role in the scholarly practices of the humanities. It is explained how metadata are the key to the added value of techniques such as text and link mining, and an outline is given of what measures could be taken to increase the chances for a bright future for the old ties between NLP and the humanities. There is no data like metadata.

University of Twente Research Information

Interact: A Mixed Reality Virtual Survivor for Holocaust Testimonies

Author: Coward Sarah
Ma Minhua
Walker Chris
Publication venue: 'American College of Medical Physics (ACMP)'

In this paper we present Interact, a mixed reality virtual survivor for Holocaust education. It was created to preserve the powerful and engaging experience of listening to, and interacting with, Holocaust survivors, giving future generations access to their unique stories. Interact demonstrates how advanced filming techniques, 3D graphics and natural language processing can be integrated and applied to specially recorded testimonies to enable users to ask questions and receive answers from the virtualised individuals. This provides new and rich interactive narratives of remembrance for engaging with primary testimony. We discuss the design and development of Interact, and argue that this new form of mixed reality is a promising medium for overcoming the uncanny valley.

University of Huddersfield Repository

Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

Author: Heeren Willemijn
Jong Franciska de
Ordelman Roeland
Werff Laurens van der
Publication venue: Landelijke Onderzoekschool Taalwetenschap
Publication date: 01/01/2007

Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database.
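
The idea of a time-stamped, word-level index can be pictured with a small Python sketch. It assumes that the alignment step has already produced (recording, word, start time) triples; the data layout, function names and sample values are hypothetical and do not describe the demonstrator's actual implementation.

```python
from collections import defaultdict


def build_word_index(aligned_words):
    """Build an inverted index from aligned transcript words.

    aligned_words is an iterable of (recording_id, word, start_seconds)
    triples, assumed to come from a prior audio/transcript alignment.
    """
    index = defaultdict(list)
    for recording_id, word, start_seconds in aligned_words:
        index[word.lower()].append((recording_id, start_seconds))
    return index


def find_word(index, word):
    """Return sorted (recording_id, start_seconds) hits for a word."""
    return sorted(index.get(word.lower(), []))


# Hypothetical alignment output for two recordings.
aligned = [
    ("speech_01", "Landgenoten", 2.4),
    ("speech_01", "vrijheid", 15.1),
    ("speech_02", "Vrijheid", 7.8),
]
index = build_word_index(aligned)
print(find_word(index, "vrijheid"))
```

Each hit carries a start time, so a player could jump straight to the spoken word instead of returning an entire tape.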

University of Twente Research Information

Utrecht University Repository