834 research outputs found

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Searching Spontaneous Conversational Speech

    Get PDF
    The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam.\ud The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of this workshop and summarizes the discussions

    Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech

    Get PDF
    The searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what can be expected from search of such transcriptions and different sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manual and automatically derived metadata fields. Using this collection we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple field merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to set its parameters suitably using a suitable training set

    Overview of the CLEF-2005 cross-language speech retrieval track

    Get PDF
    The task for the CLEF-2005 cross-language speech retrieval track was to identify topically coherent segments of English interviews in a known-boundary condition. Seven teams participated, performing both monolingual and cross-language searches of ASR transcripts, automatically generated metadata, and manually generated metadata. Results indicate that monolingual search technology is sufficiently accurate to be useful for some purposes (the best mean average precision was 0.18) and cross-language searching yielded results typical of those seen in other applications (with the best systems approximating monolingual mean average precision)

    Exploration of audiovisual heritage using audio indexing technology

    Get PDF
    This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings and the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the requirements for successful tuning and improvement of available tools for indexing the heterogeneous A/V collections from the cultural heritage domain are reviewed. And finally the paper argues that research is needed to cope with the varying information needs for different types of users

    NLP and the Humanities: The Revival of an Old Liaison

    Get PDF
    This paper presents an overview of some\ud emerging trends in the application of NLP\ud in the domain of the so-called Digital Humanities\ud and discusses the role and nature\ud of metadata, the annotation layer that is so\ud characteristic of documents that play a role\ud in the scholarly practises of the humanities.\ud It is explained how metadata are the\ud key to the added value of techniques such\ud as text and link mining, and an outline is\ud given of what measures could be taken to\ud increase the chances for a bright future for\ud the old ties between NLP and the humanities.\ud There is no data like metadata

    Interact: A Mixed Reality Virtual Survivor for Holocaust Testimonies

    Get PDF
    In this paper we present Interact---a mixed reality virtual survivor for Holocaust education. It was created to preserve the powerful and engaging experience of listening to, and interacting with, Holocaust survivors, allowing future generations of audience access to their unique stories. Interact demonstrates how advanced filming techniques, 3D graphics and natural language processing can be integrated and applied to specially-recorded testimonies to enable users to ask questions and receive answers from that virtualised individuals. This provides a new and rich interactive narratives of remembrance to engage with primary testimony. We discuss the design and development of Interact, and argue that this new form of mixed reality is promising media to overcome the uncanny valley

    Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

    Get PDF
    Access to historical audio collections is typically very restricted:\ud content is often only available on physical (analog) media and the\ud metadata is usually limited to keywords, giving access at the level\ud of relatively large fragments, e.g., an entire tape. Many spoken\ud word heritage collections are now being digitized, which allows the\ud introduction of more advanced search technology. This paper presents\ud an approach that supports online access and search for recordings of\ud historical speeches. A demonstrator has been built, based on the\ud so-called Radio Oranje collection, which contains radio speeches by\ud the Dutch Queen Wilhelmina that were broadcast during World War II.\ud The audio has been aligned with its original 1940s manual\ud transcriptions to create a time-stamped index that enables the speeches to be\ud searched at the word level. Results are presented together with\ud related photos from an external database
    corecore