26,023 research outputs found

    Entity Query Feature Expansion Using Knowledge Base Links

    Get PDF
    Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections. For example, annotations of entities from large general purpose knowledge bases, such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE) which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches

    Using the Annotated Bibliography as a Resource for Indicative Summarization

    Get PDF
    We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated bibliographies cover certain aspects of summarization that have not been well-covered by other summary corpora, and motivate why they constitute an important form to study for information retrieval. We detail our methodology for collecting the corpus, and overview our document feature markup that we introduced to facilitate summary analysis. We present the characteristics of the corpus, methods of collection, and show its use in finding the distribution of types of information included in indicative summaries and their relative ordering within the summaries.Comment: 8 pages, 3 figure

    Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches

    No full text
    Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the shear scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches

    Hierarchical Event Descriptors (HED): Semi-Structured Tagging for Real-World Events in Large-Scale EEG.

    Get PDF
    Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user-interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental, and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies

    Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech

    Get PDF
    The searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what can be expected from search of such transcriptions and different sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manual and automatically derived metadata fields. Using this collection we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple field merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to set its parameters suitably using a suitable training set

    Unravelling the voice of Willem Frederik Hermans: an oral history indexing case study

    Get PDF

    Robust audio indexing for Dutch spoken-word collections

    Get PDF
    Abstract—Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created is lagging behind. This is especially the case in the oral history domain and much of the rich content in these collections runs the risk to remain inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral-history case-study. It is concluded that despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections

    Evaluation of a prototype interface for structured document retrieval

    Get PDF
    Document collections often display either internal structure, in the form of the logical arrangement of document components, or external structure, in the form of links between documents. Structured document retrieval systems aim to exploit this structural information to provide users with more effective access to structured documents. To do this, the associated interface must both represent this information explicitly and support users in their browsing behaviour. This paper describes the implementation and user-centred evaluation of a prototype interface, the RelevanceLinkBar interface. The results of the evaluation show that the RelevanceLinkBar interface supported users in their browsing behaviour, allowing them to find more relevant documents, and was strongly preferred over a standard results interface

    Visual exploration and retrieval of XML document collections with the generic system X2

    Get PDF
    This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed
    • …
    corecore