Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections. Examples include annotations of entities from large
general-purpose knowledge bases, such as Freebase and the Google
Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE), which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches.
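The idea of enriching a query with features drawn from linked entities can be illustrated with a minimal sketch. This is not the EQFE model from the paper: the toy knowledge base, the feature kinds ("word", "entity", "name", "type"), the function name `expand_query`, and the weighting scheme are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory knowledge base; a real system would look up
# entries in a resource such as Freebase.
TOY_KB = {
    "/m/obama": {
        "names": ["barack obama", "obama"],
        "types": ["politician"],
        "text": "44th president of the united states",
    },
}


def expand_query(query_terms, entity_scores, kb,
                 original_weight=1.0, expansion_weight=0.5):
    """Return a dict mapping (feature_kind, value) to a weight.

    entity_scores maps entity ids to confidence scores; they may come
    from explicit query annotations or from a latent entity ranking.
    """
    features = defaultdict(float)

    # Keep the original query words at full weight.
    for term in query_terms:
        features[("word", term)] += original_weight

    total = sum(entity_scores.values())
    if total <= 0:
        return dict(features)

    for entity_id, score in entity_scores.items():
        entry = kb.get(entity_id)
        if entry is None:
            continue  # entity is not in the knowledge base
        # Each entity receives a share of the expansion weight
        # proportional to its confidence score.
        share = expansion_weight * score / total
        features[("entity", entity_id)] += share
        for name in entry["names"]:
            features[("name", name)] += share
        for type_label in entry["types"]:
            features[("type", type_label)] += share
        for word in entry["text"].split():
            features[("word", word)] += share

    return dict(features)


expanded = expand_query(["obama", "family"], {"/m/obama": 2.0}, TOY_KB)
```

The resulting weighted features could then be passed to a retrieval model that scores documents on both their text and their entity annotations.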
Using the Annotated Bibliography as a Resource for Indicative Summarization
We report on a language resource consisting of 2000 annotated bibliography
entries, which is being analyzed as part of our research on indicative document
summarization. We show how annotated bibliographies cover certain aspects of
summarization that have not been well-covered by other summary corpora, and
motivate why they constitute an important form to study for information
retrieval. We detail our methodology for collecting the corpus, and give an
overview of the document feature markup that we introduced to facilitate summary analysis.
We present the characteristics of the corpus, methods of collection, and show
its use in finding the distribution of types of information included in
indicative summaries and their relative ordering within the summaries. Comment: 8 pages, 3 figures
Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches
Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the sheer scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches.
Hierarchical Event Descriptors (HED): Semi-Structured Tagging for Real-World Events in Large-Scale EEG.
Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real-world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies.
Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech
Searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what can be expected from search of such transcriptions and different sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manual and automatically derived metadata fields. Using this collection we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to tune its parameters on a suitable training set.
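The difference between simple field merging and weighted field combination can be sketched in a few lines. The code below follows the commonly described "simple BM25F" formulation, in which length-normalised term frequencies are weighted per field and summed before a single saturation step. It is not the configuration used in the paper: the field names, weights, and parameter values are assumptions, and one `b` is shared by all fields for brevity.

```python
import math


def bm25f_score(query_terms, doc_fields, field_weights, avg_field_len,
                doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document, given as a dict of field name -> token list."""
    score = 0.0
    for term in query_terms:
        # Combine per-field term frequencies into one weighted pseudo-frequency.
        pseudo_tf = 0.0
        for field, tokens in doc_fields.items():
            if not tokens:
                continue
            tf = tokens.count(term)
            # Per-field length normalisation.
            norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += field_weights.get(field, 0.0) * tf / norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturation is applied once, after the fields are combined.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Hypothetical document with an ASR transcript field and a keyword field.
doc = {
    "asr": ["we", "left", "the", "village"],
    "keywords": ["deportation", "village"],
}
avg_len = {"asr": 4.0, "keywords": 2.0}
doc_freq = {"village": 2, "deportation": 1}

merged_like = bm25f_score(["deportation"], doc,
                          {"asr": 1.0, "keywords": 1.0}, avg_len, doc_freq, 10)
weighted = bm25f_score(["deportation"], doc,
                       {"asr": 1.0, "keywords": 3.0}, avg_len, doc_freq, 10)
```

With equal field weights the score behaves much like scoring a merged document. Raising the weight of a trusted metadata field increases the contribution of matches in that field, which is why the weights and `k1`/`b` need to be tuned on training data.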
Robust audio indexing for Dutch spoken-word collections
Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the ability to index and access the archives created is lagging behind. This is especially the case in the oral history domain, and much of the rich content in these collections runs the risk of remaining inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral-history case study. It is concluded that despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections.
Evaluation of a prototype interface for structured document retrieval
Document collections often display either internal structure, in the form of the logical arrangement of document components, or external structure, in the form of links between documents. Structured document retrieval systems aim to exploit this structural information to provide users with more effective access to structured documents. To do this, the associated interface must both represent this information explicitly and support users in their browsing behaviour. This paper describes the implementation and user-centred evaluation of a prototype interface, the RelevanceLinkBar interface. The results of the evaluation show that the RelevanceLinkBar interface supported users in their browsing behaviour, allowing them to find more relevant documents, and was strongly preferred over a standard results interface.
Visual exploration and retrieval of XML document collections with the generic system X2
This article reports on the XML retrieval system X2, which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2, which distinguishes it from other visual query systems for XML, is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.