Search CORE

21 research outputs found

Accurate user directed summarization from existing tools

Author: Sanderson M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1998
Field of study

This paper describes a set of experimental results produced from the TIPSTER SUMMAC initiative on user directed summaries: document summaries generated in the context of an information need expressed as a query. The summarizer that was evaluated was based on a set of existing statistical techniques that had been applied successfully to the INQUERY retrieval system. The techniques proved to have a wider utility, however, as the summarizer was one of the better performing systems in the SUMMAC evaluation. The design of this summarizer is presented with a range of evaluations: both those provided by SUMMAC as well as a set of preliminary, more informal, evaluations that examined additional aspects of the summaries. Amongst other conclusions, the results reveal that users can judge the relevance of documents from their summary almost as accurately as if they had had access to the document’s full text

CiteSeerX

Crossref

White Rose Research Online

Evaluating document clustering for interactive information retrieval

Author: Anton Leuski
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

Entity Query Feature Expansion Using Knowledge Base Links

Author: Allan James
Dalton Jeffrey
Dietz Laura
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/07/2014
Field of study

Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections. For example, annotations of entities from large general purpose knowledge bases, such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE) which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches

CiteSeerX

Enlighten

Client-system collaboration for legal corpus selection in an online production environment

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003
Field of study

Crossref

Indexing and retrieval of broadcast news

Author: Abberley Dave
Kirby David
Renals Steve
Robinson Tony
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000
Field of study

This paper describes a spoken document retrieval (SDR) system for British and North American Broadcast News. The system is based on a connectionist large vocabulary speech recognizer and a probabilistic information retrieval system. We discuss the development of a realtime Broadcast News speech recognizer, and its integration into an SDR system. Two advances were made for this task: automatic segmentation and statistical query expansion using a secondary corpus. Precision and recall results using the Text Retrieval Conference (TREC) SDR evaluation infrastructure are reported throughout the paper, and we discuss the application of these developments to a large scale SDR task based on an archive of British English broadcast news

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Improving document representation by accumulating relevance feedback : the relevance feedback accumulation (RFA) algorithm

Author: Bot Razvan Stefan
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2005
Field of study

Document representation (indexing) techniques are dominated by variants of the term-frequency analysis approach, based on the assumption that the more occurrences a term has throughout a document the more important the term is in that document. Inherent drawbacks associated with this approach include: poor index quality, high document representation size and the word mismatch problem. To tackle these drawbacks, a document representation improvement method called the Relevance Feedback Accumulation (RFA) algorithm is presented. The algorithm provides a mechanism to continuously accumulate relevance assessments over time and across users. It also provides a document representation modification function, or document representation learning function that gradually improves the quality of the document representations. To improve document representations, the learning function uses a data mining measure called support for analyzing the accumulated relevance feedback. Evaluation is done by comparing the RFA algorithm to other four algorithms. The four measures used for evaluation are (a) average number of index terms per document; (b) the quality of the document representations assessed by human judges; (c) retrieval effectiveness; and (d) the quality of the document representation learning function. The evaluation results show that (1) the algorithm is able to substantially reduce the document representations size while maintaining retrieval effectiveness parameters; (2) the algorithm provides a smooth and steady document representation learning function; and (3) the algorithm improves the quality of the document representations. The RFA algorithm\u27s approach is consistent with efficiency considerations that hold in real information retrieval systems. The major contribution made by this research is the design and implementation of a novel, simple, efficient, and scalable technique for document representation improvement

Digital Commons @ New Jersey Institute of Technology (NJIT)

Term Association Modelling in Information Retrieval

Author: Zhao Jiashu
Publication venue
Publication date: 28/08/2015
Field of study

Many traditional Information Retrieval (IR) models assume that query terms are independent of each other. For those models, a document is normally represented as a bag of words/terms and their frequencies. Although traditional retrieval models can achieve reasonably good performance in many applications, the corresponding independence assumption has limitations. There are some recent studies that investigate how to model term associations/dependencies by proximity measures. However, the modeling of term associations theoretically under the probabilistic retrieval framework is still largely unexplored. In this thesis, I propose a new concept named Cross Term, to model term proximity, with the aim of boosting retrieval performance. With Cross Terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text. The degree of the query term impact gradually weakens with increasing distance from the place of occurrence. Shape functions are used to characterize such impacts. Based on this assumption, I first propose a bigram CRoss TErm Retrieval (CRTER2) model for probabilistic IR and a Language model based model CRTER2LM. Specifically, a bigram Cross Term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. Second, I propose a generalized n-gram CRoss TErm Retrieval (CRTERn) model recursively for n query terms where n>2. For n-gram Cross Term, I develop several distance metrics with different properties and employ them in the proposed models for ranking. Third, an enhanced context-sensitive proximity model is proposed to boost the CRTER models, where the contextual relevance of term proximity is studied. The models are validated on several large standard data sets, and show improved performance over other state-of-art approaches. I also discusse the practical impact of the proposed models. The approaches in this thesis can also provide helpful benefit for term association modeling in other domains

YorkSpace

A Boolean based Question Answering System

Author: Wu Jiayi
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2015
Field of study

The search engine searches the information according to the key words and provides users with related links, which need users to review and find the direct information among a large number of webpages. To avoid this drawback and improve the search results from search engine, we implemented a Boolean based Question Answering System. This system used Boolean Retrieval Model to analyze and match the text information from corresponding webpages in the document indexing step when users ask a Boolean expression based question. To evaluate system and analyze Boolean Retrieval Model, we used the data set from TREC (Text Retrieval Conference) to finish our experiment. Different Boolean operators in the questions such as AND, OR has been evaluated separately which is clear to analyze the effectiveness for each of them. We also evaluate the overall performance for this system

Scholarship at UWindsor