Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness
In this paper we present a systematic analysis of document
retrieval using unstructured and structured queries within
the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely
Boolean, tf.idf, GPX, language models, and Okapi, is tested
using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation.
The analysis is based on numerous experiments
evaluated on TREC and CLEF collections, using manually
generated unstructured and structured queries. Unstructured queries range from the short title queries to long title
+ description + narrative queries. For generating structured
queries we exploit the knowledge of the document structure
and the content used to semantically describe or classify
documents. We show that such structured information can
be utilized in retrieval engines to give more precise answers to user queries than when using unstructured queries.
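The four elementary retrieval aspects named in the abstract above (element and term selection, element score computation, score combination, and score propagation) can be illustrated with a minimal sketch. This is not the SRA operator set or the TIJAH implementation: the toy collection, the element names, the smoothed tf.idf weighting, and the choice to sum scores at every step are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy collection (invented): each document is a list of (element_name, text) pairs.
DOCS = {
    "d1": [("title", "structured retrieval"),
           ("body", "score region algebra for xml retrieval")],
    "d2": [("title", "cooking pasta"),
           ("body", "boil water and add pasta")],
}


def tokenize(text):
    return text.lower().split()


def idf(term):
    # Smoothed inverse document frequency, counted over whole documents.
    n = len(DOCS)
    df = sum(1 for elems in DOCS.values()
             if any(term in tokenize(text) for _, text in elems))
    return math.log((n + 1) / (df + 1)) + 1.0


def score_element(text, terms):
    # Element score computation and score combination:
    # per-term tf * idf weights, combined by summation (an assumption).
    tf = Counter(tokenize(text))
    return sum(tf[t] * idf(t) for t in terms)


def score_document(elems, terms, fields=None):
    # Element selection: a structured query restricts scoring to named elements;
    # an unstructured query (fields=None) uses every element.
    selected = [text for name, text in elems if fields is None or name in fields]
    # Score propagation: element scores are summed up to the document level.
    return sum(score_element(text, terms) for text in selected)


def rank(terms, fields=None):
    scores = {doc_id: score_document(elems, terms, fields)
              for doc_id, elems in DOCS.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank(["retrieval"])` scores every element, while `rank(["pasta"], fields={"title"})` mimics a structured query that only looks at titles. Swapping `score_element` for a language-model or Okapi-style function would change the score computation aspect without touching selection or propagation.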
Knowledge-based Query Expansion in Real-Time Microblog Search
Since the length of microblog texts, such as tweets, is strictly limited to
140 characters, traditional Information Retrieval techniques suffer severely from the
vocabulary mismatch problem and cannot yield good performance in the
context of the microblogosphere. To address this critical challenge, in this paper,
we propose a new language modeling approach for microblog retrieval by
inferring various types of context information. In particular, we expand the
query using knowledge terms derived from Freebase so that the expanded one can
better reflect users' search intent. In addition, to further satisfy
users' real-time information needs, we incorporate temporal evidence into the
expansion method, which can boost recent tweets in the retrieval results with
respect to a given topic. Experimental results on two official TREC Twitter
corpora demonstrate the significant superiority of our approach over baseline
methods.
Comment: 9 pages, 9 figures
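The combination described in this abstract, a language-model score over an expanded query plus temporal evidence that favors recent tweets, can be sketched roughly as follows. This is not the authors' model: the hand-written `KNOWLEDGE_TERMS` table stands in for terms derived from Freebase, and the Dirichlet smoothing, the interpolation weight `alpha`, and the exponential recency prior with parameter `rate` are assumptions chosen for illustration. The tweets and their ages are invented.

```python
import math
from collections import Counter

# Hypothetical tweets: id -> (text, age in days).
TWEETS = {
    "t1": ("obama speech on healthcare tonight", 0.5),
    "t2": ("president addresses congress about health reform", 0.2),
    "t3": ("obama wins election", 300.0),
}

# Hand-written stand-in for expansion terms taken from a knowledge base.
KNOWLEDGE_TERMS = {"obama": ["president"]}


def expand(query_terms, alpha=0.7):
    """Weight original terms by alpha and expansion terms by 1 - alpha."""
    weights = Counter()
    for term in query_terms:
        weights[term] += alpha / len(query_terms)
    extra = [e for term in query_terms for e in KNOWLEDGE_TERMS.get(term, [])]
    for e in extra:
        weights[e] += (1 - alpha) / len(extra)
    return weights


def log_score(text, age_days, weights, mu=10.0, rate=0.01):
    tokens = text.split()
    tf = Counter(tokens)
    coll = Counter(w for t, _ in TWEETS.values() for w in t.split())
    total = sum(coll.values())
    score = 0.0
    for term, weight in weights.items():
        # Add-one smoothed collection probability, so unseen terms stay non-zero.
        p_coll = (coll[term] + 1) / (total + len(coll))
        # Dirichlet-smoothed probability of the term under the tweet's model.
        p = (tf[term] + mu * p_coll) / (len(tokens) + mu)
        score += weight * math.log(p)
    # Log of an exponential recency prior, with the constant term dropped.
    return score - rate * age_days


def rank(query_terms, rate=0.01):
    weights = expand(query_terms)
    scores = {tid: log_score(text, age, weights, rate=rate)
              for tid, (text, age) in TWEETS.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Setting `rate=0.0` switches the recency prior off, which makes it easy to compare rankings with and without the temporal component. The expansion step lets `t2` match the query `["obama"]` even though it never mentions that word.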