Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness

By V. Mihajlovic, D. Hiemstra, H.E. Blok and P.M.G. Apers

Abstract

In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation. The analysis is based on numerous experiments evaluated on TREC and CLEF collections, using manually generated unstructured and structured queries. Unstructured queries range from short title queries to long title + description + narrative queries. For generating structured queries we exploit knowledge of the document structure and of the content used to semantically describe or classify documents. We show that such structured information can be utilized in retrieval engines to give more precise answers to user queries than unstructured queries do.

Publisher: Centre for Telematics and Information Technology, University of Twente
Year: 2006
OAI identifier: oai:doc.utwente.nl:66353
