Okapi-based XML indexing
Purpose
Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi-structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.
Design/methodology/approach
First, the paper reviews the structure of inverted files and explains why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RCV1 collections.
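To illustrate the general idea of combining an inverted file with path information, the following is a minimal sketch and not the Okapi data structures described in the paper. It assumes a toy in-memory index in which each posting records the document and the element path where a term occurs; the names `build_path_index` and `search`, the path-suffix matching, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(docs):
    """Map each term to a set of (doc_id, element_path) postings.

    Hypothetical toy structure: a plain inverted file would store only
    doc_id, which is why it cannot answer element-level queries.
    """
    index = defaultdict(set)

    def walk(doc_id, elem, prefix):
        path = f"{prefix}/{elem.tag}"
        # Index the text that sits directly inside this element.
        for term in (elem.text or "").lower().split():
            index[term].add((doc_id, path))
        for child in elem:
            walk(doc_id, child, path)
            # Text after a child's closing tag belongs to the current element.
            for term in (child.tail or "").lower().split():
                index[term].add((doc_id, path))

    for doc_id, xml_text in docs.items():
        walk(doc_id, ET.fromstring(xml_text), "")
    return index


def search(index, term, path_suffix=""):
    """Return sorted doc ids where `term` occurs under a path ending in `path_suffix`."""
    postings = index.get(term.lower(), ())
    return sorted({doc for doc, path in postings if path.endswith(path_suffix)})


# Made-up sample documents, for illustration only.
docs = {
    "d1": "<article><title>XML indexing</title><body>Okapi retrieval</body></article>",
    "d2": "<article><title>Okapi</title><body>XML search</body></article>",
}
index = build_path_index(docs)
print(search(index, "okapi"))            # ['d1', 'd2']
print(search(index, "okapi", "/title"))  # ['d2']
```

Storing a path with every posting is what makes the structured query possible, and it is also a plausible source of the extra space cost that an evaluation of such an index would need to measure.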
Findings
Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differing sized Reuters RCV1 sub-collections show that space costs increase significantly as the size of a collection grows, whereas run time increases only linearly. Path search results show sub-millisecond run times, demonstrating minimal overhead for XML search.
Practical implications
Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.
Originality/value
The paper provides useful information on a method for XML indexing based on the IR system Okapi.
What XML-IR Users May Want
It is assumed that, by focusing on retrieval at a granularity lower than documents, XML-IR systems will better satisfy users' information needs than traditional IR systems. Participants in INEX's Ad-hoc track develop XML-IR systems based upon this assumption, using an evaluation methodology in the tradition of Cranfield. However, since the inception of INEX, debate has raged on how applicable some of the Ad-hoc tasks are to real user tasks. The purpose of the User-Case Studies track is to explore the application of XML-IR systems from the users' perspective. This paper outlines QUT's involvement in this task. For our involvement we conducted a user experiment using an XML-IR system (GPX) and three interfaces: a standard keyword interface, a natural language interface (NLPX) and a query-by-template interface (Bricks). Following the experiment we interviewed the users about their experience and asked them, in comparison with a traditional XML-IR system, what types of tasks they would use an XML-IR system for, what extra information they would need to interact with an XML-IR system, and how they would want to see XML-IR results presented. It is hoped that the outcomes of this study will bring us closer to understanding what users want from XML-IR systems.