32 research outputs found

The State-of-the-arts in Focused Search

Author: Li Rongmei
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2009

The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user's topic of interest has gone beyond search for domain-specific or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.

University of Twente Research Information

An Exponentiation Method for XML Element Retrieval

Author: Tanakorn Wichaiwong
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

XML documents are now widely used for modelling and storing structured documents. Their structure is very rich and carries important information about contents and their relationships, for example in e-commerce. Data-centric XML collections require queries that let users specify constraints on the document structure; mapping structural queries and assigning weights are significant for determining the set of possibly relevant documents with respect to the structural conditions. In this paper, we present an extension to the MEXIR search system, which we call the Exponentiation function, that supports the combination of structural and content queries in the form of content-and-structure queries. Structural information is shown to improve the effectiveness of the search system by up to 52.60% in MAP over the BM25 baseline.
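
For orientation, the BM25 baseline named in this abstract can be sketched as follows. This is a minimal sketch of the standard Okapi BM25 formula applied to elements treated as plain token lists; the function name, the toy elements, and the parameter values k1 = 1.2 and b = 0.75 are assumptions for illustration. The paper's Exponentiation function is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length-normalized term-frequency saturation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy "elements"; not data from the paper.
elements = [
    "xml element retrieval with structure".split(),
    "relational database systems".split(),
    "xml xml indexing".split(),
]
scores = bm25_scores(["xml", "retrieval"], elements)
```

In this toy run the first element matches both query terms and scores highest, while the second element matches neither and scores zero.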

Crossref

Directory of Open Access Journals

PubMed Central

The State-of-the-arts in Focused Search

Author
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 08/07/2009

University of Twente Research Information

Okapi-based XML indexing

Author: Lu W.
MacFarlane A.
Venuti F.
Publication venue: 'Emerald'
Publication date: 18/09/2009

Purpose – As an important data exchange and information storage standard, XML has generated a great deal of interest, and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is based either on relational database systems or on specialized semi-structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi. Design/methodology/approach – First, the paper reviews the structure of inverted files and explains why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RCV1 collections. Findings – Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differently sized Reuters RCV1 sub-collections show that the increase in space costs as collection size grows is significant, but that the increase in run time is linear. Path search results show sub-millisecond run times, demonstrating minimal overhead for XML search. Practical implications – Overall, the results show that the method implemented to support XML search in a traditional IR system such as Okapi is viable. Originality/value – The paper provides useful information on a method for XML indexing based on the IR system Okapi.
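
The general idea of pairing an inverted file with a path index can be sketched as follows. This is a minimal sketch under stated assumptions, not Okapi's actual data structures: the names `build_indexes` and `search`, the in-memory dictionaries, and the sample document are all hypothetical, and only an element's own leading text is indexed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_indexes(xml_text):
    """Build a term index and a path index over one XML document."""
    root = ET.fromstring(xml_text)
    term_index = defaultdict(set)   # term -> ids of elements containing it
    path_index = defaultdict(set)   # root-to-element path -> element ids
    next_id = 0

    def visit(node, parent_path):
        nonlocal next_id
        elem_id = next_id
        next_id += 1
        path = parent_path + "/" + node.tag
        path_index[path].add(elem_id)
        # Index only the text directly inside this element (tail text is ignored).
        for term in (node.text or "").lower().split():
            term_index[term].add(elem_id)
        for child in node:
            visit(child, path)

    visit(root, "")
    return term_index, path_index

def search(term, path, term_index, path_index):
    """Return ids of elements on the given path whose own text contains the term."""
    return sorted(term_index.get(term, set()) & path_index.get(path, set()))

# Hypothetical sample document; ids are assigned in document order from 0.
doc = "<article><title>XML indexing</title><sec><p>Okapi indexing</p></sec></article>"
terms, paths = build_indexes(doc)
hits = search("indexing", "/article/sec/p", terms, paths)
```

A structured query then reduces to intersecting a term's postings with the set of elements on the requested path, which is one plausible reason path lookups can stay cheap at the price of extra index space.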

City Research Online

Crossref

A survey on tree matching and XML retrieval

Author: Cyril Laitang
Hamamache Kheddouci
Karen Pinel-Sauvagnat
Lei Ning
Mohammed Amin Tahraoui
Mohand Boughanem
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013

With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether implicitly or explicitly. Although retrieving XML documents can be considered a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed in graph theory. In this paper, we study the theoretical approaches proposed in the literature for tree matching and examine how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study allows us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.
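
The view of retrieval as matching a query tree against a document tree can be illustrated with a small sketch. It is a deliberately simplified assumption, not an algorithm taken from the survey: trees are hypothetical `(label, children)` tuples, and a query child may match any descendant of the matched document node (ancestor-descendant semantics, without requiring distinct matches for sibling query nodes).

```python
def descendants(node):
    """Yield every proper descendant of a (label, children) tree node."""
    for child in node[1]:
        yield child
        yield from descendants(child)

def embeds(query, doc):
    """True if query matches at doc: the labels are equal and every query
    child embeds at some descendant of doc."""
    if query[0] != doc[0]:
        return False
    return all(any(embeds(qc, d) for d in descendants(doc)) for qc in query[1])

def count_matches(query, doc):
    """Count the document nodes (root included) at which the query embeds."""
    return sum(embeds(query, d) for d in [doc, *descendants(doc)])

# Hypothetical document tree: article(title, sec(title, p)).
document = ("article", [("title", []), ("sec", [("title", []), ("p", [])])])

deep_query = ("article", [("p", [])])      # p anywhere below article
sec_query = ("sec", [("title", [])])       # a section containing a title
missing_query = ("sec", [("fig", [])])     # no figure exists in the document

found_deep = embeds(deep_query, document)
sec_count = count_matches(sec_query, document)
missing_count = count_matches(missing_query, document)
```

Exact matching of this kind returns nothing when a single structural constraint fails, as `missing_query` shows, which is where approximate matching becomes relevant.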

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

Open Archive Toulouse Archive Ouverte

Hal-Diderot

Report 2011

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2011

MPG.PuRe

Bericht 2007/2008

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2008

MPG.PuRe

Interactive Information Retrieval with Structured Documents

Author: Malik Saadia
Publication venue
Publication date: 22/02/2010

In recent years there has been a growing realisation in the IR community that the interaction of searchers with information is an indispensable component of the IR process. As a result, issues relating to interactive IR have been extensively investigated in the last decade. This research has been performed in the context of unstructured documents or of the loosely defined structure encountered in web pages. XML documents, on the other hand, define a different context by offering the possibility of navigating within the structure of a single document, or of following links to other documents. Relatively little work has been carried out to study user interaction with IR systems that make use of the additional features offered by XML documents. As part of the INEX initiative for the evaluation of XML retrieval, the INEX interactive track has focused on interactive XML retrieval since 2004. Here, a user-friendly presentation of various features of XML documents is provided, and some new features are designed and implemented to give searchers efficient access to the information they want. In this study, interaction entails three levels: query formulation, inspecting the result list, and examining the detail. For query formulation, suggesting related terms is a conventional method of assisting searchers. Here we investigate related terms derived from two different co-occurrence units: elements and documents. In addition, a contextual aspect is added to help searchers select appropriate terms. Results showed the usefulness of suggesting related terms and some acceptance of the contextual related-terms tool. For inspecting the result list, classic document retrieval systems such as web search engines retrieve whole documents and leave it to searchers to collect the required information from a possibly lengthy text.
In contrast, element retrieval aims at a focused view of information by pointing to the optimal access points of the document. A number of strategies have been investigated for presenting result lists. For examining the detail of a document, the complete document is traditionally presented to a searcher, and here again the searcher has to put in effort to reach the required information. We investigated the use of additional support such as a table of contents alongside the document detail. We also investigated graphical representations of documents depicting their structure and the granularity of retrieved elements along with their estimated relevance. The table of contents was found to be a very useful feature for examining details. To analyse searchers' interaction, a visualisation technique based on Tree Map was developed. It depicts the search interaction with an element retrieval system. A number of browsing strategies have been identified with the help of this tool. The value of element retrieval for searchers was also evaluated, as was a comparison between two focused approaches, element retrieval and passage retrieval. The study suggests that searchers find elements useful for their tasks and that they locate much of the relevant information in specific elements rather than full documents. Sections, in particular, appear to be helpful. To provide user-specific support, the system needs feedback from searchers, who in turn are very reluctant to give this information explicitly. Therefore, we investigated to what extent different features can be used as relevance predictors. Of the five features considered, reading time is the main useful relevance predictor. Overall, relevance predictors for structured documents seem to be much weaker than for atomic documents.

Duisburg-Essen Publications Online

Eighth Biennial Report : April 2005 – March 2007

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2007

MPG.PuRe