UJM at INEX 2009 Ad Hoc track
7 pages. International audience. This paper presents our participation in the INEX 2009 Ad Hoc track. We experimented with tuning various parameters using a "training" collection (i.e. INEX 2008) quite different from the "testing" collection used for the INEX 2009 Ad Hoc track. Several parameters were studied for article retrieval as well as for element retrieval, especially the two main parameters of the BM25 weighting function: b and k1.
ENSM-SE and UJM at INEX 2010: Scoring with Proximity and Tag Weights
10 pages. International audience. This paper presents our participation in the Relevant in Context task (Ad Hoc track) of the 2010 INEX competition, together with an a posteriori analysis. Two models presented by the authors in previous editions of INEX were merged for our 2010 participation. The first is based on the proximity of the query terms in the documents and the second on learnt tag weights. The results demonstrate that integrating the tag weights into the proximity-based approach improves focused information retrieval.
UJM at INEX 2008: pre impacting of tags weights
International audience. This paper addresses the integration of tags into the term-weighting function for focused XML retrieval. Our model allows us to consider a certain kind of structural information: tags that represent logical structure (title, section, etc.) as well as tags related to formatting (bold font, centered text, etc.). We first take the influence of tags into account by estimating the probability that a tag distinguishes the most relevant terms. These weights are then incorporated into the term-weighting function using several combination schemes. Experiments on a large collection during the INEX 2008 XML IR evaluation campaign (INitiative for the Evaluation of XML Retrieval) showed that using tags leads to improvements in focused retrieval.
Using Proximity and Tag Weights for Focused Retrieval in Structured Documents
International audience. Focused information retrieval is concerned with the retrieval of small units of information. In this context, both the structure of the documents and the proximity among query terms have been found useful for improving retrieval effectiveness. In this article, we propose an approach combining the proximity of the terms and the tags which mark these terms. Our approach is based on a Fetch and Browse method where the fetch step is performed with BM25 and the browse step with a structure-enhanced proximity model. In this way, the ranking of a document depends not only upon the presence of the query terms within the document but also upon the tags which mark these terms. Thus, a document tends to be ranked as highly relevant when query terms are close together and are emphasized by tags. The evaluation of this model on a large XML structured collection provided by the INEX 2010 XML IR evaluation campaign shows that the use of term proximity and structure improves the retrieval effectiveness of BM25 in the context of focused information retrieval.
The Wikipedia Image Retrieval Task
The Wikipedia image retrieval task at ImageCLEF provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate the effectiveness of retrieval approaches that exploit textual and visual evidence in the context of a large and heterogeneous collection of images that are searched for by users with diverse information needs. This chapter presents an overview of the available test collections, summarises the retrieval approaches employed by the groups that participated in the task during the 2008 and 2009 ImageCLEF campaigns, provides an analysis of the main evaluation results, identifies best practices for effective retrieval, and discusses open issues.
BM25t: a BM25 extension for focused information retrieval
25 pages. International audience. This paper addresses the integration of XML tags into a term-weighting function for focused XML Information Retrieval (IR). Our model allows us to consider a certain kind of structural information: tags that represent a logical structure (e.g. title, section, paragraph, etc.) as well as other tags (e.g. bold, italic, center, etc.). We take into account the influence of a tag by estimating the probability for this tag to distinguish relevant terms from the others. Then, these weights are integrated in a term-weighting function. Experiments on a large collection from the INEX 2008 XML IR evaluation campaign showed improvements on focused XML retrieval.
An Exponentiation Method for XML Element Retrieval
XML documents are now widely used for modelling and storing structured documents. Their structure is very rich and carries important information about contents and their relationships, for example in e-Commerce. Data-centric XML collections require query terms that allow users to specify constraints on the document structure; mapping structural queries and assigning weights are significant for determining the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system, which we call the Exponentiation function, that supports the combination of structural and content queries in the form of content-and-structure queries. The results show that structural information improves the effectiveness of the search system by up to 52.60% in MAP over the BM25 baseline.