An evaluation resource for geographic information retrieval
In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic information retrieval requires an evaluation resource which represents realistic information needs and which is geographically challenging. Some experimental results and analysis are reported.
GeoCLEF 2007: the CLEF 2007 cross-language geographic information retrieval track overview
GeoCLEF ran as a regular track for the second time within the Cross Language Evaluation Forum (CLEF) 2007. The purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for topics with a geographic specification. GeoCLEF 2007 consisted of two subtasks. A search task ran for the third time, and a query classification task was organized for the first time. For the GeoCLEF 2007 search task, twenty-five search topics were defined by the organizing groups for searching English, German, Portuguese and Spanish document collections. All topics were translated into English, Indonesian, Portuguese, Spanish and German. Several topics in 2007 were geographically challenging. Thirteen groups submitted 108 runs. The groups used a variety of approaches. For the classification task, a query log from a search engine was provided, and the groups needed to identify the queries with a geographic scope and the geographic components within those local queries.
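The overview does not prescribe how groups should detect geographic scope; a common baseline is a gazetteer lookup, sketched below. The gazetteer contents and the query strings are invented for illustration, not GeoCLEF data.

```python
# Minimal gazetteer-lookup baseline for a GeoCLEF-style query
# classification task: flag queries that have a geographic scope
# and extract the matching geographic component(s).
GAZETTEER = {"london", "bavaria", "amazon", "lisbon"}  # toy place list

def classify(query: str) -> dict:
    tokens = query.lower().split()
    places = [t for t in tokens if t in GAZETTEER]
    return {"query": query,
            "local": bool(places),  # does the query have a geographic scope?
            "where": places}        # the geographic component(s) found

for q in ["hotels in london", "python tutorial", "floods bavaria 2002"]:
    print(classify(q))
```

A real system would also need multi-word place names, disambiguation (e.g. Amazon the river vs. the company), and spatial relation terms such as "near" or "north of"; the lookup above only shows the basic shape of the task.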
DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation
For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and the Google Translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.
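Blind (pseudo-)relevance feedback, central to these results, assumes the top-ranked documents are relevant and expands the query with their most salient terms. The sketch below illustrates the general technique; the tf-idf-style term scoring and all data are illustrative stand-ins, not the paper's actual feedback weighting.

```python
from collections import Counter
import math

def expand_query(query_terms, top_docs, doc_freq, n_docs, k_terms=20):
    """Blind relevance feedback: score terms from the top-ranked
    documents (assumed relevant) and append the k best to the query.
    Scoring here is a simple tf-idf heuristic for illustration."""
    tf = Counter()
    for doc in top_docs:  # each doc is a list of tokens
        tf.update(doc)
    scores = {t: f * math.log(n_docs / (1 + doc_freq.get(t, 0)))
              for t, f in tf.items() if t not in query_terms}
    expansion = sorted(scores, key=scores.get, reverse=True)[:k_terms]
    return list(query_terms) + expansion

# Toy example with invented tokens and document frequencies.
docs = [["flood", "danube", "flood", "damage"],
        ["flood", "rain", "danube"]]
df = {"flood": 40, "danube": 5, "rain": 60, "damage": 20}
print(expand_query(["flood"], docs, df, n_docs=1000, k_terms=2))
# -> ['flood', 'danube', 'damage']
```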
Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR
The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: 1. how to create a simple, language-independent corpus-based stemmer, 2. how to identify sub-words and which types of sub-words are suitable as indexing units, and 3. how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful in case of few language-specific resources. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.
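The sub-word types compared above are straightforward to generate. The sketch below shows plain implementations of overlapping character 3-grams, n-prefixes, and consonant-vowel sequences; the vowel set is a Latin-script simplification (the paper's collections use Indic scripts), and the exact segmentation rules in the paper may differ.

```python
VOWELS = set("aeiou")  # simplification; the paper covers Indic scripts

def ngrams(word, n=3):
    """Overlapping character n-grams, e.g. 'water' -> wat, ate, ter."""
    return [word[i:i + n] for i in range(len(word) - n + 1)] or [word]

def prefix(word, n=4):
    """n-prefix indexing: keep only the first n characters of a word."""
    return word[:n]

def cv_sequences(word):
    """Consonant-vowel sequences: split after each maximal vowel run,
    so 'water' -> ['wa', 'te', 'r'] (trailing consonants kept as a unit)."""
    units, current = [], ""
    for i, ch in enumerate(word):
        current += ch
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in VOWELS and (nxt is None or nxt not in VOWELS):
            units.append(current)
            current = ""
    if current:
        units.append(current)
    return units

print(ngrams("water"))        # ['wat', 'ate', 'ter']
print(prefix("water"))        # 'wate'
print(cv_sequences("water"))  # ['wa', 'te', 'r']
```

As the abstract notes, such segmentations produce more but shorter index terms than whole-word indexing, which is why the feedback term count is scaled up when retrieving on sub-words.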