465 research outputs found
Report of MIRACLE team for the Ad-Hoc track in CLEF 2007
This paper presents the 2007 MIRACLE’s team approach to the AdHoc Information Retrieval track. The work carried out for this campaign has been reduced to monolingual experiments, in the standard and in the robust tracks. No new approaches have been attempted in this campaign, following the procedures established in our participation in previous campaigns. For this campaign, runs were submitted for the following languages and tracks: - Monolingual: Bulgarian, Hungarian, and Czech. - Robust monolingual: French, English and Portuguese. There is still some room for improvement around multilingual named entities recognition
Miracle’s 2005 Approach to Cross-lingual Information Retrieval
This paper presents the 2005 Miracle’s team approach to Bilingual and Multilingual Information Retrieval. In the multilingual track, we have concentrated our work on the merging process of the results of monolingual runs to get the multilingual overall result, relying on available translations. In the bilingual and multilingual tracks, we have used available translation resources, and in some cases we have using a combining approach
Report of MIRACLE team for the Ad-Hoc track in CLEF 2006
This paper presents the 2006 MIRACLE’s team approach to the AdHoc Information Retrieval track. The experiments for this campaign keep on testing our IR approach. First, a baseline set of runs is obtained, including standard components: stemming, transforming, filtering, entities detection and extracting, and others. Then, a extended set of runs is obtained using several types of combinations of these baseline runs. The improvements introduced for this campaign have been a few ones: we have used an entity recognition and indexing prototype tool into our tokenizing scheme, and we have run more combining experiments for the robust multilingual case than in previous campaigns. However, no significative improvements have been achieved. For the this campaign, runs were submitted for the following languages and tracks: - Monolingual: Bulgarian, French, Hungarian, and Portuguese. - Bilingual: English to Bulgarian, French, Hungarian, and Portuguese; Spanish to French and Portuguese; and French to Portuguese. - Robust monolingual: German, English, Spanish, French, Italian, and Dutch. - Robust bilingual: English to German, Italian to Spanish, and French to Dutch. - Robust multilingual: English to robust monolingual languages. We still need to work harder to improve some aspects of our processing scheme, being the most important, to our knowledge, the entities recognition and normalization
Report of MIRACLE team for Geographical IR in CLEF 2006
The main objective of the designed experiments is testing the effects of geographical information retrieval from documents that contain geographical tags. In the designed experiments we try to isolate geographical retrieval from textual retrieval replacing all geo-entity textual references from topics with associated tags and splitting the retrieval process in two phases: textual retrieval from the textual part of the topic without geo-entity references and geographical retrieval from the tagged text generated by the topic tagger. Textual and geographical results are combined applying different techniques: union, intersection, difference, and external join based. Our geographic information retrieval system consists of a set of basics components organized in two categories: (i) linguistic tools oriented to textual analysis and retrieval and (ii) resources and tools oriented to geographical analysis. These tools are combined to carry out the different phases of the system: (i) documents and topics analysis, (ii) relevant documents retrieval and (iii) result combination. If we compare the results achieved to the last campaign’s results, we can assert that mean average precision gets worse when the textual geo-entity references are replaced with geographical tags. Part of this worsening is due to our experiments return cero pertinent documents if no documents satisfy de geographical sub-query. But if we only analyze the results of queries that satisfied both textual and geographical terms, we observe that the designed experiments recover pertinent documents quickly, improving R-Precision values. We conclude that the developed geographical information retrieval system is very sensible to textual georeference and therefore it is necessary to improve the name entity recognition module
Miracle’s 2005 Approach to Monolingual Information Retrieval
This paper presents the 2005 Miracle’s team approach to Monolingual Information Retrieval. The goal for the experiments in this year was twofold: continue testing the effect of combination approaches on information retrieval tasks, and improving our basic processing and indexing tools, adapting them to new languages with strange encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper nouns extracting, paragraph extracting, and pseudo-relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. Second order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query
An analysis of machine translation errors on the effectiveness of an Arabic-English QA system
The aim of this paper is to investigate
how much the effectiveness of a Question
Answering (QA) system was affected
by the performance of Machine
Translation (MT) based question translation.
Nearly 200 questions were selected
from TREC QA tracks and ran through a
question answering system. It was able to
answer 42.6% of the questions correctly
in a monolingual run. These questions
were then translated manually from English
into Arabic and back into English using
an MT system, and then re-applied to
the QA system. The system was able to
answer 10.2% of the translated questions.
An analysis of what sort of translation error
affected which questions was conducted,
concluding that factoid type
questions are less prone to translation error
than others
MIRACLE’s hybrid approach to bilingual and monolingual Information Retrieval
The main goal of the bilingual and monolingual participation of the MIRACLE team at CLEF 2004 was testing the effect of combination approaches to information retrieval. The starting point is a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components are used in different combinations and order of application for document indexing and for query processing. Besides this, a second order combination is done, mainly by averaging or by selective combination of the documents retrieved by different approaches for a particular query
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every
day, the problem of ad-hoc information retrieval is rarely studied for
non-English languages. This is primarily due to a lack of data set that are
suitable to train ranking algorithms. In this paper, we tackle the lack of data
by leveraging pre-trained multilingual language models to transfer a retrieval
system trained on English collections to non-English queries and documents. Our
model is evaluated in a zero-shot setting, meaning that we use them to predict
relevance scores for query-document pairs in languages never seen during
training. Our results show that the proposed approach can significantly
outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and
Spanish. We also show that augmenting the English training collection with some
examples from the target language can sometimes improve performance.Comment: ECIR 2020 (short
MIRACLE Progress in Monolingual Information Retrieval at Ad-Hoc CLEF 2007
This paper presents the 2007 MIRACLE’s team approach to the Ad-Hoc Information Retrieval track. The main work carried out for this campaign has been around monolingual experiments, in the standard and in the robust tracks. The most important contributions have been the general introduction of automatic named-entities extraction and the use of Wikipedia resources. For the2007 campaign, runs were submitted for the following languages and tracks: a) Monolingual: Bulgarian, Hungarian, and Czech. b) Robust monolingual: French, English and Portuguese
- …