Overview of the TREC 2022 NeuCLIR Track
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to
study the impact of neural approaches to cross-language information retrieval.
The main task in this year's track was ad hoc ranked retrieval of Chinese,
Persian, or Russian newswire documents using queries expressed in English.
Topics were developed using standard TREC processes, except that topics
developed by an annotator for one language were assessed by a different
annotator when evaluating that topic on a different language. There were 172
total runs submitted by twelve teams.
Comment: 22 pages, 13 figures, 10 tables. Part of the Thirty-First Text REtrieval Conference (TREC 2022) Proceedings. Replaced the misplaced Russian result table.
Fusion of retrieval models at CLEF 2008 Ad Hoc Persian Track
Metasearch engines submit the user query to several underlying search engines and then merge their retrieved results to generate a single list that is more effective for the user's information needs. Following the idea behind metasearch engines, merging the results retrieved from different retrieval models should improve search coverage and precision. In this study, we have investigated the effect of fusing different retrieval techniques on the performance of Persian retrieval. We use an extension of the Ordered Weighted Average (OWA) operator called IOWA and a weighting scheme, NOWA, for merging the results. Our experimental results show that merging by OWA operators produces better MAP.
Using OWA fuzzy operator to merge retrieval system results
With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to the user's requirements. One way of improving retrieval quality is to use more than one retrieval engine, merge the retrieved results, and show a single ranked list to the user. Studies suggest that combining the results of multiple search engines improves ranking when these engines are treated as independent experts. In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and Lnc.btc weighting schemes. The experiments were conducted on a large Persian collection of news archives called the Hamshahri Collection. Different variations of the Ordered Weighted Average (OWA) fuzzy operator method, namely a quantifier-based OWA operator and a degree-of-importance-based OWA operator, have been tested for merging the results. Our experimental results show that the OWA operators produce better precision and ranking than the weaker retrieval methods, but in comparison with the stronger retrieval models they yield only minimal improvements.
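The quantifier-based OWA fusion described above can be sketched briefly. This is a minimal illustration, not the authors' exact implementation: it assumes each run's scores are already min-max normalised, uses the common RIM quantifier Q(r) = r**alpha to derive the OWA weights, and treats a document missing from a run as having score 0.

```python
def owa_weights(n, alpha=2.0):
    """RIM-quantifier OWA weights: w_i = Q(i/n) - Q((i-1)/n), Q(r) = r**alpha.
    The weights telescope to Q(1) - Q(0) = 1, so they always sum to 1."""
    return [(i / n) ** alpha - ((i - 1) / n) ** alpha for i in range(1, n + 1)]

def owa_merge(runs, alpha=2.0):
    """Fuse several {doc_id: normalised_score} runs with an OWA operator.

    For each document, the per-run scores are sorted in descending order
    (the defining step of OWA: weights attach to score positions, not to
    specific runs) and combined as a weighted sum.
    """
    docs = set().union(*runs)
    w = owa_weights(len(runs), alpha)
    fused = {}
    for d in docs:
        scores = sorted((run.get(d, 0.0) for run in runs), reverse=True)
        fused[d] = sum(wi * si for wi, si in zip(w, scores))
    # Return doc ids ranked by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```

With alpha > 1 the quantifier shifts weight toward the lower scores, so a document must score well in several runs to rank highly; alpha < 1 behaves more like taking the best single-run evidence.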
NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval
This paper reports on a study of cross-lingual information retrieval (CLIR)
using the mT5-XXL reranker on the NeuCLIR track of TREC 2022. Perhaps the
biggest contribution of this study is the finding that despite the mT5 model
being fine-tuned only on query-document pairs of the same language it proved to
be viable for CLIR tasks, where query-document pairs are in different
languages, even in the presence of suboptimal first-stage retrieval
performance. The results of the study show outstanding performance across all
tasks and languages, leading to a high number of winning positions. Finally,
this study provides valuable insights into the use of mT5 in CLIR tasks and
highlights its potential as a viable solution. For reproduction, refer to
https://github.com/unicamp-dl/NeuCLIR22-mT
Investigation of the Lambda Parameter for Language Modeling Based Persian Retrieval
Language modeling is one of the most powerful methods in information retrieval. Many language modeling based retrieval systems have been developed and tested on English collections; evaluating language modeling on collections in other languages is therefore an interesting research issue. In this study, four different language modeling methods proposed by Hiemstra [1] have been evaluated on a large Persian collection of a news archive. Furthermore, we study two different approaches proposed for tuning the Lambda parameter in the method. Experimental results show that the performance of language models on Persian text improves after Lambda tuning; more specifically, the Witten-Bell method provides the best results.
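The role of Lambda can be seen in a linearly interpolated (Jelinek-Mercer style) query likelihood score, the form underlying Hiemstra's models. The sketch below is illustrative only, with an assumed Lambda of 0.85 as the default; it is not the paper's tuned configuration:

```python
import math
from collections import Counter

def jm_score(query_terms, doc_terms, collection_counts, collection_len, lam=0.85):
    """Interpolated query log-likelihood:
    log P(q|d) = sum over query terms t of
                 log( lam * P(t|d) + (1 - lam) * P(t|C) ),
    where P(t|d) is the document MLE and P(t|C) the collection model.
    Lambda controls how much weight the document model gets versus the
    collection-wide smoothing term."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_d = tf[t] / dlen if dlen else 0.0
        p_c = collection_counts.get(t, 0) / collection_len
        p = lam * p_d + (1 - lam) * p_c
        if p == 0.0:
            return float("-inf")  # term unseen in both document and collection
        score += math.log(p)
    return score
```

Tuning Lambda trades off rewarding exact term matches in the document (high Lambda) against robustness to missing terms via the collection statistics (low Lambda), which is why the paper studies it separately per method.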
Building a Text Collection for Urdu Information Retrieval
Urdu is a widely spoken language in the Indian subcontinent with over 300 million
speakers worldwide. However, linguistic advancements in Urdu are rare compared to
those in other European and Asian languages. Therefore, by following Text Retrieval
Conference standards, we attempted to construct an extensive text collection of
85,304 documents from diverse categories covering over 52 topics, with relevance
judgments constructed at a pool depth of 100. We also present several applications to demonstrate
the effectiveness of our collection. Although this collection is primarily intended
for text retrieval, it can also be used for named entity recognition, text summarization,
and other linguistic applications with suitable modifications. Ours is the most
extensive existing collection for the Urdu language, and it will be freely available for
future research and academic education.
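TREC-style pooling, as used for the relevance judgments above, is simple to state: for each topic, the documents to be judged are the union of the top-k results from every contributing run. A minimal sketch (assumed input format, not the authors' tooling):

```python
def build_pool(runs, depth=100):
    """Form a judgment pool at the given depth.

    `runs` is a list of rankings, each a {topic_id: [doc_id, ...]} mapping
    ordered best-first. The pool for a topic is the union of the top-`depth`
    documents contributed by each run; only pooled documents are judged.
    """
    pool = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool
```

Deeper pools cost more assessor effort but make the resulting qrels fairer to runs that retrieve relevant documents other systems miss.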