Overview of the TREC 2022 NeuCLIR Track
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to
study the impact of neural approaches to cross-language information retrieval.
The main task in this year's track was ad hoc ranked retrieval of Chinese,
Persian, or Russian newswire documents using queries expressed in English.
Topics were developed using standard TREC processes, except that topics
developed by an annotator for one language were assessed by a different
annotator when evaluating that topic on a different language. There were 172
total runs submitted by twelve teams.
Comment: 22 pages, 13 figures, 10 tables. Part of the Thirty-First Text REtrieval Conference (TREC 2022) Proceedings. Replaced the misplaced Russian result table.
Fusion of retrieval models at CLEF 2008 Ad Hoc Persian Track
Metasearch engines submit the user query to several underlying search engines and then merge their retrieved results to generate a single list that is more effective for the user's information needs. Following the idea behind metasearch engines, merging the results retrieved from different retrieval models should improve search coverage and precision. In this study, we have investigated the effect of fusing different retrieval techniques on the performance of Persian retrieval. We use an extension of the Ordered Weighted Average (OWA) operator called IOWA and a weighting scheme, NOWA, for merging the results. Our experimental results show that merging by OWA operators produces better MAP.
Using OWA fuzzy operator to merge retrieval system results
With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to the user's requirements. One way of improving retrieval quality is to use more than one retrieval engine, merge the retrieved results, and show a single ranked list to the user. Studies suggest that combining the results of multiple search engines improves ranking when these engines are treated as independent experts. In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and Lnc.btc weighting schemes. The experiments were conducted on a large Persian collection of news archives called the Hamshahri Collection. Different variations of the Ordered Weighted Average (OWA) fuzzy operator method, namely a quantifier-based OWA operator and a degree-of-importance-based OWA operator, have been tested for merging the results. Our experimental results show that the OWA operators produce better precision and ranking than the weaker retrieval methods, but in comparison with the stronger retrieval models they yield only minimal improvements.
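The quantifier-based OWA fusion described above can be sketched briefly. This is a minimal illustration, not the authors' exact implementation: it assumes each run's scores are already min-max normalised, uses the common RIM quantifier Q(r) = r**alpha to derive the OWA weights, and treats a document missing from a run as having score 0.

```python
def owa_weights(n, alpha=2.0):
    """RIM-quantifier OWA weights: w_i = Q(i/n) - Q((i-1)/n), Q(r) = r**alpha.
    The weights telescope to Q(1) - Q(0) = 1, so they always sum to 1."""
    return [(i / n) ** alpha - ((i - 1) / n) ** alpha for i in range(1, n + 1)]

def owa_merge(runs, alpha=2.0):
    """Fuse several {doc_id: normalised_score} runs with an OWA operator.

    For each document, the per-run scores are sorted in descending order
    (the defining step of OWA: weights attach to score positions, not to
    specific runs) and combined as a weighted sum.
    """
    docs = set().union(*runs)
    w = owa_weights(len(runs), alpha)
    fused = {}
    for d in docs:
        scores = sorted((run.get(d, 0.0) for run in runs), reverse=True)
        fused[d] = sum(wi * si for wi, si in zip(w, scores))
    # Return doc ids ranked by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```

With alpha > 1 the quantifier shifts weight toward the lower scores, so a document must score well in several runs to rank highly; alpha < 1 behaves more like taking the best single-run evidence.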
NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval
This paper reports on a study of cross-lingual information retrieval (CLIR)
using the mT5-XXL reranker on the NeuCLIR track of TREC 2022. Perhaps the
biggest contribution of this study is the finding that despite the mT5 model
being fine-tuned only on query-document pairs of the same language it proved to
be viable for CLIR tasks, where query-document pairs are in different
languages, even in the presence of suboptimal first-stage retrieval
performance. The results of the study show outstanding performance across all
tasks and languages, leading to a high number of winning positions. Finally,
this study provides valuable insights into the use of mT5 in CLIR tasks and
highlights its potential as a viable solution. For reproduction, refer to
https://github.com/unicamp-dl/NeuCLIR22-mT
Investigation of the Lambda Parameter for Language Modeling Based Persian Retrieval
Language modeling is one of the most powerful methods in information retrieval. Many language modeling based retrieval systems have been developed and tested on English collections; evaluating language modeling on collections in other languages is therefore an interesting research issue. In this study, four different language modeling methods proposed by Hiemstra [1] have been evaluated on a large Persian collection of a news archive. Furthermore, we study two different approaches proposed for tuning the Lambda parameter in the method. Experimental results show that the performance of language models on Persian text improves after Lambda tuning; more specifically, the Witten-Bell method provides the best results.
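The role of Lambda can be seen in a linearly interpolated (Jelinek-Mercer style) query likelihood score, the form underlying Hiemstra's models. The sketch below is illustrative only, with an assumed Lambda of 0.85 as the default; it is not the paper's tuned configuration:

```python
import math
from collections import Counter

def jm_score(query_terms, doc_terms, collection_counts, collection_len, lam=0.85):
    """Interpolated query log-likelihood:
    log P(q|d) = sum over query terms t of
                 log( lam * P(t|d) + (1 - lam) * P(t|C) ),
    where P(t|d) is the document MLE and P(t|C) the collection model.
    Lambda controls how much weight the document model gets versus the
    collection-wide smoothing term."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_d = tf[t] / dlen if dlen else 0.0
        p_c = collection_counts.get(t, 0) / collection_len
        p = lam * p_d + (1 - lam) * p_c
        if p == 0.0:
            return float("-inf")  # term unseen in both document and collection
        score += math.log(p)
    return score
```

Tuning Lambda trades off rewarding exact term matches in the document (high Lambda) against robustness to missing terms via the collection statistics (low Lambda), which is why the paper studies it separately per method.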
Building a Text Collection for Urdu Information Retrieval
Urdu is a widely spoken language in the Indian subcontinent with over 300 million
speakers worldwide. However, linguistic advancements in Urdu are rare compared to
those in other European and Asian languages. Therefore, by following Text Retrieval
Conference standards, we attempted to construct an extensive text collection of
85,304 documents from diverse categories covering over 52 topics, with relevance
judgments constructed at a pool depth of 100. We also present several applications to demonstrate
the effectiveness of our collection. Although this collection is primarily intended
for text retrieval, it can also be used for named entity recognition, text summarization,
and other linguistic applications with suitable modifications. Ours is the most
extensive existing collection for the Urdu language, and it will be freely available for
future research and academic education.
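TREC-style pooling, as used for the relevance judgments above, is simple to state: for each topic, the documents to be judged are the union of the top-k results from every contributing run. A minimal sketch (assumed input format, not the authors' tooling):

```python
def build_pool(runs, depth=100):
    """Form a judgment pool at the given depth.

    `runs` is a list of rankings, each a {topic_id: [doc_id, ...]} mapping
    ordered best-first. The pool for a topic is the union of the top-`depth`
    documents contributed by each run; only pooled documents are judged.
    """
    pool = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool
```

Deeper pools cost more assessor effort but make the resulting qrels fairer to runs that retrieve relevant documents other systems miss.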