
    Overview of the TREC 2022 NeuCLIR Track

    This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval. The main task in this year's track was ad hoc ranked retrieval of Chinese, Persian, or Russian newswire documents using queries expressed in English. Topics were developed using standard TREC processes, except that topics developed by an annotator for one language were assessed by a different annotator when evaluating that topic on a different language. There were 172 total runs submitted by twelve teams.
    Comment: 22 pages, 13 figures, 10 tables. Part of the Thirty-First Text REtrieval Conference (TREC 2022) Proceedings. Replaces the misplaced Russian result table.

    Fusion of retrieval models at CLEF 2008 Ad Hoc Persian Track

    Metasearch engines submit the user query to several underlying search engines and then merge their retrieved results to generate a single list that better satisfies the user's information needs. Following the idea behind metasearch engines, merging the results retrieved by different retrieval models should improve search coverage and precision. In this study, we investigated the effect of fusing different retrieval techniques on the performance of Persian retrieval. We use an extension of the Ordered Weighted Average (OWA) operator called IOWA, and a weighting scheme, NOWA, for merging the results. Our experimental results show that merging with OWA operators produces better MAP.

    Using OWA fuzzy operator to merge retrieval system results

    With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to the user's requirements. One way of improving the quality of retrieval is to use more than one retrieval engine, merge the retrieved results, and show a single ranked list to the user. Studies suggest that combining the results of multiple search engines improves ranking when these engines are treated as independent experts. In this study, we investigated the performance of Persian retrieval by merging four different language modeling methods and two vector space models with Lnu.ltu and Lnc.btc weighting schemes. The experiments were conducted on a large Persian collection of news archives called the Hamshahri Collection. Different variations of the Ordered Weighted Average (OWA) fuzzy operator method, namely a quantifier-based OWA operator and a degree-of-importance-based OWA operator, were tested for merging the results. Our experimental results show that the OWA operators produce better precision and ranking than the weaker retrieval methods, but only minimal improvements over the stronger retrieval models.
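The quantifier-based OWA fusion described in this abstract can be illustrated with a minimal sketch. All names below, the score normalization, and the quantifier Q(r) = r**alpha are illustrative assumptions, not the paper's exact setup:

```python
def owa_weights(n, alpha=2.0):
    """Quantifier-based OWA weights from Q(r) = r**alpha; they sum to 1."""
    q = lambda r: r ** alpha
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]

def owa(scores, weights):
    """Aggregate one document's scores: sort descending, then weighted sum."""
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

def merge(runs, alpha=2.0):
    """Merge several systems' normalized score dicts into one fused ranking."""
    docs = set().union(*runs)
    w = owa_weights(len(runs), alpha)
    fused = {d: owa([run.get(d, 0.0) for run in runs], w) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

With alpha > 1 the weights emphasize the lower-ordered (more pessimistic) scores, acting like a soft AND across systems; alpha < 1 leans toward the maximum score, a soft OR.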

    NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval

    This paper reports on a study of cross-lingual information retrieval (CLIR) using the mT5-XXL reranker on the NeuCLIR track of TREC 2022. Perhaps the biggest contribution of this study is the finding that, despite being fine-tuned only on query-document pairs in the same language, the mT5 model proved viable for CLIR tasks, in which query and document are in different languages, even in the presence of suboptimal first-stage retrieval performance. The results of the study show outstanding performance across all tasks and languages, leading to a high number of winning positions. Finally, this study provides valuable insights into the use of mT5 for CLIR tasks and highlights its potential as a viable solution. For reproduction, refer to https://github.com/unicamp-dl/NeuCLIR22-mT

    Investigation of the Lambda Parameter for Language Modeling Based Persian Retrieval

    Language modeling is one of the most powerful methods in information retrieval. Many language modeling based retrieval systems have been developed and tested on English collections; hence, evaluating language modeling on collections in other languages is an interesting research issue. In this study, four different language modeling methods proposed by Hiemstra [1] were evaluated on a large Persian collection of a news archive. Furthermore, we study two different approaches proposed for tuning the lambda parameter in the method. Experimental results show that the performance of language models on Persian text improves after lambda tuning; more specifically, the Witten-Bell method provides the best results.
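The lambda parameter discussed here is the interpolation weight in smoothed query-likelihood models such as Jelinek-Mercer smoothing, which mixes the document's maximum-likelihood term probability with a background collection model. The sketch below is a generic illustration under that assumption; the toy data and function names are not from the paper:

```python
import math
from collections import Counter

def jm_score(query, doc, coll_counts, coll_len, lam=0.5):
    """Query log-likelihood with Jelinek-Mercer smoothing:
    P(w|d) = lam * P_ml(w|d) + (1 - lam) * P(w|C)."""
    tf = Counter(doc)
    score = 0.0
    for w in query:
        p_doc = tf[w] / len(doc) if doc else 0.0
        p_coll = coll_counts.get(w, 0) / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        # A term unseen in both document and collection zeroes the likelihood.
        score += math.log(p) if p > 0 else float("-inf")
    return score
```

Tuning lambda trades off trust in the document's own term statistics (high lambda) against smoothing from the collection (low lambda), which is why the choice noticeably affects retrieval quality.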

    Building a Text Collection for Urdu Information Retrieval

    Urdu is a widely spoken language in the Indian subcontinent, with over 300 million speakers worldwide. However, linguistic advancements in Urdu are rare compared to those in other European and Asian languages. Therefore, following Text Retrieval Conference standards, we constructed an extensive text collection of 85,304 documents from diverse categories covering over 52 topics, with relevance judgment sets at a pool depth of 100. We also present several applications to demonstrate the effectiveness of our collection. Although this collection is primarily intended for text retrieval, it can also be used for named entity recognition, text summarization, and other linguistic applications with suitable modifications. Ours is the most extensive existing collection for the Urdu language, and it will be freely available for future research and academic education.
