Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every
day, the problem of ad-hoc information retrieval is rarely studied for
non-English languages. This is primarily due to a lack of data sets that are
suitable to train ranking algorithms. In this paper, we tackle the lack of data
by leveraging pre-trained multilingual language models to transfer a retrieval
system trained on English collections to non-English queries and documents. Our
model is evaluated in a zero-shot setting, meaning that we use it to predict
relevance scores for query-document pairs in languages never seen during
training. Our results show that the proposed approach can significantly
outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and
Spanish. We also show that augmenting the English training collection with some
examples from the target language can sometimes improve performance. Comment: ECIR 2020 (short
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems
Sheffield University CLEF 2000 submission - bilingual track: German to English
We investigated dictionary based cross language information
retrieval using lexical triangulation. Lexical triangulation combines the results
of different transitive translations. Transitive translation uses a pivot language
to translate between two languages when no direct translation resource is
available. We took German queries and translated them via Spanish or Dutch
into English. We compared the results of retrieval experiments using these
queries, with other versions created by combining the transitive translations or
created by direct translation. Direct dictionary translation of a query introduces
considerable ambiguity that damages retrieval, with average precision 79% below
monolingual in this research. Transitive translation introduces more ambiguity,
giving results worse than 88% below direct translation. We have shown that
lexical triangulation between two transitive translations can eliminate much of
the additional ambiguity introduced by transitive translation
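
The idea of lexical triangulation described above can be illustrated with a minimal sketch. The dictionaries below are invented toy data, the function names are hypothetical, and combining the two pivot routes by set intersection is an assumption made for illustration; the abstract says only that the results of different transitive translations are combined.

```python
from typing import Dict, List, Set, Tuple

Lexicon = Dict[str, Set[str]]

# Toy bilingual dictionaries (invented for illustration, not real resources).
DE_ES: Lexicon = {"bank": {"banco", "orilla"}}
ES_EN: Lexicon = {"banco": {"bank", "bench"}, "orilla": {"bank", "shore"}}
DE_NL: Lexicon = {"bank": {"bank", "oever"}}
NL_EN: Lexicon = {"bank": {"bank", "couch"}, "oever": {"bank", "shore"}}


def transitive(term: str, first: Lexicon, second: Lexicon) -> Set[str]:
    """Translate a term through a pivot language: source -> pivot -> target."""
    candidates: Set[str] = set()
    for pivot_term in first.get(term, set()):
        candidates |= second.get(pivot_term, set())
    return candidates


def triangulate(term: str, routes: List[Tuple[Lexicon, Lexicon]]) -> Set[str]:
    """Keep only the translations that every non-empty pivot route agrees on.

    Intersection is an assumed combination rule for this sketch.
    """
    candidate_sets = [transitive(term, first, second) for first, second in routes]
    non_empty = [s for s in candidate_sets if s]
    if not non_empty:
        return set()
    return set.intersection(*non_empty)


via_spanish = transitive("bank", DE_ES, ES_EN)   # bank, bench, shore
via_dutch = transitive("bank", DE_NL, NL_EN)     # bank, couch, shore
combined = triangulate("bank", [(DE_ES, ES_EN), (DE_NL, NL_EN)])
print(sorted(combined))
```

In this toy case each pivot route contributes one spurious candidate of its own ("bench" via Spanish, "couch" via Dutch), and the intersection discards both, which is the kind of ambiguity reduction the abstract attributes to triangulation.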
Language Resources Used in Multi-Lingual Question Answering Systems
Purpose – In the field of information retrieval, some multi-lingual tools are being created to help users overcome language barriers. Nevertheless, these tools are not yet fully developed, and further investigation is needed to improve and apply them. One of their main problems is the choice of the linguistic resources to offer better coverage and to solve the translation problems in the context of multi-lingual information retrieval. This paper aims to address this issue. Design/methodology/approach – This research is focused on the analysis of resources used by multi-lingual question-answering systems, which respond to users' queries with short answers, rather than just offering a list of documents related to the search. An analysis of the main publications about multi-lingual QA systems was carried out, with the aim of identifying the typology, the advantages and disadvantages, and the real use and trend of each of the linguistic resources and tools used in this new kind of system. Findings – Five of the resources most used in cross-language QA systems were identified and studied: databases, dictionaries, corpora, ontologies and thesauri. The three most popular traditional resources (automatic translators, dictionaries, and corpora) are gradually leaving a widening gap for others – such as ontologies and the free encyclopaedia Wikipedia. Originality/value – The perspective offered by the translation discipline can improve the effectiveness of QA system
Observing Users - Designing clarity a case study on the user-centred design of a cross-language information retrieval system
This paper presents a case study of the development of an interface to a novel and complex form of document retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. A study involving users (with such searching needs) from the start of the design process is described covering initial examination of user needs and tasks; preliminary design and testing of interface components; building, testing, and further refining an interface; before finally conducting usability tests of the system. Lessons are learned at every stage of the process leading to a much more informed view of how such an interface should be built
Dublin City University at CLEF 2007: Cross-Language Speech Retrieval Experiments
The Dublin City University participation in the CLEF 2007 CL-SR English task concentrated primarily on issues of topic translation. Our retrieval system used the BM25F model and pseudo relevance feedback. Topics were translated into English using the Yahoo! BabelFish free online service combined with domain-specific translation lexicons gathered automatically from Wikipedia. We explored alternative topic translation methods using these resources. Our results indicate that extending machine translation tools using automatically generated domain-specific translation lexicons can provide improved CLIR effectiveness for this task