Search CORE

2,615 research outputs found

Cross-lingual topical relevance models

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication venue
Publication date: 08/12/2012
Field of study

Cross-lingual relevance modelling (CLRLM) is a state-of-the-art technique for cross-lingual information retrieval (CLIR) which integrates query term disambiguation and expansion in a unified framework, to directly estimate a model of relevant documents in the target language starting with a query in the source language. However, CLRLM involves integrating a translation model either on the document side if a parallel corpus is available, or on the query side if a bilingual dictionary is available. For low resourced language pairs, large parallel corpora do not exist and the vocabulary coverage of dictionaries is small, as a result of which RLM-based CLIR fails to obtain satisfactory results. Despite the lack of parallel resources for a majority of language pairs, the availability of comparable corpora for many languages has grown considerably in the recent years. Existing CLIR techniques such as cross-lingual relevance models, cannot effectively utilise these comparable corpora, since they do not use information from documents in the source language. We overcome this limitation by using information from retrieved documents in the source language to improve the retrieval quality of the target language documents. More precisely speaking, our model involves a two step approach of first retrieving documents both in the source language and the target language (using query translation), and then improving on the retrieval quality of target language documents by expanding the query with translations of words extracted from the top ranked documents retrieved in the source language which are thematically related (i.e. share the same concept) to the words in the top ranked target language documents. Our key hypothesis is that the query in the source language and its equivalent target language translation retrieve documents which share topics. The ovelapping topics of these top ranked documents in both languages are then used to improve the ranking of the target language documents. Since the model relies on the alignment of topics between language pairs, we call it the cross-lingual topical relevance model (CLTRLM). Experimental results show that the CLTRLM significantly outperforms the standard CLRLM by upto 37% on English-Bengali CLIR, achieving mean average precision (MAP) of up to 60.27% of the Bengali monolingual IR MAP

DCU Online Research Access Service

PRIME: A System for Multi-lingual Patent Retrieval

Author: Fujii Atsushi
Fukui Masatoshi
Higuchi Shigeto
Ishikawa Tetsuya
Publication venue
Publication date: 01/01/2001
Field of study

Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages. We propose a multi-lingual patent retrieval system, which translates a user query into the target language, searches a multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation and clustering. Our system also extracts new translations from patent families consisting of comparable patents, to enhance the translation dictionary

arXiv.org e-Print Archive

CiteSeerX

Explicit versus Latent Concept Models for Cross-Language Information Retrieval

Author: Boutilier Craig
Cimiano Philipp
Schultz Antje
Sizov Sergej
Sorg Philipp
Staab Steffen
Publication venue: AAAI Press
Publication date: 01/01/2009
Field of study

Cimiano P, Schultz A, Sizov S, Sorg P, Staab S. Explicit versus Latent Concept Models for Cross-Language Information Retrieval. In: Boutilier C, ed. IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 2009: 1513-1518

Publications at Bielefeld University

Introduction to the special issue on cross-language algorithms and applications

Author: Bangalore Srinivas
Lambert Patrik
Montiel-Ponsoda Elena
Màrquez Lluís
Ruiz Costa-Jussà Marta
Publication venue
Publication date: 01/01/2016
Field of study

With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

DCU and UTA at ImageCLEFPhoto 2007

Author: Adamek Tomasz
Airio Eija
Jones Gareth J.F.
Järvelin Anni
Wilkins Peter
Publication venue
Publication date: 01/09/2007
Field of study

Dublin City University (DCU) and University of Tampere(UTA) participated in the ImageCLEF 2007 photographic ad-hoc retrieval task with several monolingual and bilingual runs. Our approach was language independent: text retrieval based on fuzzy s-gram query translation was combined with visual retrieval. Data fusion between text and image content was performed using unsupervised query-time weight generation approaches. Our baseline was a combination of dictionary-based query translation and visual retrieval, which achieved the best result. The best mixed modality runs using fuzzy s-gram translation achieved on average around 83% of the performance of the baseline. Performance was more similar when only top rank precision levels of P10 and P20 were considered. This suggests that fuzzy sgram query translation combined with visual retrieval is a cheap alternative for cross-lingual image retrieval where only a small number of relevant items are required. Both sets of results emphasize the merit of our query-time weight generation schemes for data fusion, with the fused runs exhibiting marked performance increases over single modalities, this is achieved without the use of any prior training data

Irish Universities

DCU Online Research Access Service

Miracle’s 2005 Approach to Cross-lingual Information Retrieval

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005
Field of study

This paper presents the 2005 Miracle’s team approach to Bilingual and Multilingual Information Retrieval. In the multilingual track, we have concentrated our work on the merging process of the results of monolingual runs to get the multilingual overall result, relying on available translations. In the bilingual and multilingual tracks, we have used available translation resources, and in some cases we have using a combining approach

Archivo Digital UPM

Cross-lingual document retrieval categorisation and navigation based on distributed services

Author: Deksne D.
Demetriou G.
Gaizauskas R.
Hansen P.
Karlgren J.
Keskustalo H.
Petrelli D.
Sanderson M.
Skadina I.
Publication venue
Publication date
Field of study

The widespread use of the Internet across countries has increased the need for access to document collections that are often written in languages different from a user’s native language. In this paper we describe Clarity, a Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian. Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation, text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system architecture that was developed to support the interaction and coordination of Clarity’s distributed services, (iii) the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only such system that currently supports Baltic languages

White Rose Research Online

Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

Author: Kraaij Wessel
Nie Jian-Yun
Simard Michel
Publication venue
Publication date: 01/01/2003
Field of study

Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application which needs translation functionality of a relatively low level of sophistication since current models for information retrieval (IR) are still based on a bag-of-words. The Web provides a vast resource for the automatic construction of parallel corpora which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.Comment: 37 page

arXiv.org e-Print Archive

CiteSeerX

Leiden University Scholary Publications