Determining the User Intent of Chinese-English Mixed-Language Queries Based on Search Logs
With the increasing number of multilingual web pages on the Internet, multilingual information retrieval has become an important research topic. While queries are the key element of the information retrieval process, mixed-language queries have not yet been adequately studied. This study determines the user intents of Chinese-English mixed-language queries submitted to a Chinese search engine and compares the user intents identified from query content alone with those identified using additional user behavior data (e.g., clicked results and subsequent queries). The preliminary findings present the distributions of user intents obtained by analyzing the query alone and by analyzing additional user behavior data, suggesting a particular searching behavior among users who submit Chinese-English mixed-language queries. The findings of this study could provide useful insights into the searching behavior of these users and enable web search engines to provide more relevant results and more precisely targeted sponsored links.
User experiments with the Eurovision cross-language image retrieval system
In this paper we present Eurovision, a text-based system for cross-language (CL) image retrieval.
The system is evaluated by multilingual users for two search tasks with the system configured in
English and five other languages. To our knowledge this is the first published set of user
experiments for CL image retrieval. We show that: (1) it is possible to create a usable multilingual
search engine using little knowledge of any language other than English, (2) categorizing images
assists the user's search, and (3) there are differences in the way users search between the proposed
search tasks. Based on the two search tasks and user feedback, we describe important aspects of
any CL image retrieval system.
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every
day, the problem of ad-hoc information retrieval is rarely studied for
non-English languages. This is primarily due to a lack of data sets that are
suitable to train ranking algorithms. In this paper, we tackle the lack of data
by leveraging pre-trained multilingual language models to transfer a retrieval
system trained on English collections to non-English queries and documents. Our
model is evaluated in a zero-shot setting, meaning that we use it to predict
relevance scores for query-document pairs in languages never seen during
training. Our results show that the proposed approach can significantly
outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and
Spanish. We also show that augmenting the English training collection with some
examples from the target language can sometimes improve performance.
Comment: ECIR 2020 (short
MIRACLE Retrieval Experiments with East Asian Languages
This paper describes the participation of MIRACLE in the NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages using the Latin and Cyrillic alphabets, this was our first attempt at East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, focusing especially on their similarities to and differences from European languages, and to carry out research on CLIR tasks involving those languages. The basic idea behind our participation in NTCIR is to test whether the same familiar linguistic-based techniques are also applicable to East Asian languages, and to study the necessary adaptations.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc., including domain-specific applications); (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Overview of the 2005 cross-language image retrieval track (ImageCLEF)
The purpose of this paper is to outline efforts from the 2005 CLEF cross-language image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore
the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: ad-hoc retrieval from a historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. Twenty-four research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks and the submissions from participating groups, and summarise the main findings.
Multilingual Schema Matching for Wikipedia Infoboxes
Recent research has taken advantage of Wikipedia's multilingualism as a
resource for cross-language information retrieval and machine translation, as
well as proposed techniques for enriching its cross-language structure. The
availability of documents in multiple languages also opens up new opportunities
for querying structured Wikipedia content, and in particular, to enable answers
that straddle different languages. As a step towards supporting such queries,
in this paper, we propose a method for identifying mappings between attributes
from infoboxes that come from pages in different languages. Our approach finds
mappings in a completely automated fashion. Because it does not require
training data, it is scalable: not only can it be used to find mappings between
many language pairs, but it is also effective for languages that are
under-represented and lack sufficient training samples. Another important
benefit of our approach is that it does not depend on syntactic similarity
between attribute names, and thus, it can be applied to language pairs that
have distinct morphologies. We have performed an extensive experimental
evaluation using a corpus consisting of pages in Portuguese, Vietnamese, and
English. The results show that not only does our approach obtain high precision
and recall, but it also outperforms state-of-the-art techniques. We also
present a case study which demonstrates that the multilingual mappings we
derive lead to substantial improvements in answer quality and coverage for
structured queries over Wikipedia content.
Comment: VLDB201
Searching and organizing images across languages
With the continual growth of Web users from a wide range of countries, supporting such users in their search of cultural heritage collections will grow in importance. In the next few years, the main growth in Internet users will come from the Indian subcontinent and China. Consequently, if holders of cultural heritage collections wish their content to be viewable by the full range of users coming to the Internet, they will need to support a growing range of languages. This paper presents recent work conducted at the University of Sheffield (and now being implemented in BRICKS) on how to use automatic translation to provide search and organisation facilities for a historical image search engine. The system allows users to search for images in seven different languages, examine translated image captions, and browse retrieved images organised by categories written in their native language.