6,861 research outputs found

    Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

    Full text link
    Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, where we combine a query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, the technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords based on its special phonogram. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which corresponds words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Which user interaction for cross-language information retrieval? Design issues and reflections

    Get PDF
    A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. The authors present three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for low-density languages, and shows how the user-interaction design evolved depending on the results of usability tests. The first test was instrumental to identify weaknesses in both functionalities and interface; the second was run to determine if query translation should be shown or not; the final was a global assessment and focused on user satisfaction criteria. Lessons were learned at every stage of the process leading to a much more informed view of what a cross-language retrieval system should offer to users

    Beyond English text: Multilingual and multimedia information retrieval.

    Get PDF
    Non

    MIRACLE Retrieval Experiments with East Asian Languages

    Get PDF
    This paper describes the participation of MIRACLE in NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin and Cyrillic alphabets, this was our first attempt on East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, specially focusing on the similarities and differences with European languages, and carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test if the same familiar linguisticbased techniques may also applicable to East Asian languages, and study the necessary adaptations

    Multilingual search for cultural heritage archives via combining multiple translation resources

    Get PDF
    The linguistic features of material in Cultural Heritage (CH) archives may be in various languages requiring a facility for effective multilingual search. The specialised language often associated with CH content introduces problems for automatic translation to support search applications. The MultiMatch project is focused on enabling users to interact with CH content across different media types and languages. We present results from a MultiMatch study exploring various translation techniques for the CH domain. Our experiments examine translation techniques for the English language CLEF 2006 Cross-Language Speech Retrieval (CL-SR) task using Spanish, French and German queries. Results compare effectiveness of our query translation against a monolingual baseline and show improvement when combining a domain-specific translation lexicon with a standard machine translation system

    User experiments with the Eurovision cross-language image retrieval system

    Get PDF
    In this paper we present Eurovision, a text-based system for cross-language (CL) image retrieval. The system is evaluated by multilingual users for two search tasks with the system configured in English and five other languages. To our knowledge this is the first published set of user experiments for CL image retrieval. We show that: (1) it is possible to create a usable multilingual search engine using little knowledge of any language other than English, (2) categorizing images assists the user's search, and (3) there are differences in the way users search between the proposed search tasks. Based on the two search tasks and user feedback, we describe important aspects of any CL image retrieval system

    Analysis of errors in the automatic translation of questions for translingual QA systems

    Get PDF
    Purpose – This study aims to focus on the evaluation of systems for the automatic translation of questions destined to translingual question-answer (QA) systems. The efficacy of online translators when performing as tools in QA systems is analysed using a collection of documents in the Spanish language. Design/methodology/approach – Automatic translation is evaluated in terms of the functionality of actual translations produced by three online translators (Google Translator, Promt Translator, and Worldlingo) by means of objective and subjective evaluation measures, and the typology of errors produced was identified. For this purpose, a comparative study of the quality of the translation of factual questions of the CLEF collection of queries was carried out, from German and French to Spanish. Findings – It was observed that the rates of error for the three systems evaluated here are greater in the translations pertaining to the language pair German-Spanish. Promt was identified as the most reliable translator of the three (on average) for the two linguistic combinations evaluated. However, for the Spanish-German pair, a good assessment of the Google online translator was obtained as well. Most errors (46.38 percent) tended to be of a lexical nature, followed by those due to a poor translation of the interrogative particle of the query (31.16 percent). Originality/value – The evaluation methodology applied focuses above all on the finality of the translation. That is, does the resulting question serve as effective input into a translingual QA system? Thus, instead of searching for “perfection”, the functionality of the question and its capacity to lead one to an adequate response are appraised. The results obtained contribute to the development of improved translingual QA systems

    Overview of the 2005 cross-language image retrieval track (ImageCLEF)

    Get PDF
    The purpose of this paper is to outline efforts from the 2005 CLEF crosslanguage image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: a ad-hoc retrieval from an historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. 24 research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks, submissions from participating groups and summarise the main fndings
    • …
    corecore