314 research outputs found
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, has of late become one of the major topics within the
information retrieval community. This paper proposes a Japanese/English CLIR
system, where we combine a query translation and retrieval modules. We
currently target the retrieval of technical documents, and therefore the
performance of our system is highly dependent on the quality of the translation
of technical terms. However, the technical term translation is still
problematic in that technical terms are often compound words, and thus new
terms are progressively created by combining existing base words. In addition,
Japanese often represents loanwords based on its special phonogram.
Consequently, existing dictionaries find it difficult to achieve sufficient
coverage. To counter the first problem, we produce a Japanese/English
dictionary for base words, and translate compound words on a word-by-word
basis. We also use a probabilistic method to resolve translation ambiguity. For
the second problem, we use a transliteration method, which corresponds words
unlisted in the base word dictionary to their phonetic equivalents in the
target language. We evaluate our system using a test collection for CLIR, and
show that both the compound word translation and transliteration methods
improve the system performance
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems
Query translation architecture for Malay-English cross-language information retrieval system
This paper discusses research on query translation events in Malay-English Cross-Language Information Retrieval (CLIR) system. We assume that by improving query translation accuracy, we can improve the information retrieval performance. The dictionary-based CLIR system facing three main problems: translation ambiguity; compound and phrase handling and proper names translation. The use of natural language processing (NLP) techniques, such as stemming, Part-of-Speech (POS) tagging is useful in query translation process. Hence, n-gram matching technique has successfully applied to information retrieval (IR) system for phrases and proper names translation. The proposed query translation architecture consist of stemming, Part-of-Speech (POS) tagging and n-gram matching techniques is useful in CLIR system as well as search engine application
Which user interaction for cross-language information retrieval? Design issues and reflections
A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. The authors present three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for low-density languages, and shows how the user-interaction design evolved depending on the results of usability tests. The first test was instrumental to identify weaknesses in both functionalities and interface; the second was run to determine if query translation should be shown or not; the final was a global assessment and focused on user satisfaction criteria. Lessons were learned at every stage of the process leading to a much more informed view of what a cross-language retrieval system should offer to users
Which User Interaction for Cross-Language Information Retrieval? Design Issues and Reflections
A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. This paper presents three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for rare languages, and shows how the user interaction design evolved depending on the results of usability tests. The first test was instrumental to identify weaknesses in both functionalities and interface; the second was run to determine if query translation should be shown or not; the final was a global assessment and focussed on user satisfaction criteria. Lessons were learned at every stage of the process leading to a much more informed view of what a cross-language retrieval system should offer to users
A Domain Specific Lexicon Acquisition Tool for Cross-Language Information Retrieval
With the recent enormous increase of information dissemination via the web as incentive there is a growing interest in supporting tools for cross-language retrieval. In this paper we describe a disclosure and retrieval approach that fulfils the needs of both information providers and users by offering fast and cheap access to large amounts of documents from various language domains. Relevant information can be retrieved irrespective of the language used for the specification of a query. In order to realize this type of multilingual functionality the availability of several translation tools is needed, both of a generic and a domain specific nature. Domain specific tools are often not available or only against large costs. In this paper we will therefore focus on a way to reduce these costs, namely the automatic derivation of multilingual resources from so-called parallel text corpora. The benefits of this approach will be illustrated for an example system, i.e. the demonstrator developed within the project Twenty-One, which is tuned to information from the area of sustainable development
Which User Interaction for Cross-Language Information Retrieval? Design Issues and Reflections
A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. This paper presents three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for rare languages, and shows how the user interaction design evolved depending on the results of usability tests. The first test was instrumental to identify weaknesses in both functionalities and interface; the second was run to determine if query translation should be shown or not; the final was a global assessment and focussed on user satisfaction criteria. Lessons were learned at every stage of the process leading to a much more informed view of what a cross-language retrieval system should offer to users
Cross-language Text Classification with Convolutional Neural Networks From Scratch
Cross language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training classification model for each individual language. The novel approach by using Convolutional Neural Networks for multilingual language classification is proposed in this article. It learns representation of knowledge gained from languages. Moreover, current method works for new individual language, which was not used in training. The results of empirical study on large dataset of 21 languages demonstrate robustness and competitiveness of the presented approach
Knowledge Management and Cultural Heritage Repositories. Cross-Lingual Information Retrieval Strategies
In the last years important initiatives, like the development of the European Library and Europeana, aim to increase the availability of cultural content from various types of providers and institutions. The accessibility to these resources requires the development of environments which allow both to manage multilingual complexity and to preserve the semantic interoperability. The creation of Natural Language Processing (NLP) applications is finalized to the achievement of CrossLingual Information Retrieval (CLIR). This paper presents an ongoing research on language processing based on the LexiconGrammar (LG) approach with the goal of improving knowledge management in the Cultural Heritage repositories. The proposed framework aims to guarantee interoperability between multilingual systems in order to overcome crucial issues like cross-language and cross-collection retrieval. Indeed, the LG methodology tries to overcome the shortcomings of statistical approaches as in Google Translate or Bing by Microsoft concerning Multi-Word Unit (MWU) processing in queries, where the lack of linguistic context represents a serious obstacle to disambiguation. In particular, translations concerning specific domains, as it is has been widely recognized, is unambiguous since the meanings of terms are mono-referential and the type of relation that links a given term to its equivalent in a foreign language is biunivocal, i.e. a one-to-one coupling which causes this relation to be exclusive and reversible. Ontologies are used in CLIR and are considered by several scholars a promising research area to improve the effectiveness of Information Extraction (IE) techniques particularly for technical-domain queries. Therefore, we present a methodological framework which allows to map both the data and the metadata among the language-specific ont
- …