Kannada and Telugu Native Languages to English Cross Language Information Retrieval
One of the crucial challenges in cross lingual
information retrieval is the retrieval of relevant information for
a query expressed in a native language. While retrieval of
relevant documents is slightly easier, analysing the relevance of
the retrieved documents and the presentation of the results to
the users are non-trivial tasks. To accomplish the above task,
we present our Kannada-English and Telugu-English CLIR
systems as part of the Ad-Hoc Bilingual task. We take a query
translation based approach using bilingual dictionaries. When
a query word is not found in the dictionary, it is
transliterated using a simple rule-based approach which
utilizes the corpus to return the ‘k’ closest English
transliterations of the given Kannada/Telugu word. The
resulting multiple translation/transliteration choices for each
query word are disambiguated using an iterative page-rank
style algorithm which, based on term-term co-occurrence
statistics, produces the final translated query. Finally, we
conduct experiments on these translated queries using a
Kannada/Telugu document collection and a set of English
queries to report the improvements. The performance achieved for
each task is presented, and a statistical analysis of these
results is given.
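The disambiguation step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything below is an assumption: the function name `disambiguate`, the co-occurrence table, and the simple rule that each candidate is scored by its co-occurrence with the currently weighted candidates of the other query words.

```python
def disambiguate(candidates, cooccur, iterations=20):
    """Pick one translation per query word by iterative reweighting.

    candidates: one list of candidate translations per query word.
    cooccur: dict mapping frozenset({a, b}) to a co-occurrence count
             (hypothetical corpus statistics, not from the paper).
    """
    # Start with uniform weights over each word's candidates.
    weights = [{c: 1.0 / len(cs) for c in cs} for cs in candidates]
    for _ in range(iterations):
        updated = []
        for i, cs in enumerate(candidates):
            scores = {}
            for c in cs:
                # Support from the candidates of every other query word,
                # weighted by how plausible those candidates currently are.
                support = 0.0
                for j, others in enumerate(candidates):
                    if j == i:
                        continue
                    for o in others:
                        support += cooccur.get(frozenset((c, o)), 0.0) * weights[j][o]
                scores[c] = support
            total = sum(scores.values())
            if total > 0:
                updated.append({c: s / total for c, s in scores.items()})
            else:
                # No co-occurrence evidence: keep the previous weights.
                updated.append(weights[i])
        weights = updated
    return [max(w, key=w.get) for w in weights]


# Toy example with made-up counts.
toy_counts = {
    frozenset(("bank", "money")): 5.0,
    frozenset(("bank", "cash")): 3.0,
    frozenset(("shore", "cash")): 1.0,
}
print(disambiguate([["bank", "shore"], ["money", "cash"]], toy_counts))
```

In this toy setting the candidates that reinforce each other ("bank" and "money") accumulate weight over the iterations, which is the general behaviour a page-rank style scheme is meant to produce.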
A framework for English and Malay cross-lingual document alignment method
Issues of information divide in multilingual information
retrieval are usually solved by translating users’ queries
to a language that the users understand. But dictionaries or
other translation knowledge in some of the Asian languages
are scarce. The objective of this study was to automatically
align the English and Malay news documents to become a
comparable corpus, which could contribute as a translation
resource to improve the query translation in cross-lingual
information retrieval. This study proposes a direct alignment
framework by utilizing the textual features similarity of each
document itself while attempting a novel approach of using
the similarity of the documents’ sentiment in improving the
effectiveness of the alignment method. The proposed
sentiment-based approach outperformed existing alignment
methods and improved the effectiveness in differentiating the
related and unrelated documents. These aligned comparable
documents can further be utilised in translation research for
the English and Malay cross-lingual information retrieval
tasks.
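As a rough illustration of combining textual similarity with sentiment similarity when scoring a candidate document pair, the sketch below may help. It is not the paper's method: the cosine measure over shared tokens, the sentiment scores in [-1, 1], the weight `alpha`, and the function names are all assumptions, and the Malay tokens are assumed to have already been mapped to English.

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of tokens."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def alignment_score(tokens_a, tokens_b, sentiment_a, sentiment_b, alpha=0.7):
    """Blend text similarity with sentiment similarity (hypothetical weighting).

    sentiment_a, sentiment_b: document-level scores assumed to lie in [-1, 1].
    """
    sentiment_sim = 1.0 - abs(sentiment_a - sentiment_b) / 2.0
    return alpha * cosine(tokens_a, tokens_b) + (1.0 - alpha) * sentiment_sim


# Made-up documents: a related pair and an unrelated pair.
english_doc = ["flood", "city", "rescue"]
related_doc = ["flood", "city", "victims"]
unrelated_doc = ["football", "win", "cup"]
print(alignment_score(english_doc, related_doc, -0.6, -0.5))
print(alignment_score(english_doc, unrelated_doc, -0.6, 0.8))
```

The sentiment term acts as a second signal: two documents with little lexical overlap and opposite sentiment score lower than a pair that agrees on both.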
A Multi-Task Architecture on Relevance-based Neural Query Translation
We describe a multi-task learning approach to train a Neural Machine
Translation (NMT) model with a Relevance-based Auxiliary Task (RAT) for search
query translation. The translation process for a Cross-lingual Information
Retrieval (CLIR) task is usually treated as a black box and is performed as
an independent step. However, an NMT model trained on sentence-level parallel
data is not aware of the vocabulary distribution of the retrieval corpus. We
address this problem with our multi-task learning architecture that achieves
a 16% improvement over a strong NMT baseline on an Italian-English query-document
dataset. We show using both quantitative and qualitative analysis that our
model generates balanced and precise translations with the regularization
effect it achieves from the multi-task learning paradigm. Comment: Accepted for publication at ACL 201
Explicit versus Latent Concept Models for Cross-Language Information Retrieval
Cimiano P, Schultz A, Sizov S, Sorg P, Staab S. Explicit versus Latent Concept Models for Cross-Language Information Retrieval. In: Boutilier C, ed. IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 2009: 1513-1518
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)
Automatic Construction of Cross-lingual Networks of Concepts from the Hong Kong SAR Police Department
Abstract. The tragic events of September 11 have prompted rapid growth in attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information, written in different languages and stored in different locations, may be seemingly unconnected. Therefore, cross-lingual semantic interoperability is a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that are different from those used in the documents. There are then two different term spaces, one for the users and another for the documents. The problem can be viewed as the creation of a thesaurus. The creation of such relationships would allow the system to match queries with relevant documents, even though they contain different terms. Apart from this, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English. Translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses the term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model to cross the language boundary. However, collecting parallel corpora between European languages and Oriental languages is not an easy task due to the unique linguistic and grammatical structures of Oriental languages. In this paper, the text-based approach to align English/Chinese Hong Kong Police press release documents from the Web is first presented. This article then reports an algorithmic approach to generate a robust knowledge base based on statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus.
The research output consisted of a thesaurus-like, semantic network knowledge base, which can aid in semantics-based cross-lingual information management and retrieval.