ANETAC: Arabic named entity transliteration and classification dataset

Author: Guessoum A
Hadj Ameur MS
Meziane F
Publication venue: 'Center for Open Science'
Publication date: 06/07/2019

In this paper, we make freely accessible ANETAC, our English-Arabic named entity transliteration and classification dataset, which we built from freely available parallel translation corpora. The dataset contains 79,924 instances; each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which can be Person, Location, or Organization. The ANETAC dataset is aimed mainly at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification. This dataset was developed and used as part of a previous research study by Hadj Ameur et al. [1].
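As an illustration of the (e, a, c) triplet described in this abstract, a minimal Python sketch follows. The names `EntityClass`, `NamedEntityInstance` and `parse_line`, the tab-separated column order, and the sample row are all assumptions made for this example; the abstract does not describe the file layout of the ANETAC distribution.

```python
from dataclasses import dataclass
from enum import Enum


class EntityClass(Enum):
    """The three classes named in the abstract."""
    PERSON = "Person"
    LOCATION = "Location"
    ORGANIZATION = "Organization"


@dataclass(frozen=True)
class NamedEntityInstance:
    """One (e, a, c) triplet: English entity, Arabic transliteration, class."""
    english: str
    arabic: str
    entity_class: EntityClass


def parse_line(line: str) -> NamedEntityInstance:
    # Assumed (hypothetical) layout: english<TAB>arabic<TAB>class.
    english, arabic, label = line.rstrip("\n").split("\t")
    # EntityClass(label) raises ValueError for an unknown class label.
    return NamedEntityInstance(english, arabic, EntityClass(label))


# Hypothetical sample row, not taken from the dataset.
sample = parse_line("London\tلندن\tLocation")
```

Using an enumeration for the class keeps a misspelled label from passing silently, which matters if the same records are reused for classification.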

arXiv.org e-Print Archive

University of Salford Institutional Repository

Generating Paired Transliterated-cognates Using Multiple Pronunciation Characteristics from Web corpora

Author: Kuo Jin-Shea
Yang Ying-Kuei
Publication venue: Logico-Linguistic Society of Japan
Publication date: 16/11/2005

A novel approach to automatically extracting paired transliterated-cognates from Web corpora is proposed in this paper. One of the most important issues addressed is that of taking multiple pronunciation characteristics into account. Terms from various languages may be pronounced very differently. Incorporating knowledge of word origin may improve the pronunciation accuracy of terms. The accuracy of the generated phonetic information has an important impact on term transliteration and hence on transliterated-term extraction. Transliterated-term extraction, a fundamental task in natural language processing, extracts paired transliterated terms for the study of term transliteration. An experiment on transliterated-term extraction from two kinds of Web resources, Web pages and anchor texts, has been conducted and evaluated. The experimental results show that many transliterated-term pairs that cannot be extracted using an approach that exploits only English pronunciation characteristics have been successfully extracted using the proposed approach. Taking multiple language-specific pronunciation transformations into account may further improve the output of transliterated-term extraction.

Waseda University Repository

Incorporating Pronunciation Variation into Different Strategies of Term Transliteration

Author: Kuo Jin-Shea
Yang Ying-Kuei
Publication venue: Logico-Linguistic Society of Japan
Publication date: 16/11/2005

Term transliteration addresses the problem of converting terms in one language into their phonetic equivalents in another language via the spoken form. It is especially concerned with proper nouns, such as personal names, place names and organization names. Pronunciation variation refers to the pronunciation ambiguity frequently encountered in spoken language, which has a serious impact on term transliteration. More than one transliteration variant can be generated for an out-of-vocabulary term because of different kinds of pronunciation variation. It is important to take this issue into account when dealing with term transliteration. Several models that take pronunciation variation into consideration are proposed for term transliteration in this paper. They describe transliteration from various viewpoints and utilize the relationships trained from extracted transliterated-term pairs. An experiment applying the proposed models to term transliteration was conducted and evaluated. The experimental results show promise. The proposed models are not only applicable to term transliteration but are also helpful in indexing and retrieving spoken documents.

Waseda University Repository

Arabic machine transliteration using an attention-based encoder-decoder model

Author: Arbabi
Bengio
Brown
Deselaers
Finkel
Fujii
Goller
Habash
Hermjakob
Hochreiter
Jiang
Koehn
Koehn
Och
Och
Schuster
Sutskever
Virga
Williams
Zens
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Transliteration is the process of converting words from a given source language alphabet to a target language alphabet in a way that best preserves the phonetic and orthographic aspects of the transliterated words. Even though considerable effort has been made towards improving this process for many languages such as English, French and Chinese, little research has been carried out for the Arabic language. In this work, an attention-based encoder-decoder system is proposed for the task of machine transliteration between the Arabic and English languages. Our experiments demonstrated the efficiency of our proposed approach in comparison with some previous research in this area.

University of Salford Institutional Repository

Crossref

UDORA - University of Derby Online Research Archive

Arabic machine transliteration using an attention-based encoder-decoder model

Author: Deselaers
Hermjakob
Habash
Virga
Fujii
Jiang
Arbabi
Sutskever
Williams
Bengio
Goller
Hochreiter
Schuster
Finkel
Brown
Zens
Och
Koehn
Och
Koehn
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

University of Salford Institutional Repository

Crossref

FigShare

Are You Finding the Right Person? A Name Translation System Towards Web 2.0

Author: Zhou Yilu
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2009

In a multilingual world, information available in global information systems is increasing rapidly. Searching for proper names in a foreign language has become an important task in multilingual search and knowledge discovery. However, these names are the most difficult to handle because they are often unknown words that cannot be found in a translation dictionary, and even human experts cannot handle the variation generated during translation. Furthermore, existing research on name translation has focused on translation algorithms, while user experience during name translation and name search is often ignored. With Web technology moving towards Web 2.0, creating a platform that allows easier distributed collaboration and information sharing, we seek methods to incorporate Web 2.0 technologies into a name translation system. In this research, we review challenges in name translation and propose an interactive name translation and search system: NameTran. This system takes English names and translates them into Chinese using a hybrid of a Hidden Markov Model-based (HMM-based) transliteration approach and a web mining approach. Evaluation results showed that web mining consistently boosted the performance of a pure HMM approach. Our system achieved a top-1 accuracy of 0.64 and a top-8 accuracy of 0.96. To cope with changing popularity and variation in name translations, we demonstrated the feasibility of allowing users to rank translations, with the new ranking serving as feedback to the originally trained HMM model. We believe that such user input will significantly improve system usability.
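The top-1 and top-8 figures quoted in this abstract are top-k accuracies. A minimal sketch of how such a metric is commonly computed is given below, assuming each test name has exactly one reference translation and the system returns a ranked candidate list. The function name `top_k_accuracy` and the placeholder data are hypothetical, and the paper's actual evaluation protocol may differ.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Fraction of test items whose reference appears in the first k candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_candidates) != len(references):
        raise ValueError("one candidate list is required per reference")
    if not references:
        return 0.0
    hits = sum(
        1
        for candidates, reference in zip(ranked_candidates, references)
        if reference in candidates[:k]
    )
    return hits / len(references)


# Placeholder candidate lists and references, not data from the paper.
system_output = [["t1", "t2", "t3"], ["u2", "u1", "u3"]]
gold = ["t1", "u1"]
top1 = top_k_accuracy(system_output, gold, k=1)
top2 = top_k_accuracy(system_output, gold, k=2)
```

Because the first k candidates always include the first k-1, the metric never decreases as k grows, which is consistent with the top-8 figure being higher than the top-1 figure.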

AIS Electronic Library (AISeL)