Search CORE

2,559 research outputs found

An automatically built named entity lexicon for Arabic

Authors: Attia Mohammed, Monachini Monica, Toral Antonio, Tounsi Lamia, van Genabith Josef
Publication venue: European Language Resources Association
Publication date: 01/01/2010

We have successfully adapted and extended the automatic Multilingual, Interoperable Named Entity Lexicon approach to Arabic, using Arabic WordNet (AWN) and Arabic Wikipedia (AWK). First, we extract AWN’s instantiable nouns and identify the corresponding categories and hyponym subcategories in AWK. Then, we exploit Wikipedia inter-lingual links to locate correspondences between articles in ten different languages in order to identify Named Entities (NEs). We apply keyword search on AWK abstracts to cover Arabic articles that have no correspondence in any of the other languages. In addition, we perform a post-processing step to fetch further NEs from AWK that are not reachable through AWN. Finally, we investigate diacritization, using matching against GeoNames databases, the MADA-TOKAN tools and several heuristics, to restore the vowel marks of Arabic NEs. Using this methodology, we have extracted approximately 45,000 Arabic NEs and built, to the best of our knowledge, the largest, most mature and well-structured Arabic NE lexical resource to date. We have stored and organised this lexicon following the Lexical Markup Framework (LMF) ISO standard. We conduct a quantitative and qualitative evaluation of the lexicon against a manually annotated gold standard and achieve precision scores ranging from 95.83% (with 66.13% recall) to 99.31% (with 61.45% recall), depending on the value of a threshold.

CiteSeerX

Irish Universities

DCU Online Research Access Service
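
The Arabic NE lexicon abstract above does not spell out how inter-lingual links are used to decide that an article is an NE, or what the threshold controls. A common heuristic in this line of work, assumed here rather than taken from the paper, is to check how often an article's title is written with an initial capital in the bodies of its linked articles in languages that use letter case, and to accept the article as an NE when that share reaches a threshold. The following is a minimal Python sketch under that assumption; the functions `capitalised_ratio` and `is_named_entity`, the 0.75 default threshold and the sample texts are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical heuristic for illustration; not the authors' published implementation.


def capitalised_ratio(title: str, body: str) -> Optional[float]:
    """Share of whole-word mentions of `title` in `body` that start uppercase.

    Returns None when the title is never mentioned, so that language is skipped.
    Sentence-initial mentions also count as capitalised (a known source of noise).
    """
    mentions = re.findall(rf"\b{re.escape(title)}\b", body, flags=re.IGNORECASE)
    if not mentions:
        return None
    return sum(1 for m in mentions if m[0].isupper()) / len(mentions)


def is_named_entity(linked: dict[str, tuple[str, str]], threshold: float = 0.75) -> bool:
    """Accept the article as an NE if the mean ratio over linked languages >= threshold."""
    ratios = []
    for title, body in linked.values():
        ratio = capitalised_ratio(title, body)
        if ratio is not None:
            ratios.append(ratio)
    if not ratios:
        return False  # no evidence in any linked language
    return sum(ratios) / len(ratios) >= threshold


# Invented linked articles (language code -> (title, body text)).
cairo = {
    "en": ("Cairo", "Cairo is the capital of Egypt. Greater Cairo is large."),
    "fr": ("Le Caire", "Le Caire est la capitale de l'Égypte."),
}
mountain = {
    "en": ("Mountain", "A mountain is a landform. Mountain ranges and every mountain differ."),
    "fr": ("Montagne", "Une montagne est un relief."),
}

print(is_named_entity(cairo), is_named_entity(mountain))
```

Raising `threshold` makes acceptance stricter, which mirrors the trade-off reported in the abstract (higher precision at lower recall). Sentence-initial mentions inflate the ratio, and German capitalises all nouns, so a practical version would need to handle both.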

An analysis of machine translation errors on the effectiveness of an Arabic-English QA system

Authors: Al-Maskari A., Sanderson M.
Publication date: 01/01/2006

The aim of this paper is to investigate how much the effectiveness of a Question Answering (QA) system is affected by the performance of Machine Translation (MT) based question translation. Nearly 200 questions were selected from the TREC QA tracks and run through a question answering system, which answered 42.6% of them correctly in a monolingual run. These questions were then translated manually from English into Arabic, translated back into English using an MT system, and re-applied to the QA system, which was able to answer 10.2% of the translated questions. An analysis of which types of translation error affected which questions concludes that factoid-type questions are less prone to translation error than others.

White Rose Research Online
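
The QA abstract reports the two accuracies but not the size of the drop. Assuming both figures are measured on the same question set, as the abstract implies, the worked arithmetic gives the absolute and relative loss:

```latex
\Delta_{\text{abs}} = 42.6\% - 10.2\% = 32.4 \text{ percentage points}

\frac{10.2}{42.6} \approx 0.239 \quad\Rightarrow\quad \text{relative drop} \approx 1 - 0.239 = 0.761 \approx 76\%
```

In other words, the round-trip translated questions retained roughly a quarter of the monolingual effectiveness.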

An application of distributional semantics for the analysis of the Holy Quran

Authors: Benotto Giulia, Giovannetti Emiliano, Nahli Ouafae
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

In this contribution we illustrate the methodology and the results of an experiment in which we applied Distributional Semantics Models to the analysis of the Holy Quran. Our aim was to gather information on the potential differences in meaning that the same words may take on when used in Modern Standard Arabic compared with their usage in the Quran. To do so, we used the Penn Arabic Treebank as a contrastive corpus.

Archivio della ricerca- Università di Roma La Sapienza
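
The Quran abstract does not say which Distributional Semantics Model was used. As a generic illustration only, a count-based model can represent a word by the words that co-occur with it within a fixed window; comparing a word's vector built from one corpus with its vector built from another (here by cosine similarity) gives a rough signal of a difference in usage. This Python sketch assumes whitespace-tokenised text, raw counts with no weighting, and two invented toy sentences standing in for the Quranic and Modern Standard Arabic corpora; `context_vector`, `cosine` and the data are hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens: list[str], target: str, window: int = 2) -> Counter:
    """Count the words within `window` positions of each occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the occurrence itself
                vec[tokens[j]] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity over the union of context words (missing counts are 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy corpora standing in for the two varieties being contrasted.
classical = "guidance and light from the lord light upon light".split()
modern = "turn on the light switch and the light bulb".split()

similarity = cosine(context_vector(classical, "light"), context_vector(modern, "light"))
print(f"{similarity:.3f}")
```

A similarity well below 1 indicates that the word occurs in different contexts in the two corpora. Practical models usually add weighting such as PPMI and rely on much larger corpora, since raw counts over a handful of tokens are very noisy.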

Arabic-English Text Translation Leveraging Hybrid NER

Authors: Hkiri Emna, Mallat Souheyl, Zrigui Mounir
Publication venue: the National University (Philippines)
Publication date: 01/01/2017

Waseda University Repository

Transliteration Feasibility as a Means of Communication between Arab Expatriates and Their Progeny Abroad

Author: Al Jumaily Samir
Publication venue: Scholink Co, Ltd.
Publication date: 05/07/2019

This study explores the role of transliteration as a means of remote communication between Arab expatriates and their children in foreign countries. Expatriates need this form of written communication with their children for several reasons, especially when voice communication is impossible or unavailable because of technical problems or concerns about family and individual privacy. The study also seeks to distinguish transliteration from electronic chatting on the one hand, and transliteration from translation and creative translation on the other. It is mainly based on a professionally and objectively designed questionnaire sent to a number of Arab expatriates living in the Netherlands, in order to test the hypothesis concerning the feasibility of transliteration: how useful and practical it is as a means of remote communication between Arab expatriates and their children when one of the parties lacks proficiency in the other language's writing system. The study highlights the importance of voice transmission in clarifying the correct pronunciation of words and phrases in a way that is accessible to everyone, without requiring knowledge of the script of the language the words come from. Finally, the researcher has scrutinized and analyzed the participants’ answers and described them according to objective and scientific criteria.

Scholink Journals

Hybrid Approach to English-Hindi Name Entity Transliteration

Authors: Mathur Shruti, Saxena Varun Prakash
Publication date: 28/03/2014

Machine translation (MT) research in Indian languages is still in its infancy, and little work has been done on the proper transliteration of named entities in this domain. In this paper we address this issue, using the English-Hindi language pair and a hybrid approach. First, we process English words with a rule-based approach that extracts the individual phonemes of each word; we then apply a statistical approach that converts the English phonemes into their Hindi equivalents and, in turn, into the corresponding Hindi word. This approach achieves 83.40% accuracy.
Comment: Proceedings of IEEE Students' Conference on Electrical, Electronics and Computer Sciences 201

arXiv.org e-Print Archive

Crossref
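
The English-Hindi abstract describes a two-stage pipeline: a rule-based step that splits an English word into phonemes, followed by a statistical step that maps each phoneme to Hindi. This Python sketch illustrates that shape under heavy simplifying assumptions: the phoneme inventory and probabilities in `PHONEME_TABLE` are invented, segmentation is a greedy longest match, and the statistical step picks the most probable Devanagari unit for each phoneme independently, whereas a trained system would estimate these probabilities from aligned name pairs and take context into account.

```python
# Hypothetical phoneme inventory and probabilities (illustrative, not from the paper).
PHONEME_TABLE: dict[str, dict[str, float]] = {
    "ka": {"क": 0.9, "का": 0.1},
    "ma": {"म": 0.8, "मा": 0.2},
    "la": {"ल": 0.85, "ला": 0.15},
    "k": {"क": 0.6, "क्": 0.4},
    "m": {"म": 0.7, "म्": 0.3},
    "l": {"ल": 0.7, "ल्": 0.3},
    "a": {"अ": 1.0},
}
MAX_UNIT = max(len(unit) for unit in PHONEME_TABLE)


def segment(word: str) -> list[str]:
    """Rule-based step: greedy longest match of phoneme units, left to right."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        for size in range(min(MAX_UNIT, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in PHONEME_TABLE:
                units.append(piece)
                i += size
                break
        else:  # no unit of any size matched at position i
            raise ValueError(f"no phoneme unit matches {word[i:]!r}")
    return units


def transliterate(word: str) -> str:
    """Simplified statistical step: most probable Hindi unit for each phoneme."""
    return "".join(
        max(PHONEME_TABLE[unit].items(), key=lambda kv: kv[1])[0] for unit in segment(word)
    )


print(segment("kamal"), transliterate("kamal"))
```

With this toy table, `segment("kamal")` yields `["ka", "ma", "l"]` and `transliterate("kamal")` yields "कमल". Words containing letters outside the inventory raise `ValueError` rather than being guessed.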