589 research outputs found

A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

Author: Abdelali Ahmed
Doumi Noureddine
Lehireche Ahmed
Maurel Denis
Publication venue: IJIT
Publication date: 01/02/2016

This work presents a method that enables the Arabic NLP community to build scalable lexical resources. The method is low cost and time-efficient, as well as scalable and extensible: it is incremental both in processing resources and in generating lexicons. First, tokens are drawn from a corpus and lemmatized. Second, finite-state transducers (FSTs) are generated semi-automatically. Finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One strength of the algorithm is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. A second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiments use a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The resulting open lexical resources have high coverage: they cover more than 70% of Arabic verbs. The resources contain 16,855 verb lemmas and 11,080,355 fully, partially and not vocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license.
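The generation step described in this abstract (a transducer expanding a lemma into inflected forms with their features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `TRANSDUCER` table, the `generate` function, the feature tags, and the transliterated stem `katab` are hypothetical and are not the paper's transducers or the Unitex format.

```python
from typing import Dict, List, Tuple

# Hypothetical toy transducer: state -> list of (output suffix, feature tag, next state).
# The suffixes are transliterated perfective singular endings of a regular verb,
# chosen only to illustrate the idea; they are not taken from the paper.
Transition = Tuple[str, str, str]

TRANSDUCER: Dict[str, List[Transition]] = {
    "stem": [
        ("a", "3.m.sg", "end"),
        ("at", "3.f.sg", "end"),
        ("ta", "2.m.sg", "end"),
        ("ti", "2.f.sg", "end"),
        ("tu", "1.sg", "end"),
    ],
    "end": [],  # final state: no outgoing transitions
}


def generate(stem: str, state: str = "stem") -> List[Tuple[str, str]]:
    """Follow every path from `state` to a final state, collecting (form, features)."""
    if not TRANSDUCER[state]:
        return [(stem, "")]
    results = []
    for suffix, tag, next_state in TRANSDUCER[state]:
        for form, rest in generate(stem + suffix, next_state):
            results.append((form, tag + rest))
    return results


forms = generate("katab")
for form, features in forms:
    print(form, features)
```

A table-driven design like this keeps the inflection scheme as data, so adding a verb class would mean adding a table rather than new code, which is presumably what makes semi-automatic generation of many transducers practical.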

Directory of Open Access Journals

HAL Descartes

HAL Université de Tours

Hal-Diderot

APMorph: finite-state transducer for Amazigh pronominal morphology

Author: Ammari Rachid
Zenkoua Ahbib
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/02/2021

This work presents an Amazigh pronominal morphological analyzer (APMorph) based on the Xerox finite-state transducer (XFST). The system is built around a large lexicon named “APlex”, which includes the pronouns affixed to nouns and to verbs, together with the characteristics of each lemma. A set of rules is added to define the inflectional behavior and morphosyntactic links of each entry, as well as the relationships between the different lexical units. The implementation and evaluation of the approach are detailed in the article. XFST remains a relevant choice because the platform supports both analysis and generation. The robustness of the system allows it to be integrated into other natural language processing (NLP) applications, especially spellchecking, machine translation, and machine learning. This paper continues our previous work on the automatic processing of Amazigh nouns and verbs.

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Pattern-and-root inflectional morphology: the Arabic broken plural

Author: Laporte Eric
Neme Alexis Amid
Publication venue: 'Elsevier BV'
Publication date: 01/11/2013

We present a detailed model for describing broken plurals, based on the priority of the pattern over the root. The model derives the broken plural form from the letters of the singular and its pattern. It has been implemented and tested by encoding 3,200 lexical entries. We paid special attention to the management of language resources and dictionaries, in order to make the description process better suited to experts in the Arabic language. The model is based on the traditional notions of pattern and root. Compared with traditional morphology, it excludes derivational morphology from the description. As in traditional Arabic dictionaries, the dictionary is organized around its updatable lexical entries, which are fully diacritized. In our model, the morphological analysis system recognizes broken plurals directly in text, relying on a dictionary of fully inflected forms and without morphophonological or orthographic rules. The classification of broken plural forms follows simple, orderly and detailed principles. The encoding of singular patterns is simplified to short (v) and long (vv) vowels, without specifying their quality as damma, fatha or kasra. Morphophonological alternations of the root and orthographic variations of the weak letters and the hamza are encoded independently of the pattern and directly, that is, without reducing the root to its underlying form and without morphophonological rules. Broken plural forms are classified hierarchically by plural pattern, then singular pattern, then weak letters. Quadriliteral broken plural forms are limited to 3 patterns subdivided into 70 classes, and triliteral ones to 22 patterns subdivided into 90 classes. These 160 classes become 300 when we take into account the inflectional and orthographic variations in the singular forms.

We present a substantially implemented model of description of the inflectional morphology of Arabic nouns, with special attention to the management of dictionaries and other language resources by Arabic-speaking linguists. Our model includes broken plurals (BPs), i.e. plurals formed by modifying the stem. It is based on the traditional notions of root and pattern of Semitic morphology. However, as compared to traditional Arabic morphology, it keeps the formal description of inflection separate from that of derivation and semantics. As in traditional Arabic dictionaries, the updatable dictionary is structured in lexical entries for lemmas, and the reference spelling is fully diacritized. In our model, morphological analysis of Arabic text is performed directly with a dictionary of words and without morphophonological rules. Our taxonomy for noun inflection is simple, orderly and detailed. We simplify the taxonomy of singular patterns by specifying vowel quantity as v or vv, and ignoring vowel quality. Root alternations and orthographical variations are encoded independently from patterns and in a factual way, without deep roots or morphophonological or orthographical rules. Nouns with a triliteral BP are classified according to 22 patterns subdivided into 90 classes, and nouns with a quadriliteral BP according to 3 patterns subdivided into 70 classes. These 160 classes become 300 inflectional classes when we take into account inflectional variations that affect only the singular. We provide a straightforward encoding scheme that we applied to 3,200 entries of BP nouns.
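The simplification of singular patterns to vowel quantity (v or vv, ignoring vowel quality) can be sketched as follows. This is only an illustration under stated assumptions: the function name `quantity_skeleton`, the use of `C` for consonants, and the transliteration convention (long vowels written as doubled letters) are hypothetical and are not the paper's encoding scheme.

```python
# Assumed transliteration: short vowels are "a", "i", "u"; a long vowel is the
# same letter doubled ("aa", "ii", "uu"); every other character is a consonant.
VOWELS = set("aiu")


def quantity_skeleton(word: str) -> str:
    """Map a transliterated word to a skeleton of C (consonant), v (short vowel)
    and vv (long vowel), discarding which vowel it actually is."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in VOWELS:
            # A doubled vowel letter counts as one long vowel.
            if i + 1 < len(word) and word[i + 1] == ch:
                out.append("vv")
                i += 2
            else:
                out.append("v")
                i += 1
        else:
            out.append("C")
            i += 1
    return "".join(out)


# Two singulars with different vowel qualities fall under the same skeleton.
print(quantity_skeleton("kitaab"))  # CvCvvC
print(quantity_skeleton("rasuul"))  # CvCvvC
print(quantity_skeleton("kalb"))    # CvCC
```

Collapsing vowel quality in this way reduces the number of distinct singular patterns that a classification has to enumerate.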

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

A Semi-Automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

Author: Ahmed Abdelali
Ahmed Lehireche
Denis Maurel
Noureddine Doumi
Publication venue: 'MECS Publisher'
Publication date: 01/01/2016

Crossref