17 research outputs found

    AmAMorph: Finite State Morphological Analyzer for Amazighe

    This paper presents AmAMorph, a morphological analyzer for the Amazighe language built on the NooJ linguistic development environment. The paper begins with the development of Amazighe lexicons with large-coverage formalization. The resulting electronic lexicons, named ‘NAmLex’, ‘VAmLex’ and ‘PAmLex’ (for ‘Noun Amazighe Lexicon’, ‘Verb Amazighe Lexicon’ and ‘Particles Amazighe Lexicon’), link inflectional, morphological, and syntactic-semantic information to the list of lemmas. Automated inflectional and derivational routines are applied to each lemma producing over inflected forms. To our knowledge, AmAMorph is the first morphological analyzer for Amazighe. It identifies the component morphemes of the forms using large-coverage morphological grammars. Along with a description of how the analyzer is implemented, this paper gives an evaluation of the analyzer.

    Development of part-of-speech tagger for Xhosa

    Part-of-Speech (POS) tagging is the process of assigning an appropriate part of speech, or lexical category, to each word in a sentence of a natural language. Natural languages are the languages that human beings use to communicate with one another, such as Xhosa, Zulu or English. POS tagging plays an important role in natural language processing applications, including machine translation, parsing, text chunking, and spelling and grammar checking. Xhosa (sometimes referred to as isiXhosa) is one of the eleven official languages of South Africa and is spoken by over 8 million South Africans. The language is mainly spoken in the Eastern Cape and Western Cape provinces of the country. It is the second most widely spoken native language in South Africa after Zulu (sometimes called isiZulu). Although the number of speakers might seem high, Xhosa is considerably under-resourced: very few publications and books exist in the language, and the domains that use it as a medium of instruction are very limited. However, the language is gaining momentum. An Oxford-approved Xhosa dictionary has been developed recently, and Xhosa newspapers that did not exist in the recent past are now published. Text from these sources can be combined into a larger corpus that can be used to train the tagger. This work aims to develop an effective POS tagger for Xhosa. This thesis describes the work needed to produce an automatic POS tagger for Xhosa. A tagset of 36 POS tags was used for this purpose, and the tags are listed. A total of 5000 words were manually tagged to train the tagger. Another 3000 words, disjoint from the manually tagged training data, were used to test it. The open source Stanford CoreNLP toolkit was used to create the tagger.
The toolkit implements a Maximum Entropy machine learning model, which was applied in the development of the tagger presented in this thesis. The thesis describes the implementation and testing of the model in detail. The results show that the development of the Xhosa POS tagging model was successful: the model obtained a tagging accuracy of 87.71 percent.

    Diversité linguistique, langues menacées et développement durable

    262 p. The UNESCO Chair on World Language Heritage at the University of the Basque Country (UPV/EHU) has carried out different linguistic cooperation activities which have shown that the effort made, particularly by the local community itself, always has a positive effect. Traditionally, we have had the right to be educated in the “vernacular language”. This was the name given at that time to native or indigenous languages, which were generally unofficial languages. For the majority of these languages, this right has not been upheld. This year, 2019, the year of indigenous languages, many linguistic communities are decrying abandonment, apathy and a lack of real commitment to achieving the goals that the so-called bilingual schools are formally pursuing. It is an honour for us to be able to include contributions from the specialists we have called upon, who have generously responded with their original insights for this publication. This will help it to continue to stimulate development among the most vulnerable groups in society, particularly among communities of speakers of indigenous languages. The first half of the book contains more general contributions about linguistic diversity and its challenges as it relates to sustainable development. It includes theorising and reflections that, even if they take inspiration from knowledge and analysis of specific situations, provide information and suggestions for action that are relevant and applicable to any situation where linguistic diversity is in decline or under threat. The second half of the book incorporates contributions that could also be classified as general. However, they relate more specifically to certain languages and places under study, and they put forward proposals specifically designed for those contexts.
We believe its specificity to be its most valuable asset. The edition of this volume has been made possible through the financial support of the Basque Government (Eusko Jaurlaritza), Department of Culture and Linguistic Policy, and the AZKUE Foundation.

    Coreference Resolution for Arabic

    Recently, there has been enormous progress in coreference resolution. These recent developments were applied to Chinese, English and other languages, with outstanding results. However, languages with a rich morphology or fewer resources, such as Arabic, have not received as much attention. In fact, when this PhD work started there was no neural coreference resolver for Arabic, and we were not aware of any learning-based coreference resolver for Arabic since [Björkelund and Kuhn, 2014]. In addition, as far as we know, whereas much attention had been devoted to the phenomenon of zero anaphora in languages such as Chinese or Japanese, no neural model for Arabic zero-pronoun anaphora had been developed. In this thesis, we report on a series of experiments on Arabic coreference resolution in general and on zero anaphora in particular. We propose a new neural coreference resolver for Arabic, and we present a series of models for identifying and resolving Arabic zero pronouns. Our approach to zero-pronoun identification and resolution is applicable to other languages, and was also evaluated on Chinese, with results surpassing the state of the art at the time. This research also involved producing revised versions of standard datasets for Arabic coreference.

    English and the Languages of Algeria: Suggestions towards a New Language Policy

    Algeria has always been a multilingual country, owing to its long history of being colonized for centuries by different powers, from the Romans to the Phoenicians to the French and many others. However, the French were the predominant colonizer in Algeria, since they adopted a nationalizing process to impose their language on the people. This has greatly influenced the language spoken in the country, and French was long established as an official language in Algeria. For these reasons, the linguistic situation in Algeria has become quite complex, and many languages are spoken and coexist in the country: Standard Arabic; Algerian Arabic, known as “Darja”; and Berber, the language of the indigenous people of Algeria, which is also called Tamazight and is still spoken in many areas of the country. French, which coexists with all the spoken Algerian dialects, is heavily present in the Algerian territory and plays a significant role in the political, social, and educational sectors. The global spread of English has also reached Algeria, and an increasing interest in the language has been noticed in the country in recent years. In this respect, the present study pursues the goal of defining the status of English while examining the linguistic situation of the country. Hence, the aim of the present research is to explore the sociolinguistic profile of Algeria and highlight the role of each of the existing languages in the main domains in the country. Moreover, the study investigates the emergence of English as a competing language and explores the potential consequences it can have on the other languages and on Algerians in general. The present research consists of seven chapters. The data were collected online using the main social media platforms, in addition to a collection of photographs of the linguistic landscape of the country (shops, street names, buildings, etc.).
Both quantitative and qualitative approaches were used to conduct the research, with a total of 494 survey respondents, 10 interviewees, and more than 100 photographs. The research tools were a questionnaire of 27 questions and a concise questionnaire for the semi-structured interview. Photographs highlighting the visibility of English in shops and in the streets of Algeria were gathered to add information on the presence of the language in the country. The results of the study indicate the intensity of the linguistic situation in Algeria and the continuous conflicts between the different language groups. The research explores the consequences of this conflicted situation and the effects of the misused language policies on the Algerian individual and on society. Further analysis reveals an increasing rate of use of English: the results exhibit a noticeable development and a growing interest in the language, especially among the younger generation. The qualitative data acquired from the photographs and the interviews extended this view and provided more insight into the role of the languages for Algerians and their society, and into the meaning of the potential spread of the English language in Algeria. In conclusion, the research makes a few recommendations to policymakers and educationalists on how to moderate the linguistic atmosphere in Algeria by applying the needed reforms in key domains like education, and highlights the important role of well-planned language policies in improving the educational system and contributing to the development of the country.