21 research outputs found

    POS tagging in Amazigh using support vector machines and conditional random fields

    The aim of this paper is to present the first Amazigh POS tagger. Very few linguistic resources have been developed so far for Amazigh, and we believe that a POS tagger is the first step needed for automatic text processing. The data were manually collected and annotated. We used state-of-the-art supervised machine learning approaches to build our POS-tagging models. The models reached an accuracy of 92.58%, and we used 10-fold cross-validation to further validate our results. © Springer-Verlag Berlin Heidelberg 2011
    We would like to thank all IRCAM researchers for their valuable assistance. The work of the third author was funded by the MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i).
    Outahajala, M.; Benajiba, Y.; Rosso, P.; Zenkouar, L. (2011). POS tagging in Amazigh using support vector machines and conditional random fields. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:238-241. https://doi.org/10.1007/978-3-642-22327-3_28

    L'étiquetage grammatical de l'amazighe en utilisant les propriétés n-grammes et un prétraitement de segmentation

    The aim of this paper is to present the first Amazigh POS tagger. Very few linguistic resources have been developed so far for Amazigh, and we believe that a POS tagger is the first step needed for automatic text processing. To this end, we trained two sequence classification models, one using Support Vector Machines (SVMs) and one using Conditional Random Fields (CRFs), after a tokenization step. We used 10-fold cross-validation to evaluate our approach. Results show that the performance of SVMs and CRFs is very comparable. SVMs slightly outperformed CRFs at the fold level (92.58% vs. 92.14%), whereas CRFs outperformed SVMs on the 10-fold average (89.48% vs. 89.29%). These results are very promising considering that we used a corpus of only ~20k tokens.
    The work of the third author was funded by the research projects EU FP7 Marie Curie PEOPLE-IRSES 269180 WiQ-Ei, MICINN TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i), and VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Outahajala, M.; Benajiba, Y.; Rosso, P.; Zenkouar, L. (2012). L'étiquetage grammatical de l'amazighe en utilisant les propriétés n-grammes et un prétraitement de segmentation. E-TI : la revue électronique des technologies de l'information. 6:48-61. http://hdl.handle.net/10251/47570
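
The 10-fold evaluation mentioned in this abstract can be illustrated with a minimal sketch. The function name `k_fold_indices` and the contiguous, unshuffled split are assumptions made for illustration; the abstract does not say how the authors built their folds.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Hypothetical helper: the source does not describe the actual split.
    """
    if not 1 < k <= n_items:
        raise ValueError("k must satisfy 1 < k <= n_items")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


# Each item is used for testing exactly once across the k folds.
folds = list(k_fold_indices(25, k=10))
```

Under this scheme a "fold level" figure would refer to a single fold, while the "10-fold average" is the mean of the ten per-fold accuracies, which may explain why the two comparisons in the abstract point in different directions.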

    Protozoaires et Métazoaires parasites de l'anguille Anguilla anguilla : Interdépendance des populations parasitaires et coexistence des pathogènes potentiels

    A monthly sampling of eels in the mixohaline lagoon of Mauguio (Languedoc, France) allowed the analysis of the population structures of Anguillicola crassus (Nemathelmintha), Myxidium giardi (Myxozoa) and Eimeria anguillae (Sporozoa). This study showed that the high intensities of A. crassus, a parasite recently introduced in this area, seem to induce the development of both protists. However, no negative effect of these parasites on Anguilla anguilla could be detected from the host size/weight relationship.

    Clinical Study Efficacy of Multiple Micronutrients Fortified Milk Consumption on Iron Nutritional Status in Moroccan Schoolchildren

    Iron deficiency constitutes a major public health problem in Morocco, mainly among women and children. The aim of our paper is to assess the efficacy of consuming multiple micronutrients (MMN) fortified milk on the iron status of Moroccan schoolchildren living in a rural region. Children (n = 195), aged 7 to 9 y, were recruited from schools and divided into two groups: the nonfortified group (NFG) received nonfortified Ultra-High-Temperature (UHT) milk daily, and the fortified group (FG) received daily UHT milk fortified with multiple micronutrients including iron sulfate. Blood samples were collected at baseline (T0) and after 9 months (T9). Hemoglobin (Hb) was measured in situ with a Hemocue device; ferritin and C Reactive Protein were assessed in serum using ELISA and nephelometry techniques, respectively. Results were considered significant when the p value was <0.05. At T9 the FG showed a reduction of iron deficiency from 50.9% to 37.2% (p = 0.037). Despite the low prevalence of iron deficiency anemia (1.9%), more than 50% of children in our sample suffered from iron deficiency at baseline. The consumption of fortified milk reduced the prevalence of iron deficiency by 27% in schoolchildren living in a high-altitude rural region of Morocco. Clinical Trial Registration. Our study is registered in the Pan African Clinical Trial Registry with the identification number PACTR201410000896410.
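
The 27% figure appears to be a relative reduction rather than a difference in percentage points. This reading is an assumption, but it is consistent with the two prevalences reported for the fortified group:

```latex
\[
\text{relative reduction}
  = \frac{50.9\% - 37.2\%}{50.9\%}
  = \frac{13.7}{50.9}
  \approx 0.269
  \approx 27\%
\]
```

The absolute reduction over the same period is 13.7 percentage points.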

    Utilisation des CACs et des Ressources Externes pour l'Amélioration des Performances de l'Étiquetage Morphosyntaxique

    Amazigh, like most languages that have only recently been investigated for Natural Language Processing (NLP) tasks, still suffers from a scarcity of linguistic tools and resources, especially annotated corpora. Creating labeled data is a hard task. Obtaining unlabeled data is less difficult, although for resource-scarce languages it usually requires preprocessing. The aim of this paper is to present a semi-supervised approach that uses raw texts, selected on the basis of the confidence measure of Conditional Random Fields (CRFs), together with a small manually annotated corpus of about 20k tokens. Preliminary results show a 1.3% reduction in the error rate of the POS tagger. Also, when the model is trained with automatically annotated data, the improvement achieved between 60% and 90% of the training data is 5.9%.
    The first author thanks CODESRIA. The work of the fourth author was funded within the framework of the research projects VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems, the European Commission WIQ-EI IRSES (no. 269180) and DIANA-APPLICATIONS (TIN2012-38603-C02-01).
    Outahajala, M.; Benajiba, Y.; Rosso, P. (2014). Utilisation des CACs et des Ressources Externes pour l'Amélioration des Performances de l'Étiquetage Morphosyntaxique. Asinag. (9):91-104. http://hdl.handle.net/10251/61654

    A Nonlinear Support Vector Machine Analysis Using Kernel Functions for Nature and Medicine

    Since the emergence of Artificial Intelligence (AI), great developments have taken place in science, economics, medicine and every other field that uses computer science. Artificial intelligence has also solved many intractable problems, such as predicting specific serious diseases, forecasting product sales, and analyzing big data in the shortest possible time. The Support Vector Machine (SVM) is one of the most important supervised methods in this field, and one that every machine learning expert should have in his or her arsenal. For this reason, in this article we studied this technique and determined its advantages and disadvantages as well as its fields of application. Next, we applied it to three different databases using four kernel (basis change) functions, and we compared the results to determine the best way to use these functions.
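
As a rough illustration of what a kernel (basis change) function computes, the sketch below defines four commonly used kernels. The abstract does not name the four functions the authors compared, so the choice of linear, polynomial, RBF and sigmoid kernels, along with every parameter default, is an assumption.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # No basis change: similarity is the plain inner product.
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    # Implicitly maps inputs to all monomials up to the given degree.
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    # Gaussian similarity: 1.0 for identical points, decaying with distance.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    # Hyperbolic-tangent kernel; values always lie strictly between -1 and 1.
    return math.tanh(gamma * dot(x, y) + coef0)
```

A nonlinear SVM replaces every inner product in its decision function with one of these kernels, so swapping the kernel changes the shape of the decision boundary without changing the training procedure.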

    Using confidence and informativeness criteria to improve POS-tagging in amazigh

    Amazigh is used by tens of millions of people, mainly for oral communication. However, like all languages newly investigated in natural language processing, it is resource-scarce. The main aim of this paper is to present the results of our POS taggers based on two state-of-the-art sequence labeling techniques, namely Conditional Random Fields and Support Vector Machines, using a small manually annotated corpus of only 20k tokens. Since creating labeled data is a very time-consuming task while obtaining unlabeled data is less so, we gathered a set of unlabeled Amazigh data that we preprocessed and tokenized. The paper also addresses the use of semi-supervised techniques to improve POS tagging accuracy. We used an adapted self-training algorithm that combines a confidence measure with a function of Out Of Vocabulary words to select data for self-training. Using this language-independent method, we obtained encouraging results.
    The first author wants to thank CODESRIA. The work of the third author was carried out in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems and the European Commission WIQ-EI IRSES (no. 269180) and DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) research projects.
    Outahajala, M.; Benajiba, Y.; Rosso, P.; Zenkouar, L. (2015). Using confidence and informativeness criteria to improve POS-tagging in amazigh. Journal of Intelligent and Fuzzy Systems. 28(3):1319-1330. doi:10.3233/IFS-141417
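
The selection step of such a self-training loop might look like the sketch below. The abstract does not give the actual selection function, so the threshold, the ranking by out-of-vocabulary (OOV) ratio, and the names `oov_ratio` and `select_for_self_training` are all hypothetical.

```python
def oov_ratio(tokens, vocabulary):
    """Fraction of tokens that are absent from the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token not in vocabulary) / len(tokens)


def select_for_self_training(candidates, vocabulary, min_confidence=0.9, top_n=2):
    """Pick automatically tagged sentences to add to the training set.

    candidates: list of (tokens, confidence) pairs, where confidence is the
    tagger's own score for its labeling of the sentence.
    Assumed criterion: keep confident sentences, then prefer those that
    contribute the most unseen words (the more informative ones).
    """
    confident = [(tokens, conf) for tokens, conf in candidates
                 if conf >= min_confidence]
    confident.sort(key=lambda item: oov_ratio(item[0], vocabulary), reverse=True)
    return confident[:top_n]


vocabulary = {"a", "b"}
candidates = [
    (["a", "b"], 0.95),       # confident, but adds no new words
    (["a", "x"], 0.92),       # confident, half of the tokens are new
    (["x", "y"], 0.50),       # informative, but the tagger is unsure
    (["x", "y", "a"], 0.99),  # confident and mostly new words
]
selected = select_for_self_training(candidates, vocabulary)
```

The confidence filter limits the amount of label noise added to the training data, while the OOV ranking favours sentences that extend the tagger's vocabulary.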

    Protozoaires et Métazoaires parasites de l'anguille Anguilla anguilla, L. 1758 : Structures temporelles de leurs populations dans une lagune méditerranéenne

    Three parasite species, Anguillicola crassus (Nematoda), Myxidium giardi (Myxozoa) and Eimeria anguillae (Sporozoa), were recorded in the eel Anguilla anguilla from the Mediterranean Lagoon of Mauguio (Southern France). A. crassus is found in the swim bladder, M. giardi in the kidney and E. anguillae in the intestinal cells of this bony fish. A comparative study of the temporal variations in the population structures of these three endoparasites was undertaken. This study reveals: a) the regular presence of these endoparasites in the eel throughout the year, and the relative stability of intensities; b) rather cyclic changes in the prevalence of A. crassus, which increases during spring and autumn. This biannual change in prevalence could be related to the parasite's biological cycle, and the latter to the planktonic blooms in these lagoons; c) a slight increase in the populations of M. giardi during spring and autumn, although their periodicity is less evident; d) E. anguillae is a regular species but does not show any evident cyclic features. The ecopathological consequences of these demographic characteristics are discussed.

    Establishment of the hematology reference intervals in a healthy population of adults in the Northwest of Morocco (Tangier-Tetouan region)

    Introduction: Among the most useful biological examinations in common medical practice, the blood count is the most prescribed. The reference intervals of its hematological parameters are of major importance for clinical orientation and therapeutic decisions. In Morocco, the reference values used by medical biology laboratories and by doctors were collected from Caucasian and European individuals. These values could be different in the Moroccan population. Besides, blood count reference intervals specific to the various Moroccan regions are missing. We decided to determine the reference intervals from a population of healthy adults of the Tangier-Tetouan region by following the procedures recommended by the IFCC-CLSI guidelines in 2008, and to compare them with those in the literature. Methods: Blood samples were taken from 15840 adult volunteers (8402 men from 18 to 55 years old and 7438 women from 18 to 50 years old) at the regional transfusion center of Tangier and Tetouan between November 2014 and May 2016. The complete blood count was measured with the Sysmex KX21N® analyzer. For each sample a systematic blood smear was done to determine the leukocyte differential. The data were analyzed with SPSS 20.0 using the 2.5th and 97.5th percentiles. Results: A significant difference between the sexes was noted (p < 0.001) for all hematological parameters (red blood cells, hematocrit, hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, neutrophils, basophils, eosinophils, monocytes, platelets and mean platelet volume) except for the lymphocyte count (p = 0.552). The values of this study were compared with those reported in Arab, Caucasian and African populations. These comparisons showed significant differences. Conclusion: This study highlights the need to establish blood count reference intervals specific to the Moroccan population in order to avoid diagnostic errors, allow clinicians to interpret hematological examinations with greater specificity, and improve the quality of medical care provided to patients.
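
A nonparametric reference interval of the kind described in the Methods (2.5th and 97.5th percentiles) can be sketched as follows. The linear-interpolation percentile used here is an assumption; SPSS and the IFCC-CLSI guidelines may use a different rank formula, and the function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) of a pre-sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentile pair covering the central 95%."""
    ordered = sorted(values)
    return percentile(ordered, 2.5), percentile(ordered, 97.5)


# Toy data only, not measurements from the study.
low, high = reference_interval(range(1, 202))
```

Because the study reports significant differences between the sexes for most parameters, such an interval would be computed separately for men and women.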