5 research outputs found

Text Categorization Can Enhance Domain-Agnostic Stopword Extraction

Author: Aouicha Mohamed Ben
Awokoya Ayodele
Emezue Chris Chinenye
Etori Naome A.
Lawan Falalu Ibrahim
Nixdorf Doreen
Omotayo Abdul-Hakeem
Taieb Mohamed Ali Hadj
Turki Houcemeddine
Publication date: 24/01/2024

This paper investigates the role of text categorization in streamlining stopword extraction in natural language processing (NLP), specifically focusing on nine African languages alongside French. By leveraging the MasakhaNEWS, African Stopwords Project, and MasakhaPOS datasets, our findings emphasize that text categorization effectively identifies domain-agnostic stopwords, with a detection success rate of over 80% for most examined languages. Nevertheless, linguistic variances result in lower detection rates for certain languages. Interestingly, we find that while over 40% of stopwords are common across news categories, less than 15% are unique to a single category. Uncommon stopwords add depth to text, but their classification as stopwords depends on context. Therefore, combining statistical and linguistic approaches creates comprehensive stopword lists, highlighting the value of our hybrid method. This research enhances NLP for African languages and underscores the importance of text categorization in stopword extraction. Comment: A Project Report for the Masakhane Research Community

arXiv.org e-Print Archive

MasakhaNEWS: News Topic Classification for African languages

Author: Ababu Teshome Mulugeta
Abdulganiyu Habiba
Abdulmumin Idris
Adeeko Adetola
Adelani David Ifeoluwa
Adelani Tolulope
Afolabi Abeeb
Ajayi Tunde
al-azzawi sana
Alabi Jesujoba
Aremu Anuoluwapo
Awosan Oyinkansola
Awoyomi Oluwabusayo
Azime Israel Abebe
Chukwuneke Chiamaka
David Davis
Diko Thina
Dossou Bonaventure F. P.
Emezue Chris Chinenye
Fanijo Samuel
Gwadabe Tajuddeen
Hassan Fuad Mire
Johar Abdulmejid
Jules Jules
Kebede Tadesse
Kimanuka Ussen
Kimotho Wangari
Masiak Marek
Mbonu Chinedu
Mehamed Moges Ahmed
Mohamed Muhidin
Mohamed Shafie
Moteu Tatiana
Muhammad Shamsuddeen Hassan
Mukiibi Jonathan
Mwase Christine
Ndolela Lolwethu
Ngabire Evrard
Nigusse Sinodos
Nixdorf Doreen
Nxakama Siyanda
Nyatsine Pamela
Obiefuna Nnaemeka
Odhiambo Brian
Oduwole Mardiyyah
Ogbu Onyekachi
Ogundepo Odunayo
Ojo Jessica
Oladipo Akintunde
Omotayo Abdul-Hakeem
Owodunni Abraham
Sakayo Toadoum Sari
Salahudeen Saheed Abdullahi
Samuel Olanrewaju
Shode Iyanuoluwa
Sibanda Blessing
Sidume Freedmore
Siro Clemencia
Ssenkungu Ivan
Stenetorp Pontus
Taye Mahlet
Tonja Atnafu Lambebo
Tshinu Tshinu
Yigezu Mesay Gemeda
Yousuf Oreen
Publication date: 20/09/2023

African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieved more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach. Comment: Accepted to IJCNLP-AACL 2023 (main conference)

arXiv.org e-Print Archive

MasakhaNEWS: News Topic Classification for African languages

Author: Abdullahi Saheed Salahudeen
Abdulmumin Idris
Abeeb Afolabi
Adeeko Adetola
Adelani David Ifeoluwa
Adelani Tolulope Anu
Ajayi Tunde Oluwaseyi
al-azzawi Sana Sabah
Alabi Jesujoba Oluwadara
Aremu Anuoluwapo
Awosan Oyinkansola F.
Awoyomi Oluwabusayo Olufunke
Azime Israel Abebe
Bame Mahlet Taye
Chukwuneke Chiamaka I.
David Davis
Diko Thina
Dossou Bonaventure F. P.
Emezue Chris Chinenye
Fanijo Samuel
Gebre Sinodos
Guge Tadesse Kebede
Gwadabe Tajuddeen
Hassan Fuad Mire
Johar Abdulmejid Tuni
Kailani Habiba Abdulganiy
Kimanuka Ussen
Kimotho Wangari
Masiak Marek
Mbonu Chinedu E.
Mehamed Moges Ahmed
Mohamed Muhidin
Mohamed Shafie Abdi
Muhammad Shamsuddeen Hassan
Mukiibi Jonathan
Mwase Christine
Ndolela Lolwethu
Ngabire Evrard
Ngoli Tatiana Moteu
Nixdorf Doreen
Nxakama Siyanda
Nyatsine Pamela
Obiefuna Nnaemeka C.
Odhiambo Brian
Oduwole Mardiyyah
Ogbu Onyekachi Raphael
Ogundepo Odunayo
Ojo Jessica
Oladipo Akintunde
Omotayo Abdul-Hakeem
Owodunni Abraham Toluwase
Samuel Olanrewaju
Sari Sakayo Toadoum
Shode Iyanuoluwa
Sibanda Blessing K.
Sidume Freedmore
Siro Clemencia
Stenetorp Pontus
Tonja Atnafu Lambebo
Tshinu Kanda Patrick
Yigezu Mesay Gemeda
Yousuf Oreen
Publication date: 19/04/2023

Lancaster E-Prints

Persistent Idiopathic Facial Pain and Atypical Odontalgia (Anhaltender idiopathischer Gesichtsschmerz und atypische Odontalgie)

Author: Abiko
Aggarwal
Attal
Benoliel
Charly Gaul
Dominik Ettlin
Doreen B. Pfau
Eisenberger
Eker
Epstein
Finnerup
Fitzgerald
Forssell
Forssell
Graff-Radford
Gremeau-Richard
Guler
Hagelberg
Hanihara
Jones
Kehlet
Koopman
Lang
Lang
List
Macfarlane
Maier
Nilges
Nixdorf
Nixdorf
Pigg
Raslan
Reeves
Richardson
Rolke
Siccoli
Steimer
Stohler
Stowell
Svensson
Svensson
Turp
Türp
Vlaeyen
Volcy
Zakrzewska
Publication venue: Elsevier BV
Publication date: 01/01/2013

The terms ‘persistent idiopathic facial pain’ (PIFP) and ‘atypical odontalgia’ (AO) are currently used as exclusion diagnoses for chronic toothache and chronic facial pain. Knowledge of these pain conditions in medical and dental practice is crucial for preventing iatrogenic tissue damage from non-indicated invasive interventions, such as endodontic treatment and tooth extraction. In the present paper, etiology and pathogenesis, differential diagnostic criteria, and diagnostic approaches are explained, and relevant therapeutic principles are outlined.

Crossref

ZORA

MasakhaNEWS: News Topic Classification for African languages

Author: Abdullahi Saheed Salahudeen
Abdulmumin Idris
Abeeb Afolabi
Adeeko Adetola
Adelani David Ifeoluwa
Adelani Tolulope Anu
Ajayi Tunde Oluwaseyi
al-azzawi Sana Sabah
Alabi Jesujoba Oluwadara
Aremu Anuoluwapo
Awosan Oyinkansola F.
Awoyomi Oluwabusayo Olufunke
Azime Israel Abebe
Bame Mahlet Taye
Chukwuneke Chiamaka I.
David Davis
Diko Thina
Dossou Bonaventure F. P.
Emezue Chris Chinenye
Fanijo Samuel
Gebre Sinodos
Guge Tadesse Kebede
Gwadabe Tajuddeen
Hassan Fuad Mire
Johar Abdulmejid Tuni
Kailani Habiba Abdulganiy
Kimanuka Ussen
Kimotho Wangari
Masiak Marek
Mbonu Chinedu E.
Mehamed Moges Ahmed
Mohamed Muhidin
Mohamed Shafie Abdi
Muhammad Shamsuddeen Hassan
Mukiibi Jonathan
Mwase Christine
Ndolela Lolwethu
Ngabire Evrard
Ngoli Tatiana Moteu
Nixdorf Doreen
Nxakama Siyanda
Nyatsine Pamela
Obiefuna Nnaemeka C.
Odhiambo Brian
Oduwole Mardiyyah
Ogbu Onyekachi Raphael
Ogundepo Odunayo
Ojo Jessica
Oladipo Akintunde
Omotayo Abdul-Hakeem
Owodunni Abraham Toluwase
Samuel Olanrewaju
Sari Sakayo Toadoum
Shode Iyanuoluwa
Sibanda Blessing K.
Sidume Freedmore
Siro Clemencia
Stenetorp Pontus
Tonja Atnafu Lambebo
Tshinu Kanda Patrick
Yigezu Mesay Gemeda
Yousuf Oreen
Publication venue: arXiv.org
Publication date: 19/04/2023

Aston Publications Explorer