MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Author: Abdulmumin I
Adelani DI
Adewumi T
Adeyemi M
Ahia O
Alabi JO
Aremu A
Bamba Dione CM
Beukman M
Bukula A
Buzaaba H
Chukwuneke C
Dossou BFP
Emezue CC
Ezeani I
Gitau C
Gwadabe T
Hacheme GQ
Kabore F
Kalipe G
Klakow D
Koagne VM
Lignos C
Mabuya R
Macucwa T
Marivate V
Mbaye D
Mboning E
Mokono NL
Muhammad SH
Mukiibi J
Munkoh-Buabeng E
Nabende P
Nakatumba-Nabende J
Neubig G
Ngoli TM
Ogayo P
Ogundepo O
Palen-Michel C
Rijhwani S
Ruder S
Sibanda B
Tapo AA
Taylor A
Yousuf O
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/12/2022

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.

UCL Discovery

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages

Author: Abdullahi M
Adelani DI
Adelani TA
Agbolo A
Akinade I
Alabi JO
Aremu A
Atindogbe G
Bamba Dione CM
Bukula A
Buzaaba H
Chimhenga E
Dossou BFP
Emezue CC
Gitau C
Gotosa K
Gwadabe T
Kabore FO
Kalipe G
Klakow D
Koagne VM
Mabuya R
Macucwa T
Marivate V
Mbaye D
Mboning ET
Mizha P
Muhammad SH
Mukiibi J
Munkoh-Buabeng E
Musabeyezu T
Nabende P
Nahimana M
Niyomutabazi E
Ogayo P
Onyenwe I
Samuel O
Sibanda B
Sindane T
Tapo AA
Taylor A
Traore S
Uchechukwu C
Yusuf A
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/07/2023

In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.

UCL Discovery

MasakhaNER: Named Entity Recognition for African Languages

Author: Abbott J.
Adelani D.
Adewumi T.
Adeyemi M.
Ahia O.
Akinfaderin A.
Akinode V.
Alabi J.
Anebi E.
Anuoluwapo A.
Awokoya A.
Azime I.
Bateesa T.
Buzaaba H.
Chukwuneke C.
David D.
Diallo A.
Diop T.
Dossou B.
D’souza D.
Emezue C.
Ezeani I.
Faye A.
Gebreyohannes D.
Gitau C.
Gwadabe T.
Katusiime M.
Kreutzer J.
Lignos C.
Marengereke T.
Mayhew S.
Mbaye D.
Mboup M.
Muhammad S.
Mukiibi J.
Muriuki G.
Nabagereka D.
Nakatumba-Nabende J.
Neubig G.
Ngom S.
Niyongabo R.
Nwaike K.
Odu N.
Ogayo P.
Ogueji K.
Oloyede T.
Orife I.
Osei S.
Otiende V.
Oyerinde S.
Palen-Michel C.
Rayson P.
Rijhwani S.
Ruder S.
Sibanda B.
Siro C.
Tilaye H.
Wairagala E.
Wambui Y.
Wolde D.
Yimam S.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2021

MPG.PuRe

MasakhaNEWS: News Topic Classification for African languages

Author: Abdullahi Saheed Salahudeen
Abdulmumin Idris
Abeeb Afolabi
Adeeko Adetola
Adelani David Ifeoluwa
Adelani Tolulope Anu
Ajayi Tunde Oluwaseyi
al-azzawi Sana Sabah
Alabi Jesujoba Oluwadara
Aremu Anuoluwapo
Awosan Oyinkansola F.
Awoyomi Oluwabusayo Olufunke
Azime Israel Abebe
Bame Mahlet Taye
Chukwuneke Chiamaka I.
David Davis
Diko Thina
Dossou Bonaventure F. P.
Emezue Chris Chinenye
Fanijo Samuel
Gebre Sinodos
Guge Tadesse Kebede
Gwadabe Tajuddeen
Hassan Fuad Mire
Johar Abdulmejid Tuni
Kailani Habiba Abdulganiy
Kimanuka Ussen
Kimotho Wangari
Masiak Marek
Mbonu Chinedu E.
Mehamed Moges Ahmed
Mohamed Muhidin
Mohamed Shafie Abdi
Muhammad Shamsuddeen Hassan
Mukiibi Jonathan
Mwase Christine
Ndolela Lolwethu
Ngabire Evrard
Ngoli Tatiana Moteu
Nixdorf Doreen
Nxakama Siyanda
Nyatsine Pamela
Obiefuna Nnaemeka C.
Odhiambo Brian
Oduwole Mardiyyah
Ogbu Onyekachi Raphael
Ogundepo Odunayo
Ojo Jessica
Oladipo Akintunde
Omotayo Abdul-Hakeem
Owodunni Abraham Toluwase
Samuel Olanrewaju
Sari Sakayo Toadoum
Shode Iyanuoluwa
Sibanda Blessing K.
Sidume Freedmore
Siro Clemencia
Stenetorp Pontus
Tonja Atnafu Lambebo
Tshinu Kanda Patrick
Yigezu Mesay Gemeda
Yousuf Oreen
Publication date: 19/04/2023

African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieved more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.

Lancaster E-Prints

Authentic Empathy and the Role of Victim Service Providers in (De)stigmatizing Male Sexual Victimization

Author: Becker H. S.
Chuka N. Emezue
Connell R. W.
Foucault M.
Goffman E.
Rasmussen L. A.
Smith S. G.
Tipparat Udmuangpia
Willig C.
Publication venue: 'SAGE Publications'

Crossref

Culturally-Differentiated Batterer Intervention Programs for Immigrant Male Batterers (IMB): An Integrative Review

Author: Bennett L.
Celaya-Alston R. C.
Chuka N. Emezue
Guruge S.
McAuliffe M.
Oliver J. Williams
Perilla J.
Raj A.
Tina L. Bloom
Tjaden P.
Publication venue: 'Informa UK Limited'

Crossref