22 research outputs found

Educational reform: a bedrock for national development in Nigeria

Author: Alabi Amos Oyetunde
Alabi Jesujoba Oluwadara
Publication venue: 'African Journals Online (AJOL)'
Publication date: 20/11/2018

This paper examines the fundamentals of educational reform for national development, with reference to the Nigerian educational system. It highlights the rationale for educational reform in national development and explains the needs that drive reform: improving standards, meeting future expectations, responding to exogenous factors, and fostering achievement and creativity. It elucidates planning for reform in educational structure, curriculum, and methods, as well as the management of educational reforms, which together form the backbone of effective and efficient educational reform. The Federal Government of Nigeria envisaged and prepared for reform in education by providing a goal-oriented national policy on education. The basis for evaluating educational reform, which is a necessity, and the procedure and levels of evaluation are discussed. Factors militating against effective educational reform in Nigeria are identified and explained. The paper itemizes some recommendations toward sustainable educational reform in Nigeria. Keywords: Educational Reform, National Development, Nigerian Educational System

AJOL - African Journals Online

YORC: Yoruba Reading Comprehension dataset

Author: Adelani David Ifeoluwa
Alabi Jesujoba O.
Aremu Anuoluwapo
Publication date: 14/09/2023

In this paper, we create YORC: a new multiple-choice Yoruba reading comprehension dataset based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer from the existing English RACE dataset using a pre-trained encoder-only model. Additionally, we provide results by prompting large language models (LLMs) like GPT-4.

arXiv.org e-Print Archive

$\mathcal{E}$ KÚ [MASK]: Integrating Yorùbá cultural greetings into machine translation

Author: Adelani David
Akinade Idris
Alabi Jesujoba
Klakow Dietrich
Odoje Clement
Publication date: 31/03/2023

This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings ($\mathcal{E}$ kú [MASK]), which are a big part of Yorùbá language and culture, into English. To evaluate these models, we present IkiniYorùbá, a Yorùbá-English translation dataset containing some Yorùbá greetings and sample use cases. We analysed the performance of different multilingual NMT systems, including Google and NLLB, and show that these models struggle to accurately translate Yorùbá greetings into English. In addition, we trained a Yorùbá-English model by finetuning an existing NMT model on the training split of IkiniYorùbá. This model achieved better performance than the pre-trained multilingual NMT models, although the latter were trained on a large volume of data. Comment: C3NLP Workshop @ EACL202

arXiv.org e-Print Archive

Inria-ALMAnaCH at the WMT 2022 shared task: Does Transcription Help Cross-Script Machine Translation?

Author: Alabi Jesujoba
Bawden Rachel
Muller Benjamin
Nishimwe Lydia
Rey Camille
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 07/12/2022

This paper describes the Inria ALMAnaCH team submission to the WMT 2022 general translation shared task. Participating in the language directions {cs,ru,uk}→en and cs↔uk, we experiment with a dedicated Latin-script transcription convention that represents all the Slavic languages involved in a way that maximises character- and word-level correspondences among them and with English. Our hypothesis was that bringing the source and target languages closer could have a positive impact on machine translation results. We provide multiple comparisons, including bilingual and multilingual baselines, with and without transcription. Initial results indicate that the transcription strategy was not successful: it scored lower than the baselines. We nevertheless submitted our multilingual, transcribed models as our primary systems, and in this paper we provide some indications as to why we got these negative results.

INRIA a CCSD electronic archive server

SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

Author: Adelani David Ifeoluwa
Alabi Jesujoba O.
Gao Haonan
Lee Annie En-Shiun
Liu Hannah
Mao Yanke
Shen Xiaoyu
Vassilyev Nikita
Publication date: 14/09/2023

Despite the progress recorded in the last few years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, which excludes a large number of low-resource languages. In this paper, we created SIB-200, a large-scale open-sourced benchmark dataset for topic classification in 200 languages and dialects, to address the lack of evaluation datasets for Natural Language Understanding (NLU). For many of the languages covered in SIB-200, this is the first publicly available evaluation dataset for NLU. The dataset is based on the Flores-200 machine translation corpus. We annotated the English portion of the dataset and extended the sentence-level annotation to the remaining 203 languages covered in the corpus. Despite the simplicity of this task, our evaluation in the fully supervised setting, the cross-lingual transfer setting, and the large language model prompting setting shows that there is still a large gap between the performance of high-resource and low-resource languages when multilingual evaluation is scaled to numerous world languages. We found that languages unseen during the pre-training of multilingual language models, under-represented language families (like Nilotic and Atlantic-Congo), and languages from the regions of Africa, Americas, Oceania and South East Asia often have the lowest performance on our topic classification dataset. We hope our dataset will encourage a more inclusive evaluation of multilingual language models on a more diverse set of languages. https://github.com/dadelani/sib-200 Comment: under submission

arXiv.org e-Print Archive

Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages

Author: Adewumi Tosin
Al-Azzawi Sana Sabah
Alabi Jesujoba
Awokoya Ayodele
Awosan Oyinkansola
Azime Israel Abebe
Fanijo Samuel
Oduwole Mardiyyah
Shode Iyanuoluwa
Tonja Atnafu Lambebo
Yousuf Oreen
Publication date: 13/04/2023

This paper describes our submission to the AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For task C, we used a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for zero-shot tasks, and the results suggest that adapters can achieve promising results with a limited amount of resources. Comment: SemEval 202

arXiv.org e-Print Archive

MasakhaNEWS: News Topic Classification for African languages

Author: Ababu Teshome Mulugeta
Abdulganiyu Habiba
Abdulmumin Idris
Adeeko Adetola
Adelani David Ifeoluwa
Adelani Tolulope
Afolabi Abeeb
Ajayi Tunde
Al-Azzawi Sana
Alabi Jesujoba
Aremu Anuoluwapo
Awosan Oyinkansola
Awoyomi Oluwabusayo
Azime Israel Abebe
Chukwuneke Chiamaka
David Davis
Diko Thina
Dossou Bonaventure F. P.
Emezue Chris Chinenye
Fanijo Samuel
Gwadabe Tajuddeen
Hassan Fuad Mire
Johar Abdulmejid
Jules Jules
Kebede Tadesse
Kimanuka Ussen
Kimotho Wangari
Masiak Marek
Mbonu Chinedu
Mehamed Moges Ahmed
Mohamed Muhidin
Mohamed Shafie
Moteu Tatiana
Muhammad Shamsuddeen Hassan
Mukiibi Jonathan
Mwase Christine
Ndolela Lolwethu
Ngabire Evrard
Nigusse Sinodos
Nixdorf Doreen
Nxakama Siyanda
Nyatsine Pamela
Obiefuna Nnaemeka
Odhiambo Brian
Oduwole Mardiyyah
Ogbu Onyekachi
Ogundepo Odunayo
Ojo Jessica
Oladipo Akintunde
Omotayo Abdul-Hakeem
Owodunni Abraham
Sakayo Toadoum Sari
Salahudeen Saheed Abdullahi
Samuel Olanrewaju
Shode Iyanuoluwa
Sibanda Blessing
Sidume Freedmore
Siro Clemencia
Ssenkungu Ivan
Stenetorp Pontus
Taye Mahlet
Tonja Atnafu Lambebo
Tshinu Tshinu
Yigezu Mesay Gemeda
Yousuf Oreen
Publication date: 20/09/2023

African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieved more than 90% (i.e. 86.0 F1 points) of the performance of fully supervised training (92.6 F1 points) by leveraging the PET approach. Comment: Accepted to IJCNLP-AACL 2023 (main conference)

arXiv.org e-Print Archive