Educational reform: a bedrock for national development in Nigeria
This paper examines the fundamentals of educational reform for national development, with particular relevance to the Nigerian educational system. The rationale for educational reform as a driver of national development is highlighted, and the needs motivating reform, such as raising standards, meeting future expectations, responding to exogenous factors, and fostering achievement and creativity, are expatiated upon. Planning for reform in educational structure, curriculum, and methods, as well as the management of educational reforms, both of which underpin effective and efficient reform, is elucidated. The Federal Government of Nigeria has envisaged and prepared for reform in education through the provision of a goal-oriented national policy on education. The basis for evaluating educational reform, together with the procedures and levels of evaluation, is highlighted and discussed. Factors militating against effective educational reform in Nigeria are identified and explained, and the paper itemizes recommendations toward sustainable educational reform in Nigeria.
Keywords: Educational Reform, National Development, Nigerian Educational System
YORC: Yoruba Reading Comprehension dataset
In this paper, we create YORC, a new multiple-choice Yoruba reading comprehension dataset based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer from the existing English RACE dataset using a pre-trained encoder-only model. Additionally, we provide results obtained by prompting large language models (LLMs) such as GPT-4.
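The cross-lingual transfer recipe described above can be illustrated with a short sketch: fine-tune a multilingual encoder-only model on English RACE-style (context, question, options) items, then apply it unchanged to Yoruba questions. The checkpoint name and four-option format below are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of multiple-choice scoring with a multilingual encoder.
# In the transfer setting, the model would first be fine-tuned on English
# RACE, then applied directly to Yoruba items; here the head is untrained.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "xlm-roberta-base"  # assumed multilingual encoder-only model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

def predict_answer(context: str, question: str, options: list[str]) -> int:
    # Each option forms one (context, question + option) pair; the model
    # scores all pairs jointly and the highest logit wins.
    first = [context] * len(options)
    second = [f"{question} {opt}" for opt in options]
    enc = tokenizer(first, second, truncation=True, padding=True,
                    return_tensors="pt")
    # AutoModelForMultipleChoice expects (batch, num_choices, seq_len)
    enc = {k: v.unsqueeze(0) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, num_choices)
    return logits.argmax(dim=-1).item()
```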
KÚ [MASK]: Integrating Yorùbá cultural greetings into machine translation
This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings (kú [MASK]), which are an integral part of Yorùbá language and culture, into English. To evaluate these models, we present IkiniYorùbá, a Yorùbá-English translation dataset containing Yorùbá greetings and sample use cases. We analysed the performance of different multilingual NMT systems, including Google Translate and NLLB, and show that these models struggle to accurately translate Yorùbá greetings into English. In addition, we trained a Yorùbá-English model by fine-tuning an existing NMT model on the training split of IkiniYorùbá; it achieved better performance than the pre-trained multilingual NMT models, even though the latter were trained on much larger volumes of data.
Comment: C3NLP Workshop @ EACL 2023
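The fine-tuning step mentioned above can be sketched as follows. The base checkpoint, language codes, and column names are assumptions for illustration; the real training split is IkiniYorùbá's, not the toy pair shown here.

```python
# Hedged sketch: fine-tuning a pre-trained multilingual NMT model on a
# small Yoruba-English parallel set, in the spirit of the IkiniYoruba
# experiment. The base model choice is an assumption.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/nllb-200-distilled-600M"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(
    model_name, src_lang="yor_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tiny illustrative parallel data standing in for the real training split.
pairs = Dataset.from_dict({
    "yo": ["Ẹ kú àárọ̀"],
    "en": ["Good morning"],
})

def preprocess(batch):
    # Tokenize source and target sides together for seq2seq training.
    return tokenizer(batch["yo"], text_target=batch["en"],
                     truncation=True, max_length=64)

train = pairs.map(preprocess, batched=True, remove_columns=["yo", "en"])

args = Seq2SeqTrainingArguments(output_dir="yo-en-ft", num_train_epochs=3,
                                per_device_train_batch_size=8)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()
```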
Inria-ALMAnaCH at the WMT 2022 shared task: Does Transcription Help Cross-Script Machine Translation?
This paper describes the Inria ALMAnaCH team submission to the WMT 2022 general translation shared task. Participating in the language directions {cs,ru,uk}→en and cs→uk, we experiment with the use of a dedicated Latin-script transcription convention aimed at representing all Slavic languages involved in a way that maximises character- and word-level correspondences between them as well as with the English language. Our hypothesis was that bringing the source and target languages closer could have a positive impact on machine translation results. We provide multiple comparisons, including bilingual and multilingual baselines, with and without transcription. Initial results indicate that the transcription strategy was not successful, yielding lower scores than the baselines. We nevertheless submitted our multilingual, transcribed models as our primary systems, and in this paper we provide some indications as to why we obtained these negative results.
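The core idea of a shared Latin-script transcription can be sketched with a toy character mapping. The table below is purely illustrative and is not the paper's actual convention, which was carefully designed to align Czech, Russian, and Ukrainian orthographies.

```python
# Toy sketch of a Cyrillic-to-Latin transcription pass applied to source
# text before translation. The mapping is a small illustrative subset,
# not the convention used in the submission.
CYR2LAT = {
    "а": "a", "б": "b", "в": "v", "г": "h", "д": "d", "е": "e",
    "ж": "ž", "з": "z", "и": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "x", "ц": "c", "ч": "č", "ш": "š",
}

def transcribe(text: str) -> str:
    # Map each Cyrillic character to a Latin counterpart; characters
    # outside the table (Latin letters, punctuation) pass through.
    return "".join(CYR2LAT.get(ch.lower(), ch) for ch in text)

print(transcribe("привет"))  # -> "pryvet" under this toy mapping
```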
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Despite the progress recorded in multilingual natural language processing in the last few years, evaluation is typically limited to a small set of languages with available datasets, which excludes a large number of low-resource languages. In this paper, we create SIB-200, a large-scale open-sourced benchmark dataset for topic classification in 200 languages and dialects, to address the lack of evaluation datasets for Natural Language Understanding (NLU). For many of the languages covered in SIB-200, this is the first publicly available evaluation dataset for NLU. The dataset is based on the Flores-200 machine translation corpus: we annotated the English portion of the dataset and extended the sentence-level annotations to the remaining 203 languages covered in the corpus. Despite the simplicity of the task, our evaluations in the fully supervised setting, the cross-lingual transfer setting, and the large-language-model prompting setting show that there is still a large gap between the performance of high-resource and low-resource languages when multilingual evaluation is scaled to numerous world languages. We found that languages unseen during the pre-training of multilingual language models, under-represented language families (such as Nilotic and Atlantic-Congo), and languages from Africa, the Americas, Oceania, and South East Asia often have the lowest performance on our topic classification dataset. We hope our dataset will encourage a more inclusive evaluation of multilingual language models on a more diverse set of languages. https://github.com/dadelani/sib-200
Comment: under submission
Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages
This paper presents our participation in the AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For sub-task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For sub-task C, we made use of a parameter-efficient adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for the zero-shot tasks, and the results suggest that promising results can be obtained with adapters using a limited amount of resources.
Comment: SemEval 2023
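The adapter approach for zero-shot transfer follows the MAD-X pattern: a task adapter trained on a source language is stacked on a language adapter, and at inference time only the language adapter is swapped for the target language. The sketch below uses the AdapterHub `adapters` library; the language pair, adapter names, and label count are illustrative assumptions, and the MLM training of the language adapters is elided.

```python
# Hedged sketch of MAD-X-style adapter stacking for zero-shot sentiment
# classification. Adapter names and languages here are placeholders.
import adapters
from adapters.composition import Stack
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)  # positive / neutral / negative
adapters.init(model)  # enable adapter support on a vanilla HF model

# Language adapters: normally trained with MLM on monolingual text
# (training elided here); task adapter: trained on labelled source data.
model.add_adapter("en")
model.add_adapter("ha")  # e.g., Hausa as the zero-shot target
model.add_adapter("sentiment")

# Training: only the task adapter is updated, stacked on the (frozen)
# source-language adapter.
model.train_adapter("sentiment")
model.active_adapters = Stack("en", "sentiment")
# ... run the usual Trainer loop on English sentiment data here ...

# Zero-shot inference: swap the language adapter, keep the task adapter.
model.active_adapters = Stack("ha", "sentiment")
```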
MasakhaNEWS: News Topic Classification for African languages
African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited to zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (e.g. MAD-X), pattern-exploiting training (PET), prompting language models (e.g. ChatGPT), and prompt-free sentence-transformer fine-tuning (SetFit and the Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision such as MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) using the PET approach.
Comment: Accepted to IJCNLP-AACL 2023 (main conference)
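Of the few-shot alternatives listed above, the prompt-free SetFit route is the simplest to sketch: sample a handful of examples per label and fine-tune a sentence-transformer-backed classifier. The dataset id, config, column names, and base checkpoint below are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of few-shot news topic classification with SetFit,
# sampling roughly 10 examples per label as in the abstract's setting.
from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset

data = load_dataset("masakhane/masakhanews", "yor")  # assumed id/config
few_shot = sample_dataset(data["train"], label_column="label",
                          num_samples=10)

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
trainer = Trainer(model=model, args=TrainingArguments(num_epochs=1),
                  train_dataset=few_shot, eval_dataset=data["test"])
trainer.train()
print(trainer.evaluate())  # accuracy on the Yoruba test split
```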