13 research outputs found
Student Attrition Prediction Using Machine Learning Techniques
In educational systems, students' course enrollment is a fundamental performance metric for academic and financial sustainability. In many higher institutions today, student attrition is driven by a variety of circumstances, including demographic and personal factors such as age, gender, academic background, financial ability, and choice of academic degree. In this study, machine learning approaches were used to develop models that predict students' attrition rate in pursuing a computer science degree and identify students at high risk of dropping out before graduation. This can help higher education institutions develop proper intervention plans to reduce attrition rates and increase the probability of student academic success. Student data were collected from the Federal University Lokoja (FUL), Nigeria. The data were preprocessed using existing Weka machine learning libraries: they were converted into attribute-relation file format (ARFF), and resampling techniques were used to partition them into training and testing sets. Correlation-based feature selection was applied to develop the students' attrition model and to identify students at risk of dropping out. Random forest and random tree machine learning algorithms were used to predict attrition. The results showed that random forest achieved an accuracy of 79.45%, while random tree achieved 78.09%. This is an improvement over previous results of 66.14% and 57.48% for random forest and random tree respectively, and is attributable to the preprocessing and feature-selection techniques used. It is therefore recommended that applying such techniques to classification models can improve model performance.
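The pipeline described above (resampling into train/test splits, feature selection, then tree-based classifiers) can be sketched in scikit-learn. This is an illustrative stand-in, not the authors' Weka setup: the data here are synthetic, and Weka's CfsSubsetEval is approximated with a simple ANOVA-F filter.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for student records (age, gender, background, etc.).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=42)

# Resampling: partition into training and testing sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Correlation-based feature selection, approximated here with an ANOVA-F
# filter; Weka's CfsSubsetEval differs in detail.
selector = SelectKBest(f_classif, k=5).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

for name, model in [("random forest", RandomForestClassifier(random_state=42)),
                    ("random tree", DecisionTreeClassifier(random_state=42))]:
    model.fit(X_tr_sel, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te_sel))
    print(f"{name}: {acc:.2%}")
```

On the synthetic data the absolute accuracies are meaningless; the point is the shape of the pipeline: resample, select features, then compare the two tree-based learners.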
IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection
Since the inception of state-of-the-art neural network models for natural language processing research, the major challenge faced by low-resource languages has been the lack or insufficiency of annotated training data. The named entity recognition (NER) task is no exception. The need for an efficient data creation and annotation process, especially for low-resource languages, cannot be over-emphasized. In this work, we leverage an existing NER tool for English in a cross-language projection method that automatically creates a mapping dictionary of entities in a source language and their translations in the target language using a parallel English-Igbo corpus. The resulting mapping dictionary, manually checked and corrected by human annotators, was used to automatically generate and format an NER training dataset from the Igbo monolingual corpus, thereby saving considerable annotation time for the Igbo NER task. The generated dataset was also included in the training process, and our experiments show improved performance over previous work.
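Once such a mapping dictionary exists, annotating the monolingual corpus reduces to dictionary matching. The sketch below illustrates this projection step with a hypothetical dictionary and a CoNLL-style B-/I-/O tagging scheme; it is not the paper's implementation.

```python
# Hypothetical entity dictionary: surface form -> entity type.
entity_dict = {"Chinua Achebe": "PER", "Enugwu": "LOC", "Naijiria": "LOC"}

def project_tags(sentence, entity_dict):
    """Tag a whitespace-tokenized sentence by dictionary lookup,
    emitting (token, tag) pairs in a CoNLL-like B-/I-/O scheme."""
    tags = []
    tokens = sentence.split()
    i = 0
    while i < len(tokens):
        matched = False
        # Try longer entity spans first (greedy longest match).
        for length in range(min(3, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + length])
            if span in entity_dict:
                label = entity_dict[span]
                tags.append((tokens[i], "B-" + label))
                for tok in tokens[i + 1:i + length]:
                    tags.append((tok, "I-" + label))
                i += length
                matched = True
                break
        if not matched:
            tags.append((tokens[i], "O"))
            i += 1
    return tags

print(project_tags("Chinua Achebe si Enugwu", entity_dict))
# -> [('Chinua', 'B-PER'), ('Achebe', 'I-PER'), ('si', 'O'), ('Enugwu', 'B-LOC')]
```

Dictionary projection of this kind is noisy (ambiguous words, inflected forms), which is why the abstract stresses the manual checking step.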
The African Stopwords project: curating stopwords for African languages
Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing text data is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several curated stopword lists, low-resource languages, such as those found on the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. In this paper, we present our current progress on ten African languages as well as future plans for the project.
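One common starting heuristic for curation, sketched below, is to rank tokens by corpus frequency and treat the most frequent, low-information words as stopword candidates for human review. The tiny "corpus" here is purely illustrative; real curation combines frequency with linguistic judgment.

```python
from collections import Counter

# Toy English corpus standing in for a monolingual African-language corpus.
corpus = [
    "the cat sat on the mat",
    "the dog and the cat ran",
    "a dog sat on a log",
]

# Count token frequencies across the whole corpus.
counts = Counter(tok for line in corpus for tok in line.split())

# The most frequent tokens become stopword *candidates* for annotators.
candidates = [tok for tok, _ in counts.most_common(3)]
print(candidates)
```

Frequency alone over-generates (content words in a narrow-domain corpus can rank high), which is one reason the project emphasizes per-language study rather than a purely automatic cutoff.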
MasakhaNEWS: News Topic Classification for African languages
African languages are severely under-represented in NLP research due to lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern-exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as little as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach. (Accepted to IJCNLP-AACL 2023, main conference.)
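The "classical machine learning" baseline mentioned above can be sketched as a TF-IDF plus logistic regression pipeline. The headlines and labels below are made up; MasakhaNEWS itself covers 16 African languages and its own topic label set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for news headlines and their topic labels.
train_texts = ["election results announced", "team wins league final",
               "new vaccine trial begins", "parliament passes budget",
               "striker scores twice", "clinic opens in rural area"]
train_labels = ["politics", "sports", "health",
                "politics", "sports", "health"]

# TF-IDF features feeding a multinomial logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)
print(clf.predict(["president signs budget bill"]))
```

Such lexical baselines are cheap and language-agnostic, which makes them a useful floor against which the fine-tuned and few-shot language-model approaches in the paper are compared.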
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages
African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual content is the only high-coverage source of answers. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.
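The translate-then-retrieve shape of an XOR QA system can be sketched as follows. Everything here is a hypothetical stand-in: the translation step is stubbed out, and retrieval is bare lexical overlap rather than the BM25 or dense retrievers real systems use.

```python
# Toy English passage store (the cross-lingual "answer content" side).
passages = [
    "Lagos is the largest city in Nigeria.",
    "The Niger River flows through western Africa.",
]

def fake_translate(question_igbo):
    # Stand-in for an MT system translating the Igbo question to English.
    return "largest city in Nigeria"

def retrieve(query, passages):
    # Lexical-overlap retrieval; real systems use BM25 or dense retrievers.
    def score(p):
        q = set(query.lower().split())
        d = set(p.lower().rstrip(".").split())
        return len(q & d)
    return max(passages, key=score)

query = fake_translate("obodo kachasị ukwuu na Naịjịrịa")
print(retrieve(query, passages))
```

The abstract's finding that automatic translation and multilingual retrieval perform poorly corresponds to failures in exactly these two stages.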
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
Despite the progress recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure accurately the progress made on these languages because evaluation is often performed with n-gram matching metrics like BLEU, which often correlate poorly with human judgments. Embedding-based metrics such as COMET correlate better; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages, by leveraging DA training data from high-resource languages and an African-centric multilingual encoder (AfroXLM-Roberta), creating a state-of-the-art evaluation metric for African-language MT with respect to Spearman rank correlation with human judgments (+0.406).
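The headline number (+0.406) is a Spearman rank correlation between metric scores and human DA ratings. For a tie-free sample it can be computed directly from rank differences, as the sketch below shows on made-up scores.

```python
def rank(xs):
    """Assign 1-based ranks (smallest value gets rank 1); assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman(a, b):
    """Spearman rho via the rank-difference formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

human_scores  = [80, 55, 90, 40, 70]            # hypothetical DA ratings
metric_scores = [0.71, 0.48, 0.85, 0.50, 0.66]  # hypothetical metric outputs

print(spearman(human_scores, metric_scores))  # -> 0.9
```

Because only the rankings matter, a metric can use a completely different scale from the human ratings and still correlate perfectly, which is why Spearman correlation is the standard meta-evaluation for MT metrics.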
Detection and Classification of Human Gender into Binary (Male and Female) Using Convolutional Neural Network (CNN) Model
This paper focuses on detecting human gender using a Convolutional Neural Network (CNN). A CNN is a deep learning technique used as a feature extractor: it takes input photos, assigns values to various characteristics of the image, and differentiates between them. The goal is to create and develop a real-time gender detection model that classifies human gender into two categories, male and female. The main motivation for this work is to address the problem of imposture. A CNN model was developed to extract facial features such as the eyebrows, cheekbones, lips, nose shape, and expressions, classify them into male and female, and use demographic classification analysis to study and detect facial expressions. We implemented both machine learning algorithms and image processing techniques, and experiments on a Kaggle dataset showed encouraging results.
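The feature-extractor role of a CNN can be illustrated with a single 2-D convolution: a small kernel slides over the image and responds strongly to local patterns (here a vertical edge, which is the kind of low-level feature early CNN layers learn). This is a generic illustration, not the paper's model.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image containing a single vertical edge.
image = np.zeros((5, 5))
image[:, 2] = 1.0

# A Sobel-style vertical-edge filter.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

print(conv2d(image, sobel_x))
```

In a trained CNN, many such filters are learned from data and stacked in layers; the resulting feature maps feed a final classifier head that makes the male/female decision.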
Assessment of Awareness Level of e-Learning Classroom Strategies of University Lecturers: Implication for Evaluation of Library and Information Science Resources
The emergence of information technology has brought a drastic change in the role of university libraries. University libraries play a key role in supporting e-learning, implying that library and information science personnel can significantly assist in integrating information resources into the process of electronic learning. Thus, this study investigated lecturers' level of awareness of e-learning classroom strategies. A descriptive survey design was adopted for the study. The sample comprised 149 lecturers teaching integrated science courses. The instrument used for data collection was a questionnaire titled Lecturers' Awareness Level of E-learning Strategies (LALES). LALES was validated, and the reliability of its items was estimated at 0.897 using Cronbach's alpha. The data collected were analyzed using means and standard deviations to answer the research questions, while the hypotheses were tested using t-tests. Findings revealed, among others, that the lecturers were only partially aware of strategies to facilitate e-learning. Based on the findings, it was recommended, among others, that the Nigerian government provide e-learning facilities through proper evaluation of library and information science resources.
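The reported reliability of 0.897 is a Cronbach's alpha, which measures internal consistency from the item and total-score variances. The sketch below computes it on a tiny made-up response matrix (not the study's data).

```python
def cronbach_alpha(items):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item variances)/total variance).
    `items` is a list of per-item score lists, one inner list per item."""
    k = len(items)
    def var(xs):
        # Sample variance (n-1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = sum(var(item) for item in items)
    n = len(items[0])
    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# 3 questionnaire items answered by 4 respondents (Likert-style scores).
items = [[3, 4, 5, 2],
         [2, 4, 5, 3],
         [3, 5, 4, 2]]

print(round(cronbach_alpha(items), 3))  # -> 0.892
```

Values above roughly 0.8 are conventionally taken to indicate good internal consistency, so the study's 0.897 supports treating the LALES items as measuring a single construct.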
The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment
The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study. This highlights the need to develop language technologies for Igbo to foster communication, learning, and preservation. To create robust, impactful, and widely adopted language technologies for Igbo, it is essential to incorporate the multi-dialectal nature of the language. The primary obstacle to achieving dialect-aware language technologies is the lack of comprehensive dialectal datasets. In response, we present the IgboAPI dataset, a multi-dialectal Igbo-English dictionary dataset developed with the aim of enhancing the representation of Igbo dialects. Furthermore, we illustrate the practicality of the IgboAPI dataset through two distinct studies: one focusing on the Igbo semantic lexicon and the other on machine translation. In the semantic lexicon project, we successfully establish an initial Igbo semantic lexicon for the Igbo semantic tagger, while in the machine translation study, we demonstrate that fine-tuning existing machine translation systems with the IgboAPI dataset significantly improves their ability to handle dialectal variation in sentences.
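One way a multi-dialectal dictionary can help downstream tools handle dialectal variation is by mapping dialect word forms to a shared headword before further processing. The sketch below is purely illustrative; the dictionary entries are hypothetical stand-ins, not IgboAPI data.

```python
# Hypothetical entries: dialectal surface form -> shared headword.
dialect_dict = {
    "ụnọ": "ụlọ",   # "house" (dialectal variant -> headword)
    "ulo": "ụlọ",   # un-diacritized spelling -> headword
}

def normalize(tokens, dialect_dict):
    """Replace known dialectal forms with their headword; pass others through."""
    return [dialect_dict.get(tok, tok) for tok in tokens]

print(normalize(["ụnọ", "m"], dialect_dict))  # -> ['ụlọ', 'm']
```

In the paper's MT study the dictionary is used differently, as fine-tuning data, but both uses rest on the same mapping between dialectal forms and shared entries.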