25 research outputs found

    An Approach for ECG Feature Extraction using Daubechies 4 (DB4) Wavelet

    An Electrocardiogram (ECG) signal describes the electrical activity of the heart recorded by electrodes placed on the surface of the human body. It captures the electrical activity used for the primary diagnosis of heart conditions such as tachycardia, bradycardia, normalcy, regularity and heart rate variation. The most clinically useful information in the ECG signal lies in the time intervals between its consecutive waves and in the amplitudes defined by its features. In this paper, an ECG feature extraction algorithm based on the Daubechies wavelet transform is presented. The DB4 wavelet is selected because its scaling function closely resembles the shape of the ECG signal. R-peak detection is the core of the algorithm's feature extraction: all other primary peaks are extracted relative to the locations of the R peaks by creating windows proportional to their normal intervals. The proposed extraction algorithm is evaluated on the MIT-BIH Arrhythmia Database. Experimental results indicate that the algorithm can successfully detect and extract all the primary features with a deviation error of less than 10%.
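
    A minimal sketch of the kind of pipeline described above, assuming the PyWavelets (pywt) and SciPy packages; the decomposition level, window logic and threshold are illustrative placeholders rather than the authors' parameters.

        # Hedged sketch: DB4 wavelet denoising followed by simple R-peak detection.
        import numpy as np
        import pywt
        from scipy.signal import find_peaks

        def detect_r_peaks(ecg, fs=360):
            """Denoise an ECG trace with the DB4 wavelet and locate R peaks.

            fs: sampling frequency in Hz (360 Hz for MIT-BIH records).
            """
            # Multi-level DB4 decomposition; level 4 is an illustrative choice.
            coeffs = pywt.wavedec(ecg, 'db4', level=4)
            # Zero the two finest detail bands to suppress high-frequency noise.
            coeffs[-1] = np.zeros_like(coeffs[-1])
            coeffs[-2] = np.zeros_like(coeffs[-2])
            denoised = pywt.waverec(coeffs, 'db4')[:len(ecg)]
            # R peaks: prominent maxima at least 200 ms apart (refractory period);
            # P, Q, S and T peaks would then be searched in windows around each R peak.
            peaks, _ = find_peaks(denoised,
                                  distance=int(0.2 * fs),
                                  height=np.mean(denoised) + 1.5 * np.std(denoised))
            return peaks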

    Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

    An approach for named entity classification based on Wikipedia article infoboxes is described in this paper. It identifies the three fundamental named entity types, namely Person, Location and Organization. An entity is classified by matching the attributes extracted from the infobox of its Wikipedia article against core entity attributes built from Wikipedia infobox templates. Experimental results show that the classifier achieves high accuracy and F-measure scores of 97%. Based on this approach, a database of around 1.6 million named entities of these three types is created from the 2014-02-03 Wikipedia dump. Experiments on the CoNLL-2003 named entity recognition (NER) shared-task dataset demonstrate the system's outstanding performance in comparison with three different state-of-the-art systems.
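
    A simplified sketch of the attribute-matching idea, assuming the per-type core attribute sets have already been collected from Wikipedia infobox templates; the attribute names and the overlap scoring below are illustrative, not the paper's exact procedure.

        # Hedged sketch: classify an entity by the overlap between its infobox
        # attributes and core attribute sets derived from infobox templates.
        CORE_ATTRIBUTES = {
            "Person": {"birth_date", "birth_place", "occupation", "nationality"},
            "Location": {"coordinates", "population", "area", "country"},
            "Organization": {"founded", "headquarters", "industry", "key_people"},
        }

        def classify_entity(infobox_attributes):
            """Return the entity type whose core attributes best overlap the infobox."""
            attrs = set(infobox_attributes)
            scores = {etype: len(attrs & core) / len(core)
                      for etype, core in CORE_ATTRIBUTES.items()}
            best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
            return best_type if best_score > 0 else "Unknown"

        # An infobox carrying birth_date and occupation maps to Person.
        print(classify_entity({"birth_date", "occupation", "spouse"}))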

    A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics

    In this paper, we propose a hybrid approach for sentence paraphrase identification. The proposal addresses the problem of evaluating sentence-to-sentence semantic similarity when the sentences contain a set of named entities. The essence of the proposal is to compute the semantic similarity of named-entity tokens separately from the rest of the sentence text. More specifically, it integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from Wikipedia entity co-occurrences and underpinned by the Normalized Google Distance. In addition, the WordNet similarity measure is enriched with word part-of-speech (PoS) conversion aided by the Categorial Variation database (CatVar), which enhances the lexico-semantics of words. We validated our hybrid approach on two different datasets: the Microsoft Research Paraphrase Corpus (MSRPC) and the TREC-9 Question Variants. In our empirical evaluation, we show that our system outperforms baselines and most of the related state-of-the-art systems for paraphrase detection. We also conducted a misidentification analysis to identify the primary sources of our system's errors.
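
    For reference, the Normalized Google Distance underpinning the entity-relatedness component can be sketched as follows, using its common definition over document counts; the function name and the example counts are placeholders, not the paper's instantiation.

        # Hedged sketch: Normalized Google Distance (NGD) over Wikipedia
        # co-occurrence counts, in its standard form.
        from math import log

        def ngd(f_x, f_y, f_xy, n_docs):
            """Distance between two entities from corpus co-occurrence counts.

            f_x, f_y : number of articles mentioning each entity.
            f_xy     : number of articles mentioning both entities.
            n_docs   : total number of articles in the corpus.
            """
            if f_xy == 0:
                return float("inf")  # entities never co-occur
            num = max(log(f_x), log(f_y)) - log(f_xy)
            den = log(n_docs) - min(log(f_x), log(f_y))
            return num / den

        # Frequent co-occurrence yields a small distance (high relatedness).
        print(ngd(f_x=10_000, f_y=8_000, f_xy=5_000, n_docs=4_500_000))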

    A comparative study of conversion aided methods for WordNet sentence textual similarity

    In this paper, we present a comparison of three methods for taxonomy-based sentence semantic relatedness, aided by word part-of-speech (PoS) conversion. We use the WordNet ontology for determining word-level semantic similarity, while augmenting WordNet with two other lexicographical databases, namely the Categorial Variation Database (CatVar) and the Morphosemantic Database, to assist the word category conversion. On a human-annotated benchmark data set, all three approaches achieve a high positive correlation with human ratings, reaching up to r = 0.881647, and are compared against two other baselines evaluated on the same benchmark data set.
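
    A small sketch of one way to convert a non-noun word into a related noun through WordNet's derivational links in NLTK; it illustrates the general category-conversion step rather than the specific CatVar or Morphosemantic lookups compared in the paper.

        # Hedged sketch: map a verb to derivationally related nouns via WordNet
        # (NLTK); CatVar or Morphosemantic lookups would replace this step.
        from nltk.corpus import wordnet as wn

        def to_nouns(word, pos=wn.VERB):
            """Return noun lemmas derivationally related to the given word."""
            nouns = set()
            for synset in wn.synsets(word, pos=pos):
                for lemma in synset.lemmas():
                    for related in lemma.derivationally_related_forms():
                        if related.synset().pos() == wn.NOUN:
                            nouns.add(related.name())
            return sorted(nouns)

        # Example: the verb "summarize" relates to nouns such as "summarization".
        print(to_nouns("summarize"))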

    Automatic text summarisation using linguistic knowledge-based semantics

    Text summarisation is the task of reducing a text document to a short substitute summary. Since the commencement of the field, almost all summarisation research to date has involved identifying and extracting the most important document or cluster segments, an approach known as extraction. This typically involves scoring each document sentence according to a composite scoring function consisting of surface-level and semantic features. Enabling machines to analyse text features and understand their meaning potentially requires both text semantic analysis and equipping computers with external semantic knowledge. This thesis addresses extractive text summarisation by proposing a number of semantic and knowledge-based approaches. The work combines the high-quality semantic information in WordNet, the crowdsourced encyclopaedic knowledge in Wikipedia and the manually crafted categorial variations in CatVar to improve summary quality. These improvements are achieved through sentence-level morphological analysis and the incorporation of Wikipedia-based named-entity semantic relatedness within heuristic algorithms. The study also investigates how sentence-level semantic analysis based on semantic role labelling (SRL), leveraged with background world knowledge, influences sentence textual similarity and text summarisation. The proposed sentence similarity and summarisation methods were evaluated on standard publicly available datasets, such as the Microsoft Research Paraphrase Corpus (MSRPC), the TREC-9 Question Variants, and the Document Understanding Conference 2002, 2005 and 2006 (DUC 2002, DUC 2005, DUC 2006) corpora. The project also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for the quantitative assessment of the proposed summarisers' performance. Our systems proved effective compared with related state-of-the-art summarisation methods and baselines. Of the proposed summarisers, the SRL Wikipedia-based system demonstrated the best performance.
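
    A tiny sketch of the ROUGE scoring mentioned above, using the rouge-score package as one common implementation; the reference and candidate strings are placeholders.

        # Hedged sketch: ROUGE F-measures for a candidate summary against a reference.
        from rouge_score import rouge_scorer

        scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
        reference = "the committee approved the budget for next year"
        candidate = "the budget for next year was approved by the committee"
        for name, score in scorer.score(reference, candidate).items():
            print(name, round(score.fmeasure, 3))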

    Knowledge-Based Sentence Semantic Similarity: Algebraical Properties

    Determining the extent to which two text snippets are semantically equivalent is a well-researched topic in natural language processing, information retrieval and text summarization. Sentence-to-sentence similarity scoring is extensively used in both generic and query-based document summarization as a significance or similarity indicator. Nevertheless, most of these applications use semantic similarity measures only as tools, without attending to the inherent properties of those tools, properties which ultimately restrict the scope and technical soundness of the underlying applications. This paper aims to help fill this gap. It investigates three popular WordNet hierarchical semantic similarity measures, namely path-length, Wu and Palmer, and Leacock and Chodorow, from both an algebraic and an intuitive standpoint, highlighting their inherent limitations and theoretical constraints. We especially examine properties related to the range and scope of the semantic similarity score, incremental monotonic evolution, monotonicity with respect to the hyponymy/hypernymy relationship, and a set of interactive properties. The extension from word semantic similarity to sentence similarity is also investigated using a pairwise canonical extension, and the properties of the resulting sentence-to-sentence similarity are examined and scrutinized. Next, to overcome the inherent limitation of WordNet semantic similarity in accounting for various part-of-speech word categories, a WordNet "all-word-to-noun" conversion that makes use of the Categorial Variation Database (CatVar) is put forward and evaluated on a publicly available dataset against some state-of-the-art methods. The findings demonstrate the feasibility of the proposal and open up new opportunities in information retrieval and natural language processing tasks.
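
    For context, the three measures discussed can be computed with NLTK's WordNet interface; the formulas in the comments are the textbook definitions, and the example synsets are placeholders rather than cases analysed in the paper.

        # Hedged sketch: the three WordNet similarity measures examined above.
        from nltk.corpus import wordnet as wn

        car, vehicle = wn.synset("car.n.01"), wn.synset("vehicle.n.01")

        # Path length:          1 / (1 + shortest_path(c1, c2))
        print(car.path_similarity(vehicle))
        # Wu and Palmer:        2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))
        print(car.wup_similarity(vehicle))
        # Leacock and Chodorow: -log(shortest_path(c1, c2) / (2 * max_depth))
        print(car.lch_similarity(vehicle))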

    SDbQfSum: Query-focused summarization framework based on diversity and text semantic analysis

    Query-focused multi-document summarization (Qf-MDS) is a sub-task of automatic text summarization that aims to extract a substitute summary from a cluster of documents on the same topic, guided by a user query. Unlike other summarization tasks, Qf-MDS faces specific research challenges, including the differences and similarities across related document sets, the high degree of redundancy inherent in summaries created from multiple related sources, relevance to the given query, topic diversity in the produced summary, and the small source-to-summary compression ratio. In this work, we propose a semantic diversity feature-based query-focused extractive summarizer (SDbQfSum) built on powerful text semantic representation techniques underpinned by Wikipedia commonsense knowledge, in order to address the query-relevance, centrality, redundancy and diversity challenges. Specifically, semantically parsed document text is combined with a knowledge-based vectorial representation to extract effective sentence-importance and query-relevance features. The proposed monolingual summarizer is evaluated on a standard English dataset for query-focused summarization, the DUC 2006 dataset. The obtained results show that our summarizer outperforms most state-of-the-art related approaches on one or more ROUGE measures, achieving 0.418, 0.092 and 0.152 in ROUGE-1, ROUGE-2 and ROUGE-SU4 respectively. It also attains competitive performance against the slightly better systems; for example, the difference between our result and the best system in ROUGE-1 is just 0.006. We also found through the conducted experiments that our proposed custom cluster-merging algorithm significantly reduces information redundancy while maintaining topic diversity across documents.
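
    A compact sketch of a greedy, diversity-aware sentence selection step of the kind query-focused extractive summarizers rely on; the MMR-style trade-off and cosine similarity over precomputed sentence vectors are generic illustrations, not the SDbQfSum algorithm itself.

        # Hedged sketch: greedy selection balancing query relevance against
        # redundancy (MMR-style); sentence and query vectors are precomputed.
        import numpy as np

        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

        def select_sentences(sent_vecs, query_vec, budget=5, trade_off=0.7):
            """Pick sentence indices that are query-relevant yet mutually dissimilar."""
            selected, candidates = [], list(range(len(sent_vecs)))
            while candidates and len(selected) < budget:
                def mmr(i):
                    relevance = cosine(sent_vecs[i], query_vec)
                    redundancy = max((cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                                     default=0.0)
                    return trade_off * relevance - (1 - trade_off) * redundancy
                best = max(candidates, key=mmr)
                selected.append(best)
                candidates.remove(best)
            return selected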

    Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection

    Automatic identification of cyberbullying from textual content is known to be a challenging task. The challenges arise from the inherent structure of cyberbullying and from the lack of a large-scale labeled corpus that would enable efficient machine-learning-based tools, including neural networks. This paper advocates a data augmentation-based approach to enhance the automatic detection of cyberbullying in social media texts. We use both word sense disambiguation and the synonymy relation in the WordNet lexical database to generate coherent equivalent utterances from cyberbullying input data. The disambiguation and semantic expansion are intended to overcome the inherent limitations of social media posts, such as an abundance of unstructured constructs and limited semantic content. In addition, to test feasibility, a novel protocol was employed to collect cyberbullying traces from the AskFm forum, where a dataset of about 10K posts was manually labeled. The problem of cyberbullying identification is then cast as a binary classification problem using an elaborated data augmentation strategy and an appropriate classifier. For the latter, a Convolutional Neural Network (CNN) architecture with FastText and BERT was put forward, and its results were compared against commonly employed Naïve Bayes (NB) and Logistic Regression (LR) classifiers, with and without data augmentation. The research outcomes were promising, yielding almost 98.4% classifier accuracy, an improvement of more than 4% over baseline results.
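
    A minimal sketch of WordNet-based synonym expansion guided by word sense disambiguation, using NLTK's Lesk implementation as a stand-in for the disambiguation step; it illustrates the general augmentation idea rather than the paper's exact pipeline.

        # Hedged sketch: generate variants of a post by swapping words for
        # synonyms of the Lesk-disambiguated sense (NLTK).
        from nltk import word_tokenize
        from nltk.corpus import wordnet as wn
        from nltk.wsd import lesk

        def augment(sentence):
            """Yield variants of `sentence` with one word replaced by a WordNet synonym."""
            tokens = word_tokenize(sentence)
            for i, token in enumerate(tokens):
                sense = lesk(tokens, token)  # pick a synset given the whole context
                if sense is None:
                    continue
                for lemma in sense.lemmas():
                    synonym = lemma.name().replace("_", " ")
                    if synonym.lower() != token.lower():
                        yield " ".join(tokens[:i] + [synonym] + tokens[i + 1:])

        for variant in augment("nobody likes you at school"):
            print(variant)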

    MasakhaNEWS: News Topic Classification for African languages

    African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited to zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern-exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and the Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach. Comment: Accepted to IJCNLP-AACL 2023 (main conference).
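
    As an illustration of the classical machine learning baselines mentioned above, a TF-IDF plus logistic regression topic classifier can be sketched with scikit-learn; the texts and labels are placeholders, not a MasakhaNEWS split.

        # Hedged sketch: a classical news-topic classification baseline
        # (TF-IDF features + logistic regression).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score
        from sklearn.pipeline import make_pipeline

        train_texts = ["election results announced", "star striker scores twice"]
        train_labels = ["politics", "sports"]
        test_texts = ["parliament passes new budget"]
        test_labels = ["politics"]

        clf = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2), min_df=1),
            LogisticRegression(max_iter=1000),
        )
        clf.fit(train_texts, train_labels)
        pred = clf.predict(test_texts)
        print("macro F1:", f1_score(test_labels, pred, average="macro"))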