125 research outputs found

SMOTE for high-dimensional class-imbalanced data

Author: A Fallahi
A Hinneburg
B Wallace
C Bunkhumpornpat
C Cortes
C Drummond
C Sotiriou
CM Bishop
DA Cieslak
E Fix
H Han
H He
J Pittman
J Wang
J Xiao
J Zhu
JV Hulse
K Beyer
KD MacIsaac
L Breiman
Lara Lusa
LD Miller
MA Shipp
N Iizuka
NV Chawla
P Radivojac
Q Gu
R Batuwita
R Blagus
R Development Core Team
R Johnson
R Tibshirani
RM Simon
Rok Blagus
S Daskalaki
S Doyle
S Dudoit
S Ramaswamy
SE Ertekin
T Fawcett
TP Speed
Y Guo
Y Liu
Publication venue: 'Springer Science and Business Media LLC'

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Author: Ahmed Sajid
Farid Dewan Md.
Jani Md. Rafsan
Mahbub Asif
Rayhan Farshid
Shatabda Swakkhar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/12/2017

Class-imbalanced classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances usually represent the concept of greater interest than the majority class instances. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach with a boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We compared the performance of the CUSBoost algorithm with state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. Comment: CSITSS-201
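The under-sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used or how many instances are kept per cluster, so the plain k-means routine, the value of `k`, the `keep_fraction` of 0.5, and the toy data are all assumptions. The AdaBoost step is left out; only the cluster-then-sample step is shown.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise centroids from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, k=3, keep_fraction=0.5, seed=0):
    """Keep a random fraction of the majority points from every cluster (assumed scheme)."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        n_keep = max(1, round(len(members) * keep_fraction)) if members else 0
        kept.extend(rng.sample(members, n_keep))
    return kept


# Hypothetical toy data: 24 majority points in three groups, 4 minority points.
majority = [(x + dx, y + dy) for (x, y) in [(0, 0), (10, 10), (20, 0)]
            for dx in range(4) for dy in range(2)]
minority = [(5, 5), (6, 5), (5, 6), (6, 6)]

kept = cluster_undersample(majority, k=3, keep_fraction=0.5)
training_set = kept + minority  # this reduced set would then be passed to a boosting learner
print(len(majority), "->", len(kept), "majority points kept")
```

Sampling inside each cluster, rather than from the majority class as a whole, is what distinguishes this from random under-sampling: every region of the majority class stays represented after the reduction.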

arXiv.org e-Print Archive

Crossref

Using RRC Algorithm Classify the Proteins and Visualize in Biological Databases

Author: N. Deepak Kumar, Dr. A. Ramamohan Reddy
Publication venue: Auricle Global Society of Education and Research
Publication date: 30/09/2017

Visualizing a biological database of proteins is very complicated without first classifying the protein properties. Protein classification is one of the major applications of machine learning algorithms in the field of bioinformatics. The classification model works in two steps. First, correlation-based feature selection is applied, and strongly correlated features are selected for classification using an MST-based approach. In the second step, classification is performed using robust regression. According to the results, the RRC algorithm achieves a higher classification ratio than traditional machine learning algorithms such as SVM, Naïve Bayes, and decision trees.

International Journal on Future Revolution in Computer Science & Communication Engineering

Using RRC Algorithm Classify the Proteins and Visualize in Biological Databases

Author: N. Deepak Kumar, Dr. A. Ramamohan Reddy
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/10/2017

International Journal on Future Revolution in Computer Science & Communication Engineering

A Boosted Machine Learning Framework for the Improvement of Phase and Crystal Structure Prediction of High Entropy Alloys Using Thermodynamic and Configurational Parameters

Author: Chatterjee Arghya
Das Suchandan
Dey Debsundar
Dey Santanu
Pal Anik
Raul Chandan Kumar
Publication date: 02/09/2023

The remarkable properties of High-Entropy Alloys (HEAs) are rooted in the diverse phases and crystal structures they contain. In the realm of material informatics, employing machine learning (ML) techniques to classify phases and crystal structures of HEAs has gained considerable significance. In this study, we assembled a new collection of 1345 HEAs with varying compositions to predict phases. Within this collection, 705 sets of data were used to predict the crystal structures with the help of thermodynamics and electronic configuration. Our study introduces a methodical framework, based on the Pearson correlation coefficient, that helps select strongly correlated features to increase the prediction accuracy. This study employed five distinct boosting algorithms to predict phases and crystal structures, offering an enhanced guideline for improving the accuracy of these predictions. Among these algorithms, XGBoost gives the highest prediction accuracy for phases (94.05%) and LightGBM gives the highest prediction accuracy for the crystal structure of the phases (90.07%). The influence of the parameters on the model's accuracy was quantified, and a new approach was used to elucidate the contribution of individual parameters to phase prediction and crystal structure prediction.
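As a rough illustration of Pearson-based feature selection, the sketch below computes the correlation coefficient by hand and keeps features whose absolute correlation with a target meets a threshold. The abstract does not state whether correlation is measured between features or against the target, nor what cut-off is used, so scoring against the target, the threshold of 0.5, and the feature names and values are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(columns, target, threshold=0.5):
    """Return the names of features whose |r| with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical data: three made-up features and a numeric target.
target = [1, 2, 3, 4, 5]
columns = {
    "feature_a": [2, 4, 6, 8, 10],   # rises with the target
    "feature_b": [5, 4, 3, 2, 1],    # falls as the target rises
    "feature_c": [1, -1, 0, -1, 1],  # no linear relationship with the target
}

selected = select_features(columns, target)
print(selected)
```

Using the absolute value keeps strongly negative correlations as well as strongly positive ones, since both carry predictive information.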

arXiv.org e-Print Archive

Classification of Caesarean Section and Normal Vaginal Deliveries Using Foetal Heart Rate Signals and Advanced Machine Learning Algorithms

Author: A Georgieva
A Pinas
A Sola
A Ugwumadu
Abir Hussain
AL Goldberger
AR Webb
B Chudacek
CK Karmakar
D Silver
De-Shuang Huang
Dhiya Al-Jumeily
DP Williams
E Kreyszig
F Tetschke
G Koop
H Ocak
J Camm
J Hand
J Kessler
J Nahar
J Spilka
JB Warren
L Omo-Aghoja
L Tong
LM Taft
ME Menai
MG Signorini
N Sarkar
N Srivastava
Nizar Bouguila
NV Chawla
P Fergus
P Pinto
PA Warrick
Paul Fergus
PD Welch
PM Granitto
R Blagus
R Brown
R Czabanski
R Mantel
R Vressler
S Schiermeier
T Sun
TM Khoshgoftaar
V Lopez
W Lin
WL Maner
Y Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2017

ABSTRACT – Background: Visual inspection of Cardiotocography traces by obstetricians and midwives is the gold standard for monitoring the wellbeing of the foetus during antenatal care. However, inter- and intra-observer variability is high with only a 30% positive predictive value for the classification of pathological outcomes. This has a significant negative impact on the perinatal foetus and often results in cardio-pulmonary arrest, brain and vital organ damage, cerebral palsy, hearing, visual and cognitive defects and, in severe cases, death. This paper shows that using machine learning and foetal heart rate signals provides direct information about the foetal state and helps to filter the subjective opinions of medical practitioners when used as a decision support tool. The primary aim is to provide a proof-of-concept that demonstrates how machine learning can be used to objectively determine when medical intervention, such as caesarean section, is required and help avoid preventable perinatal deaths. Methodology: This is evidenced using an open dataset that comprises 506 controls (normal vaginal deliveries) and 46 cases (caesarean due to pH ≤7.05 and pathological risk). Several machine-learning algorithms are trained, and validated, using binary classifier performance measures. Results: The findings show that deep learning classification achieves Sensitivity = 94%, Specificity = 91%, Area under the Curve = 99%, F-Score = 100%, and Mean Square Error = 1%. Conclusions: The results demonstrate that machine learning significantly improves the efficiency for the detection of caesarean section and normal vaginal deliveries using foetal heart rate signals compared with obstetrician and midwife predictions and systems reported in previous studies
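The binary classifier performance measures named in this abstract follow from confusion-matrix counts, as the sketch below shows. Only the class sizes (506 controls, 46 cases) come from the abstract; the split into true and false positives and negatives is hypothetical and does not reproduce the paper's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classifier measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
    }


controls, cases = 506, 46  # class sizes stated in the abstract

# A classifier that always predicts "normal delivery" is right for every control.
baseline_accuracy = controls / (controls + cases)

# Hypothetical predictions: 40 of 46 cases detected, 26 of 506 controls flagged.
m = binary_metrics(tp=40, fp=26, tn=480, fn=6)

print(f"baseline accuracy: {baseline_accuracy:.3f}")
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With 46 cases against 506 controls, the always-negative baseline already scores about 92% accuracy while detecting no cases, which is why sensitivity and specificity are reported for imbalanced data.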

LJMU Research Online (Liverpool John Moores University)

Crossref

Directory of Open Access Journals

ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION

Author: Afandi Sjaeful
Indrawati Ariani
Sihombing Andre
Subagyo Hendro
Wagiyah Wagiyah
Publication venue: 'Indonesian Institute of Sciences'
Publication date: 11/12/2020

Extremely skewed data in artificial intelligence, machine learning, and data mining often give misleading results, because machine learning algorithms are designed to work best with balanced data. However, imbalanced data are common in real situations. The most popular technique for handling imbalanced data is resampling the dataset to adjust the number of instances in the majority and minority classes toward a balanced distribution. Many resampling techniques (oversampling, undersampling, or a combination of both) have been proposed and continue to be developed. Resampling techniques may increase or decrease classifier performance. Comparative research on resampling methods for structured data has been widely carried out, but studies that compare resampling methods on unstructured data are very rare. That raises many questions, one of which is whether these methods can be applied to unstructured data such as text, which has high dimensionality and very diverse characteristics. To understand how different resampling techniques affect the learning of classifiers on imbalanced text data, we perform an experimental analysis using various resampling methods with several classification algorithms to classify articles in the Indonesian Scientific Journal Database (ISJD). The experiment shows that resampling techniques on imbalanced text data generally improve classifier performance, but the improvement is not significant because text data are very diverse and high-dimensional.
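The two basic resampling strategies mentioned here, oversampling and undersampling, can be sketched in a few lines. This is a generic random-resampling illustration, not the specific methods evaluated in the study; the function name, the document identifiers and the category labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, mode="over", seed=0):
    """Balance classes by random oversampling ("over") or undersampling ("under")."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    sizes = [len(items) for items in by_class.values()]
    # Oversampling grows every class to the largest size;
    # undersampling shrinks every class to the smallest size.
    target = max(sizes) if mode == "over" else min(sizes)
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target:
            chosen = rng.sample(items, target)  # draw without replacement
        else:
            # Keep the originals and add random duplicates to reach the target.
            chosen = items + rng.choices(items, k=target - len(items))
        out_samples.extend(chosen)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical corpus: six articles in one category, two in another.
samples = [f"doc{i}" for i in range(8)]
labels = ["physics"] * 6 + ["biology"] * 2

over_samples, over_labels = random_resample(samples, labels, mode="over")
under_samples, under_labels = random_resample(samples, labels, mode="under")
print(Counter(over_labels), Counter(under_labels))
```

Oversampling keeps all the original documents at the cost of exact duplicates, while undersampling discards majority documents; resampling is normally applied to the training split only, so that evaluation reflects the original class distribution.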

BACA: JURNAL DOKUMENTASI DAN INFORMASI

Predicting Kereh River's Water Quality: A comparative study of machine learning models

Author: Ahmad Afida
Nasaruddin Norashikin
Osman Mohamed Syazwan
Ul-Saufie Ahmad Zia
Zakaria Shahida Farhan
Publication venue: e-International Publishing House (e-IPH) Limited, UK
Publication date: 19/09/2023

This study introduces a machine learning-based approach to forecast the water quality of the Kereh River and categorize it into 'polluted' or 'slightly polluted' classifications. This work employed three machine learning algorithms: decision tree, random forests (RF), and boosted regression tree, leveraging data spanning from 2010 to 2019. Through comparative analysis, the RF model emerged as the most efficient, boasting an accuracy of 97.30%, sensitivity of 100.00%, specificity of 94.74%, and precision of 95.00%. Notably, the RF model identified dissolved oxygen (DO) as the paramount variable influencing water quality predictions.

Keywords: Water quality; machine learning; decision tree; random forest

eISSN: 2398-4287

© 2023. The Authors. Published for AMER and cE-Bs by e-International Publishing House, Ltd., UK. This is an open-access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under the responsibility of AMER (Association of Malaysian Environment-Behaviour Researchers), and cE-Bs (Centre for Environment-Behaviour Studies), College of Built Environment, Universiti Teknologi MARA, Malaysia

DOI: https://doi.org/10.21834/e-bpj.v8iSI15.509

Environment-Behaviour Proceedings Journal (E-BPJ)