7,273 research outputs found

Supervised dimensionality reduction for multiple imputation by chained equations

Authors: Costantini Edoardo; Lang Kyle M.; Sijtsma Klaas
Publication date: 04/09/2023

Multivariate imputation by chained equations (MICE) is one of the most popular approaches to address missing values in a data set. This approach requires specifying a univariate imputation model for every variable under imputation. Specifying which predictors to include in these univariate imputation models can be a daunting task. Principal component analysis (PCA) can simplify this process by replacing all of the potential imputation model predictors with a few components summarizing their variance. In this article, we extend the use of PCA with MICE to include a supervised aspect whereby information from the variables under imputation is incorporated into the principal component estimation. We conducted an extensive simulation study to assess the statistical properties of MICE with different versions of supervised dimensionality reduction, and we compared them with the use of classical unsupervised PCA as a simpler dimensionality reduction technique.

arXiv.org e-Print Archive

Multiclass support vector machines for classification of ECG data with missing values

Authors: Abdul Aziz Ahmad Fazli; Hashim Shaiful Jahari; Hejazi Maryamsadat; Singh Yashwant Prasad; Syed Mohamed Syed Abdul Rahman Al-Haddad
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2015

The article presents an experimental study of multiclass Support Vector Machine (SVM) methods on a cardiac arrhythmia dataset with missing attribute values, for an electrocardiogram (ECG) diagnostic application. An incomplete dataset and high data dimensionality can both affect the performance of classifiers. Imputation of missing data and discriminant analysis are commonly used as preprocessing techniques for such large datasets. The article proposes experiments to evaluate the performance of the One-Against-All (OAA) and One-Against-One (OAO) approaches in kernel multiclass SVM for a heartbeat classification problem with imputation and dimension reduction techniques. The results indicate that the OAA approach outperforms OAO in multiclass SVM for ECG data analysis with missing values.

Crossref

Universiti Putra Malaysia Institutional Repository

Multiclass support vector machines for classification of ECG data with missing values

Authors: Hejazi Maryamsadat; Syed Mohamed Syed Abdul Rahman Al-Haddad; Singh Yashwant Prasad; Hashim Shaiful Jahari; Abdul Aziz Ahmad Fazli
Publication venue: Taylor & Francis
Publication date: 01/02/1993

Universiti Putra Malaysia Institutional Repository

Kanazawa University Repository for Academic Resources

Improved k-means clustering using principal component analysis and imputation methods for breast cancer dataset

Author: Armina Roslan
Publication date: 01/01/2018

Data mining techniques are used to analyse patterns in data sets in order to derive useful information. Grouping data into clusters is one of the essential processes in data manipulation. One of the most popular and efficient clustering methods is K-means. However, K-means has difficulty analysing high-dimensional data sets that contain missing values. Moreover, previous studies showed that high feature dimensionality poses its own problems for K-means clustering. To address missing values, an imputation method is needed to minimise the effect of incomplete high-dimensional data on the K-means clustering process. This research studies the effect of imputation algorithms and dimensionality reduction techniques on the performance of K-means clustering. Three imputation methods are implemented for missing value estimation: K-nearest neighbours (KNN), Local Least Squares (LLS), and Bayesian Principal Component Analysis (BPCA). Principal Component Analysis (PCA) is a dimension reduction method that removes unnecessary attributes from high-dimensional data sets. Hence, a hybrid of PCA with K-means (PCA K-means) is proposed to give a better clustering result. The experiments were performed on the Wisconsin Breast Cancer data set. With LLS imputation, the proposed hybrid PCA K-means outperformed standard K-means clustering on the breast cancer data set in terms of clustering accuracy (0.29%) and computing time (95.76%).

Universiti Teknologi Malaysia Institutional Repository