2 research outputs found

Effect of Imputation Methods in the Classifier Performance

Authors: Cihan Pınar; Gökçe Erhan; Kalıpsız Oya
Publication venue: 'Sakarya University Journal of Science'
Publication date: 01/01/2019

Missing values in a dataset present an important problem for almost any traditional or modern statistical method, since most of these methods were developed under the assumption that the dataset is complete. In the real world, however, complete datasets are not available, and missing data is encountered as frequently in veterinary field studies as in other fields. Imputing missing data is important in veterinary field studies, where data mining is only beginning to be applied, but how the data should be imputed is an equally important issue. In many studies, observations with missing values in any variable are removed, or the missing values are filled in by traditional methods. Alternative approaches that avoid removing such observations have become widely available in recent years, yet they are rarely used. The aim of this study is to examine the mean, median, nearest neighbors, MICE and missForest methods for imputing simulated missing data, created by randomly removing values from the original veterinary dataset at varying frequencies (5% to 25%, in steps of 5%). The most accurate methods are then used to impute the original dataset, in order to observe their influence on classifier performance and to determine the optimal imputation method for the original dataset.
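
The evaluation protocol described in this abstract (remove values at random at several rates, impute them, and compare against the known originals) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data is a synthetic skewed column rather than the veterinary dataset, only the mean and median strategies are shown (nearest neighbors, MICE and missForest need external libraries), RMSE is an assumed accuracy measure, and all function names are hypothetical.

```python
import math
import random
import statistics


def mask_at_random(values, rate, rng):
    """Return a copy of `values` with a fraction `rate` of entries set to None."""
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute(column, strategy):
    """Fill None entries with the mean or median of the observed entries."""
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in column]


def rmse_on_masked(original, masked, imputed):
    """Root mean squared error, computed only on the entries that were removed."""
    errors = [(o - i) ** 2 for o, m, i in zip(original, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors))


rng = random.Random(0)
# Hypothetical skewed feature standing in for one column of a real dataset.
original = [rng.lognormvariate(0.0, 0.75) for _ in range(200)]

results = {}
for rate in (0.05, 0.10, 0.15, 0.20, 0.25):  # 5% to 25% in steps of 5%
    masked = mask_at_random(original, rate, rng)
    for strategy in ("mean", "median"):
        imputed = impute(masked, strategy)
        results[(rate, strategy)] = rmse_on_masked(original, masked, imputed)

for (rate, strategy), score in sorted(results.items()):
    print(f"{rate:.0%} missing, {strategy:>6}: RMSE = {score:.3f}")
```

Because the removed values are known, each method can be scored directly on the masked entries. The methods that score best would then be applied to the values that are actually missing before a classifier is trained.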

Namik Kemal University Institutional Repository

Processing of missing data by neural networks

Authors: Smieja Marek; Spurek Przemysław; Struski Łukasz; Tabor Jacek; Zieliński Bartosz
Publication date: 01/01/2018

We propose a general, theoretically justified mechanism for processing missing data by neural networks. Our idea is to replace a typical neuron's response in the first hidden layer by its expected value. This approach can be applied to various types of networks with minimal modification. Moreover, in contrast to recent approaches, it does not require complete data for training. Experiments on different types of architectures show that our method gives better results than typical imputation strategies and other methods dedicated to incomplete data.
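
As a rough illustration of replacing a first-layer neuron's response by its expected value, the sketch below handles a single ReLU neuron. The abstract does not say how missing features are modeled, so the sketch makes a simplifying assumption: each missing feature is an independent Gaussian with a known mean and variance. The pre-activation is then Gaussian as well, and the expected ReLU output has a closed form. All names are hypothetical and this is not the authors' implementation.

```python
import math


def normal_pdf(t):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def expected_relu(mu, sigma):
    """E[max(0, Z)] for Z ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(0.0, mu)  # no uncertainty: ordinary ReLU
    t = mu / sigma
    return mu * normal_cdf(t) + sigma * normal_pdf(t)


def expected_neuron_response(weights, bias, x, prior_mean, prior_var):
    """Expected ReLU response of one neuron for a partially observed input.

    Entries of `x` equal to None are treated as independent Gaussians with
    the given prior mean and variance (an assumption made for this sketch).
    """
    mu, var = bias, 0.0
    for w, xi, m, v in zip(weights, x, prior_mean, prior_var):
        if xi is None:
            mu += w * m        # missing feature contributes its mean
            var += w * w * v   # and adds variance to the pre-activation
        else:
            mu += w * xi       # observed feature contributes exactly
    return expected_relu(mu, math.sqrt(var))


weights, bias = [1.0, -2.0], 0.5
prior_mean, prior_var = [0.0, 0.0], [1.0, 1.0]

complete = expected_neuron_response(weights, bias, [0.2, 0.1], prior_mean, prior_var)
incomplete = expected_neuron_response(weights, bias, [0.2, None], prior_mean, prior_var)
print(complete, incomplete)
```

For a fully observed input the variance term is zero and the function reduces to the ordinary ReLU response, so only inputs with missing entries are treated differently. Filling the missing entry with its prior mean and applying ReLU would ignore the variance term, which is the difference between plain mean imputation and taking the expectation of the response.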

arXiv.org e-Print Archive

Jagiellonian University Repository