Generating Synthetic Missing Data: A Review by Missing Mechanism

Abreu, Pedro Henriques; Costa, Adriana Fonseca; Pereira, Ricardo Cardoso; Santos, Joao; Santos, Miriam Seoane; Soares, Jastin Pompeu

Publication date: 1 January 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

The performance evaluation of imputation algorithms often involves the generation of missing values. Missing values can be inserted in only one feature (univariate configuration) or in several features (multivariate configuration) at different percentages (missing rates) and according to distinct missing mechanisms, namely, missing completely at random, missing at random, and missing not at random. Since the missing data generation process defines the basis for the imputation experiments (configuration, missing rate, and missing mechanism), it is essential that it is appropriately applied; otherwise, conclusions derived from ill-defined setups may be invalid. The goal of this paper is to review the different approaches to synthetic missing data generation found in the literature and discuss their practical details, elaborating on their strengths and weaknesses. Our analysis revealed that creating missing at random and missing not at random scenarios in datasets comprising qualitative features is the most challenging issue in the related work and, therefore, should be the focus of future work in the field.
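
To make the three mechanisms concrete, here is a minimal Python/NumPy sketch of the univariate configuration at a chosen missing rate. The helper name `ampute_univariate` and its parameters (`target`, `rate`, `mechanism`, `driver`, `seed`) are hypothetical illustrations, not one of the approaches reviewed in the paper. MAR and MNAR are implemented here by deterministically removing the rows with the largest values. That is only one of many possible choices, and it assumes all features are numeric.

```python
import numpy as np


def ampute_univariate(X, target, rate, mechanism, driver=None, seed=None):
    """Hypothetical helper: insert NaNs into column `target` of numeric array X.

    Univariate configuration: only one feature receives missing values.
    rate:      fraction of rows made missing in `target` (the missing rate).
    mechanism: "MCAR", "MAR" or "MNAR".
    driver:    for MAR, index of a fully observed column that drives missingness.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()  # never modify the caller's data
    n = X.shape[0]
    n_missing = int(round(rate * n))

    if mechanism == "MCAR":
        # Missingness is independent of all values: rows chosen uniformly at random.
        rows = rng.choice(n, size=n_missing, replace=False)
    elif mechanism == "MAR":
        # Missingness depends only on another, observed feature (`driver`):
        # here, the rows with the largest driver values lose their target value.
        if driver is None or driver == target:
            raise ValueError("MAR needs a driver column different from target")
        rows = np.argsort(X[:, driver])[n - n_missing:]
    elif mechanism == "MNAR":
        # Missingness depends on the very values that go missing:
        # here, the largest values of the target feature are removed.
        rows = np.argsort(X[:, target])[n - n_missing:]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")

    X[rows, target] = np.nan
    return X


# Usage: 20% missing values in feature 0, MAR driven by feature 1.
data = np.random.default_rng(0).normal(size=(100, 3))
data_mar = ampute_univariate(data, target=0, rate=0.2, mechanism="MAR", driver=1)
```

A multivariate configuration would repeat this selection over several target columns. Because the MAR and MNAR branches rely on ordering values, the sketch does not carry over directly to qualitative (categorical) features.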

Available versions: Estudo Geral (oai:estudogeral.uc.pt:10316/10...)

Last updated on 13/05/2023