An Investigation of Missing Data Methods for Classification Trees

Ding, Yufeng; Simonoff, Jeffrey S.

Authors: Yufeng Ding, Jeffrey S. Simonoff
Publication date: 3 December 2006
Publisher: Stern School of Business, New York University

Abstract

There are many different missing data methods used by classification tree algorithms, but few studies have been done comparing their appropriateness and performance. This paper provides both analytic and Monte Carlo evidence regarding the effectiveness of six popular missing data methods for classification trees. We show that in the context of classification trees, the relationship between the missingness and the dependent variable, rather than the standard missingness classification approach of Little and Rubin (2002) (missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR)), is the most helpful criterion to distinguish different missing data methods. We make recommendations as to the best method to use in various situations. The paper concludes with discussion of a real data set related to predicting bankruptcy of a firm.

Series: Statistics Working Papers Series
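The distinction the abstract draws, between missingness that is unrelated to the dependent variable and missingness that is related to it, can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the missingness rates (30%, 50%, 10%) and the use of a bare missingness flag as the only predictor are all assumptions chosen for illustration. It checks how well the response can be predicted from the flag alone under each mechanism.

```python
import numpy as np

def simulate(n=20000, seed=0):
    """Hypothetical data: binary response y, one predictor x, two missingness patterns."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)        # binary dependent variable
    x = rng.normal(loc=y, scale=1.0)      # predictor whose mean shifts with the class
    # Pattern A (assumed rate): every value has the same 30% chance of being missing.
    miss_a = rng.random(n) < 0.3
    # Pattern B (assumed rates): class 1 loses x with prob. 0.5, class 0 with prob. 0.1.
    miss_b = rng.random(n) < np.where(y == 1, 0.5, 0.1)
    x_a = np.where(miss_a, np.nan, x)
    x_b = np.where(miss_b, np.nan, x)
    return y, x_a, x_b

def flag_only_accuracy(y, x_obs):
    """Accuracy of the best rule that predicts y from the missingness flag alone."""
    missing = np.isnan(x_obs)
    correct = 0.0
    for flag in (False, True):
        group = y[missing == flag]
        if group.size:
            # Predict the majority class within each flag group.
            correct += max(np.mean(group == 0), np.mean(group == 1)) * group.size
    return correct / y.size

y, x_a, x_b = simulate()
acc_unrelated = flag_only_accuracy(y, x_a)
acc_related = flag_only_accuracy(y, x_b)
print(f"flag alone, missingness unrelated to y: {acc_unrelated:.3f}")
print(f"flag alone, missingness related to y:   {acc_related:.3f}")
```

Under these assumed rates the flag carries essentially no information in the first pattern and a substantial amount in the second, which is the kind of difference a tree method that treats missingness as its own category could exploit.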


Available Versions

New York University Faculty Digital Archive

http://hdl.handle.net/2451/263...

Last time updated on 03/08/2016

DSpace at New York University

Last time updated on 07/01/2019