Skip to main content
Article thumbnail
Location of Repository

Missing Data Problems in Machine Learning

By Benjamin M. Marlin

Abstract

Learning, inference, and prediction in the presence of missing data are pervasive problems in machine learning and statistical data analysis. This thesis focuses on the problems of collaborative prediction with non-random missing data and classification with missing features. We begin by presenting and elaborating on the theory of missing data due to Little and Rubin. We place a particular emphasis on the missing at random assumption in the multivariate setting with arbitrary patterns of missing data. We derive inference and prediction methods in the presence of random missing data for a variety of probabilistic models including finite mixture models, Dirichlet process mixture models, and factor analysis. Based on this foundation, we develop several novel models and inference procedures for both the collaborative prediction problem and the problem of classification with missing features. We develop models and methods for collaborative prediction with non-random missing data by combining standard models for complete data with models of the missing data process. Using a novel recommender system data set and experimental protocol, we show that each proposed method achieves a substantial increase in rating prediction performance compared to models that assume missing ratings are missing at random

Year: 2008
OAI identifier: oai:CiteSeerX.psu:10.1.1.160.8408
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.cs.ubc.ca/~bmarlin/... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.