2 research outputs found

Statistical Learning with Missing Data

Author: Stewart Thomas
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2015

Statistical learning is a popular family of data analysis methods which has been successfully employed in biomedical research, the social sciences, public safety applications, and most data-dependent areas of research. A major goal of statistical learning methods is to construct rules which predict an outcome y from a set of predictors x, for example, predicting treatment response from a set of pre-treatment biomarkers. Accurate prediction rules of treatment response can guide health care providers to select the best treatment options. The support vector machine (SVM) is a statistical learning method profitably employed in a number of research areas such as biomedical computer vision tasks, drug design, and genetics. Because SVMs admit nonlinear prediction rules, they are a natural choice for analyzing data with potentially complex relationships. One drawback to SVMs is the limited means of handling missing data in the training set, yet missing data is ubiquitous in studies of health-related outcomes. In this research, we review the literature on missing data, and we summarize the scenarios in which missing data may bias statistical analysis. We also provide an overview of supervised classification methods, especially those methods which accommodate missing data. We pay special attention to SVMs as this family of methods is the focus of our proposed contributions to this body of work. We propose three methods involving SVMs and missing data. The first paper proposes an EM-based solution for constructing SVMs when the training set includes observations with missing covariates. We present the method for continuous covariates but the method is applicable to discrete covariates as well. The second paper proposes weighting methods inspired by weighted estimating equations, also for the purpose of constructing SVMs when the training set includes observations with missing covariates. The third paper considers scenarios in which class labels are missing or are partially observed, an area of study commonly called semi-supervised learning. We propose an EM-type solution for the semi-supervised learning scenario, and we apply the method to both two-class and multi-class SVMs. In each paper, the proposed methods will be demonstrated in the context of a large multi-center observational study of Hepatitis C patients.

Doctor of Philosoph

Carolina Digital Repository

A Semi-supervised SVM Framework for Character Recognition

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref