Imputing Missing Genotypes with Weighted k Nearest

By Holger Schwender and Katja Ickstadt

Abstract

Motivation: Missing values are a common problem in genetic association studies concerned with single nucleotide polymorphisms (SNPs). Since most statistical methods cannot handle missing values, such values have to be removed prior to the actual analysis. Considering only complete observations, however, often leads to an immense loss of information. Therefore, procedures are needed that can be used to replace such missing values. In this article, we propose a method based on weighted k nearest neighbors that can be employed for imputing such missing genotypes.

Results: In a comparison with other imputation approaches, our procedure, called KNNcatImpute, shows the lowest rates of falsely imputed genotypes when applied to the SNP data from the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer. Moreover, in contrast to other imputation methods that take all variables into account when replacing missing values of a particular variable, KNNcatImpute is not restricted to association studies comprising several tens to a few hundred SNPs, but can also be applied to data from whole-genome studies, as an application to a subset of the HapMap data shows.

Availability: KNNcatImpute is implemented in the R package scrime that can be downloaded fro
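To make the general idea of weighted k nearest neighbor imputation for categorical genotypes concrete, the following is a minimal illustrative sketch in Python. It is not the KNNcatImpute implementation from the scrime package. The genotype coding (1, 2, 3), the use of `None` as the missing-value marker, the simple mismatch distance, the inverse-distance weights, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing genotype (hypothetical choice)


def mismatch_distance(a, b):
    """Fraction of positions observed in both vectors where the genotypes differ.

    Assumption: a simple matching distance, chosen only for illustration.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not pairs:
        return float("inf")  # no shared observations, so no usable distance
    return sum(x != y for x, y in pairs) / len(pairs)


def impute_weighted_knn(data, k=3, eps=1e-6):
    """Return a copy of data (rows = SNPs, columns = subjects) with gaps filled.

    Each missing genotype is replaced by the category with the largest total
    weight among the k nearest SNPs that are observed for the same subject.
    Weights are 1 / (distance + eps), an assumed weighting scheme.
    """
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Candidate neighbors: other SNPs with an observed genotype for subject j.
            candidates = [
                (mismatch_distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not MISSING
            ]
            candidates = [c for c in candidates if c[0] != float("inf")]
            neighbors = sorted(candidates, key=lambda c: c[0])[:k]
            if not neighbors:
                continue  # nothing to learn from; leave the value missing
            votes = defaultdict(float)
            for dist, genotype in neighbors:
                votes[genotype] += 1.0 / (dist + eps)
            imputed[i][j] = max(votes, key=votes.get)
    return imputed


# Toy data (made up): 4 SNPs genotyped in 5 subjects, one missing entry.
data = [
    [1, 1, 2, 3, None],
    [1, 1, 2, 3, 3],
    [1, 1, 2, 2, 3],
    [3, 2, 1, 1, 1],
]
result = impute_weighted_knn(data, k=2)
print(result[0])
```

In this toy example, the two SNPs most similar to the first one both carry genotype 3 for the last subject, so the weighted vote fills the gap with 3. Because each imputation only needs distances to the other SNPs and the votes of the k closest ones, the choice of distance and weights can be swapped without changing the overall structure.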

Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.324.4287
Provided by: CiteSeerX
