SVSI: Fast and Powerful Set-Valued System Identification Approach to Identifying Rare Variants in Sequencing Studies for Ordered Categorical Traits: SVSI for Genetic Association Studies

Bi, Wenjian; Borowitz, Michael J.; Cheng, Cheng; Cui, Yuehua; Hartford, Christine M.; Hunger, Stephen P.; Kang, Guolian; Leung, Wing; Li, Yun; Liu, Zhifa; Pounds, Stanley B.; Pui, Ching-Hon; Relling, Mary V.; Yan, Song; Yang, Jun J.; Zhang, Ji-Feng; Zhao, Yanlong

Publication date: 1 January 2015

Abstract

For genetic association studies that involve an ordered categorical phenotype, we usually either regroup the categories of the phenotype into two groups ("cases" and "controls") and then apply standard logistic regression (LG), or apply ordered logistic (oLG) or ordered probit (oPRB) regression, which account for the ordinal nature of the phenotype. However, when the genetic variant is rare or the sample size is limited, these approaches may lose statistical power or fail to control the type I error rate because of their model assumptions and/or unstable parameter estimation algorithms. To solve this problem, we propose a set-valued (SV) system model, which assumes that an underlying continuous phenotype follows a normal distribution, to identify genetic variants associated with an ordered categorical phenotype. We couple this model with a set-valued system identification algorithm to identify all the key system parameters. Simulations and two real data analyses show that SV and LG accurately controlled the type I error rate even at a significance level of 10⁻⁶, whereas oLG and oPRB did not in some cases. LG had significantly lower power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. For instance, in a simulation with data generated from an additive SV model with an odds ratio of 7.4 for a phenotype with three categories, a single nucleotide polymorphism with a minor allele frequency of 0.75%, and a sample size of 999 (333 per category), the power of SV, oLG and LG was 70%, 40% and <1%, respectively, at a significance level of 10⁻⁶. Thus, SV should be employed in genetic association studies of ordered categorical phenotypes.
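The abstract describes the SV model only at a high level: a latent continuous phenotype, assumed normal, is observed only through the ordered category (the "set") it falls into. The sketch below illustrates that kind of data-generating model under stated assumptions; it is not the authors' SVSI identification algorithm, which the abstract does not specify. Assumptions: an additive genotype effect `beta` on the latent scale, N(0, 1) latent noise, fixed thresholds, Hardy-Weinberg genotypes, and simple random sampling (not the 333-per-category design in the abstract). The function names `categorize`, `category_probs` and `simulate`, the constant `THRESHOLDS`, and the parameter `beta` are all hypothetical, and `beta` is not the odds ratio of 7.4 quoted above.

```python
import bisect
import random
from statistics import NormalDist


def categorize(latent, thresholds):
    """Return the ordered category (0..K-1) whose interval contains `latent`.

    Category k covers (thresholds[k-1], thresholds[k]], with implicit
    -inf/+inf at the ends; only this interval is observed, not `latent`.
    """
    return bisect.bisect_left(thresholds, latent)


def category_probs(g, beta, thresholds):
    """P(Y = k | g) = Phi(c_k - beta*g) - Phi(c_{k-1} - beta*g), assuming
    N(0, 1) latent noise and a hypothetical additive genotype effect beta."""
    phi = NormalDist().cdf
    cuts = [float("-inf"), *thresholds, float("inf")]
    return [phi(cuts[k + 1] - beta * g) - phi(cuts[k] - beta * g)
            for k in range(len(cuts) - 1)]


def simulate(n, maf, beta, thresholds, seed=0):
    """Draw n (genotype, category) pairs from the hypothetical latent model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Additive coding: number of minor alleles (0, 1 or 2) under Hardy-Weinberg.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        latent = beta * g + rng.gauss(0.0, 1.0)  # unobserved continuous phenotype
        data.append((g, categorize(latent, thresholds)))
    return data


# Thresholds that split a standard normal latent phenotype into three
# equal-probability categories when the variant has no effect (beta = 0).
THRESHOLDS = [NormalDist().inv_cdf(1 / 3), NormalDist().inv_cdf(2 / 3)]

if __name__ == "__main__":
    for g in (0, 1, 2):
        probs = category_probs(g, beta=1.0, thresholds=THRESHOLDS)
        print(g, [round(p, 3) for p in probs])
    sample = simulate(999, maf=0.0075, beta=1.0, thresholds=THRESHOLDS)
    print("carriers:", sum(1 for g, _ in sample if g > 0))
```

Running the module prints the category probabilities for each genotype and the number of carriers in one simulated sample of 999. With a minor allele frequency of 0.75%, such a sample is expected to contain only about 2 × 999 × 0.0075 ≈ 15 minor alleles, so very few observations carry information about the genotype effect, which is the rare-variant setting the abstract targets.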


Carolina Digital Repository: cdr.lib.unc.edu:xk81js10t

Last updated: 24/11/2020