SVSI: Fast and Powerful Set-Valued System Identification Approach to Identifying Rare Variants in Sequencing Studies for Ordered Categorical Traits

Abstract

For genetic association studies that involve an ordered categorical phenotype, we usually either regroup the categories of the phenotype into two groups ("cases" and "controls") and then apply standard logistic regression (LG), or apply ordered logistic (oLG) or ordered probit (oPRB) regression, which accounts for the ordinal nature of the phenotype. However, when the genetic variant is rare or the sample size is limited, these approaches may lose statistical power or fail to control the type I error rate because of their model assumptions and/or unstable parameter estimation algorithms. To solve this problem, we propose a set-valued (SV) system model, which assumes that an underlying continuous phenotype follows a normal distribution, to identify genetic variants associated with an ordered categorical phenotype. We couple this model with a set-valued system identification algorithm to identify all the key system parameters. Simulations and two real data analyses show that SV and LG accurately controlled the type I error rate even at a significance level of 10⁻⁶, whereas oLG and oPRB failed to do so in some cases. LG had significantly lower power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. For instance, in a simulation with data generated from an additive SV model with an odds ratio of 7.4 for a phenotype with three categories, a single nucleotide polymorphism with a minor allele frequency of 0.75%, and a sample size of 999 (333 per category), the power of the SV, oLG and LG models was 70%, 40% and <1%, respectively, at a significance level of 10⁻⁶. Thus, SV should be employed in genetic association studies of ordered categorical phenotypes.
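The latent-variable idea behind the SV model (a normally distributed continuous phenotype that is only observed as the category it falls into) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' SVSI identification algorithm: it assumes a hypothetical parameterization in which the latent phenotype is Z = βG + ε with ε ~ N(0, σ²), G is the additively coded minor-allele count, and the observed category is the interval between fixed, sorted thresholds that contains Z. The function names `category_probs` and `simulate`, the thresholds and the effect size are illustrative choices. The sketch only generates data and category probabilities; it does not estimate parameters or compute p-values.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(genotype: int, beta: float, thresholds: list[float],
                   sigma: float = 1.0) -> list[float]:
    """P(Y = k | G = genotype) for k = 0..K-1 under a latent-normal threshold model.

    Hypothetical parameterization (assumption, not from the paper):
    Z = beta * genotype + eps, eps ~ N(0, sigma^2), and Y = k when
    cuts[k] < Z <= cuts[k + 1], with cuts = [-inf] + thresholds + [+inf].
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    mean = beta * genotype
    return [
        normal_cdf((cuts[k + 1] - mean) / sigma) - normal_cdf((cuts[k] - mean) / sigma)
        for k in range(len(cuts) - 1)
    ]


def simulate(n: int, maf: float, beta: float, thresholds: list[float],
             sigma: float = 1.0, seed: int = 0) -> tuple[list[int], list[int]]:
    """Draw a random population sample of (genotype, ordered category) pairs."""
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    for _ in range(n):
        # Additive coding: minor-allele count from two independent allele draws
        # (assumes Hardy-Weinberg equilibrium).
        g = int(rng.random() < maf) + int(rng.random() < maf)
        z = beta * g + rng.gauss(0.0, sigma)  # unobserved continuous phenotype
        y = sum(z > c for c in thresholds)    # index of the interval containing z
        genotypes.append(g)
        phenotypes.append(y)
    return genotypes, phenotypes


if __name__ == "__main__":
    cuts = [-0.43, 0.43]  # roughly equal thirds of a standard normal when G = 0
    for g in (0, 1, 2):
        print(g, [round(p, 3) for p in category_probs(g, beta=0.8, thresholds=cuts)])
    g, y = simulate(999, maf=0.0075, beta=0.8, thresholds=cuts, seed=1)
    print("carriers:", sum(x > 0 for x in g), "category counts:", [y.count(k) for k in range(3)])
```

With a positive β, each additional minor allele shifts probability mass toward the higher categories. Note that `simulate` draws a random population sample, whereas the simulation described above fixes 333 individuals per category.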
