    SVSI: Fast and Powerful Set-Valued System Identification Approach to Identifying Rare Variants in Sequencing Studies for Ordered Categorical Traits: SVSI for Genetic Association Studies

    For genetic association studies that involve an ordered categorical phenotype, we usually either regroup multiple categories of the phenotype into two categories (“cases” and “controls”) and then apply standard logistic regression (LG), or apply ordered logistic (oLG) or ordered probit (oPRB) regression, which accounts for the ordinal nature of the phenotype. However, these approaches may lose statistical power or fail to control the type I error rate, because of their model assumptions and/or unstable parameter estimation algorithms, when the genetic variant is rare or the sample size is limited. To solve this problem, we propose a set-valued (SV) system model, which assumes that an underlying continuous phenotype follows a normal distribution, to identify genetic variants associated with an ordinal categorical phenotype. We couple this model with a set-valued system identification algorithm to identify all the key system parameters. Simulations and two real data analyses show that SV and LG accurately controlled the type I error rate even at a significance level of 10⁻⁶, whereas oLG and oPRB did not in some cases. LG had significantly smaller power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. For instance, in a simulation with data generated from an additive SV model with an odds ratio of 7.4 for a phenotype with three categories, a single nucleotide polymorphism with a minor allele frequency of 0.75% and a sample size of 999 (333 per category), the power of the SV, oLG and LG models was 70%, 40% and <1%, respectively, at a significance level of 10⁻⁶. Thus, SV should be employed in genetic association studies for ordered categorical phenotypes.
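The abstract describes an ordinal phenotype that arises from an underlying, normally distributed continuous phenotype. The sketch below illustrates that general latent-threshold idea only; it is not the authors' SV identification algorithm, which the abstract does not specify. The function names, the additive effect `beta`, the unit-variance noise, and the cut-points in `thresholds` are all assumptions made for illustration.

```python
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def category_probabilities(genotype, beta, thresholds):
    """Return P(category = k | genotype) for a latent-threshold model.

    Assumed (hypothetical) model: latent = beta * genotype + e, e ~ N(0, 1).
    The observed category is the number of sorted thresholds <= latent.
    """
    mean = beta * genotype
    # Cumulative probabilities P(latent <= c_k), padded with 0 and 1.
    cdf = [0.0] + [STD_NORMAL.cdf(c - mean) for c in thresholds] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


def simulate_category(genotype, beta, thresholds, rng):
    """Draw one ordinal category (0 .. len(thresholds)) for a genotype."""
    latent = beta * genotype + rng.gauss(0.0, 1.0)
    return bisect_right(thresholds, latent)


if __name__ == "__main__":
    rng = random.Random(0)
    thresholds = [-0.43, 0.43]  # illustrative cut-points, three categories
    for g in (0, 1, 2):  # minor-allele counts under an additive coding
        print(g, category_probabilities(g, beta=0.5, thresholds=thresholds))
    print(simulate_category(1, 0.5, thresholds, rng))
```

Under these assumptions, a positive `beta` shifts probability mass toward the higher categories as the minor-allele count increases, which is the kind of ordinal signal that dichotomizing into cases and controls discards.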

    Statistical Significance Threshold Criteria For Analysis of Microarray Gene Expression Data

    Methodological advances in microarray data analysis based on false discovery rate (FDR) control, such as q-value plots, allow the investigator to examine the FDR from several perspectives. However, when FDR control at the “customary” levels 0.01, 0.05, or 0.1 does not provide fruitful findings, there is little guidance for making the trade-off between the significance threshold and the FDR level on sound statistical or biological grounds. Thus, meaningful statistical significance criteria that complement the existing FDR methods for large-scale multiple tests are desirable. Three statistical significance criteria, the profile information criterion, the total error proportion, and the guide-gene driven selection, are developed in this research. The first two are general significance threshold criteria for large-scale multiple tests; the profile information criterion is related to recent theoretical studies of the connection between FDR control and minimax estimation, and the total error proportion is closely related to the asymptotic properties of FDR control in terms of the total error risk. The guide-gene driven selection is an approach to combining statistical significance with the existing biological knowledge of the study at hand. Error properties of these criteria are investigated theoretically and by simulation. The proposed methods are illustrated and compared using an example of genomic screening for novel Arf gene targets. Operating characteristics of the q-value and the proposed significance threshold criteria are investigated and compared in a simulation study that employs a model mimicking a gene regulatory pathway. A guideline for using these criteria is provided. Splus/R code is available from the corresponding author upon request.
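To make the trade-off between the significance threshold and the FDR level concrete, here is a minimal sketch of Benjamini-Hochberg adjusted p-values. This is a standard FDR procedure used as a stand-in; it is neither the q-value estimator nor any of the three criteria proposed in the abstract. The function names and example p-values are hypothetical.

```python
def bh_adjusted(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_discoveries(p_values, fdr_level):
    """Number of tests declared significant at the given FDR level."""
    return sum(q <= fdr_level for q in bh_adjusted(p_values))


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
    for level in (0.01, 0.05, 0.1):  # the "customary" FDR levels
        print(level, count_discoveries(p, level))
```

Scanning a grid of FDR levels this way shows how the list of discoveries grows or shrinks with the chosen level, which is the choice the abstract says lacks guidance.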