Robust Tests in Genome-Wide Scans under Incomplete Linkage Disequilibrium
Under complete linkage disequilibrium (LD), robust tests often have greater
power than Pearson's chi-square test and trend tests for the analysis of
case-control genetic association studies. Robust statistics have been used in
candidate-gene and genome-wide association studies (GWAS) when the genetic
model is unknown. We consider here a more general incomplete LD model, and
examine the impact of penetrances at the marker locus when the genetic models
are defined at the disease locus. Robust statistics are then reviewed and their
efficiency and robustness are compared through simulations in GWAS of 300,000
markers under the incomplete LD model. Applications of several robust tests to
the Wellcome Trust Case-Control Consortium [Nature 447 (2007) 661--678] are
presented. Comment: Published in Statistical Science (http://www.imstat.org/sts/)
by the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/09-STS314
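To make the idea of a robust test concrete, the following is a minimal sketch of one commonly described robust statistic, MAX3: the largest absolute Cochran-Armitage trend statistic over recessive, additive, and dominant genotype scores. It is an illustration, not necessarily one of the tests compared in the abstract above. The function names `trend_z` and `max3`, the score table, and the genotype ordering (0, 1, 2 copies of the risk allele) are assumptions made for this example.

```python
import math

# Genotype scores for (0, 1, 2) copies of the risk allele.
# These three choices correspond to the usual recessive, additive,
# and dominant codings (an assumption of this sketch).
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for a 2x3 case-control table.

    cases, controls: genotype counts for 0, 1, 2 risk alleles.
    scores: the three genotype scores x_0, x_1, x_2.
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    genotype_totals = [r + s for r, s in zip(cases, controls)]

    numerator = sum(
        x * (n_controls * r - n_cases * s)
        for x, r, s in zip(scores, cases, controls)
    )
    spread = (
        n_total * sum(x * x * n for x, n in zip(scores, genotype_totals))
        - sum(x * n for x, n in zip(scores, genotype_totals)) ** 2
    )
    if n_cases == 0 or n_controls == 0 or spread <= 0:
        raise ValueError("table is degenerate for these scores")
    return math.sqrt(n_total) * numerator / math.sqrt(n_cases * n_controls * spread)


def max3(cases, controls):
    """Largest absolute trend statistic over the three genetic models."""
    return max(abs(trend_z(cases, controls, x)) for x in SCORES.values())


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(max3((30, 50, 20), (45, 45, 10)))
```

Because the three trend statistics are correlated, MAX3 does not follow a standard normal distribution under the null hypothesis; a p-value would have to come from simulation or from the joint distribution of the three statistics, which this sketch does not attempt.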
Bayesian Prediction Intervals for Assessing P-Value Variability in Prospective Replication Studies
Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. Hypothesis and significance testing, and the accompanying P-values are being scrutinized as representing the most widely applied and abused practices. One line of critique is that P-values are inherently unfit to fulfill their ostensible role as measures of credibility for scientific hypotheses. It has also been suggested that while P-values may have their role as summary measures of effect, researchers underappreciate the degree of randomness in the P-value. High variability of P-values would suggest that having obtained a small P-value in one study, one is, nevertheless, still likely to obtain a much larger P-value in a similarly powered replication study. Thus, “replicability of P-value” is in itself questionable. To characterize P-value variability, one can use prediction intervals whose endpoints reflect the likely spread of P-values that could have been obtained by a replication study. Unfortunately, the intervals currently in use, the frequentist P-intervals, are based on unrealistic implicit assumptions. Namely, P-intervals are constructed with the assumptions that imply substantial chances of encountering large values of effect size in an observational study, which leads to bias. The long-run frequentist probability provided by P-intervals is similar in interpretation to that of the classical confidence intervals, but the endpoints of any particular interval lack interpretation as probabilistic bounds for the possible spread of future P-values that may have been obtained in replication studies. 
Along with classical frequentist intervals, there exists a Bayesian viewpoint toward interval construction in which the endpoints of an interval have a meaningful probabilistic interpretation. We propose Bayesian intervals for prediction of P-value variability in prospective replication studies. Contingent upon approximate prior knowledge of the effect size distribution, our proposed Bayesian intervals have endpoints that are directly interpretable as probabilistic bounds for replication P-values, and they are resistant to selection bias. We showcase our approach by its application to P-values reported for five psychiatric disorders by the Psychiatric Genomics Consortium group
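The contrast between a flat-prior interval and a prior-informed interval for replication P-values can be sketched with a simple conjugate normal model. This is an illustrative assumption, not the authors' construction: the observed and replication z-statistics are each taken as N(mu, 1) with equal sample sizes, P-values are one-sided, and the prior on mu is N(0, prior_var). Under these assumptions the predictive distribution of the replication z-statistic is N(s * z_obs, s + 1) with s = prior_var / (1 + prior_var); a flat prior corresponds to s = 1. The function name `replication_p_interval` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_p_interval(p_obs, prior_var=None, level=0.8):
    """Prediction interval for a one-sided replication P-value.

    p_obs: observed one-sided P-value (0 < p_obs < 1).
    prior_var: variance of an assumed N(0, prior_var) prior on the mean
        of the z-statistic; None means a flat prior.
    level: central probability content of the interval.
    """
    z_obs = STD_NORMAL.inv_cdf(1.0 - p_obs)

    # Shrinkage factor: 1 under a flat prior, below 1 under a proper prior.
    shrink = 1.0 if prior_var is None else prior_var / (1.0 + prior_var)

    # Predictive distribution of the replication z-statistic.
    pred_mean = shrink * z_obs
    pred_sd = math.sqrt(shrink + 1.0)

    q = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    z_low = pred_mean - q * pred_sd
    z_high = pred_mean + q * pred_sd

    # A larger z-statistic gives a smaller P-value, so the endpoints swap.
    return 1.0 - STD_NORMAL.cdf(z_high), 1.0 - STD_NORMAL.cdf(z_low)


if __name__ == "__main__":
    print(replication_p_interval(0.001))                 # flat prior
    print(replication_p_interval(0.001, prior_var=1.0))  # informative prior
```

In this toy model, a prior concentrated near zero pulls the predicted replication z-statistic toward zero, so the interval for the replication P-value shifts toward larger values than the flat-prior interval.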
Interval estimation of genetic susceptibility for retrospective case-control studies
BACKGROUND: This article describes classical and Bayesian interval estimation of genetic susceptibility based on random samples with pre-specified numbers of unrelated cases and controls. RESULTS: Frequencies of genotypes in cases and controls can be estimated directly from retrospective case-control data. On the other hand, genetic susceptibility, defined as the expected proportion of cases among individuals with a particular genotype, depends on the population proportion of cases (prevalence). Given this design, prevalence is an external parameter, and hence the susceptibility cannot be estimated from the observed data alone. Interval estimation of susceptibility that can incorporate uncertainty in prevalence values is explored from both classical and Bayesian perspectives. Similarity between classical and Bayesian interval estimates in terms of frequentist coverage probabilities for this problem allows an appealing interpretation of classical intervals as bounds for genetic susceptibility. In addition, it is observed that the asymptotic classical and Bayesian interval estimates have comparable average length. These interval estimates serve as a very good approximation to the "exact" (finite sample) Bayesian interval estimates. Extension from genotypic to allelic susceptibility intervals shows dependency on phenotype-induced deviations from Hardy-Weinberg equilibrium. CONCLUSIONS: The suggested classical and Bayesian interval estimates appear to perform reasonably well. Generally, the use of the exact Bayesian interval estimation method is recommended for genetic susceptibility; however, the asymptotic classical and approximate Bayesian methods are adequate for sample sizes of at least 50 cases and controls.
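The dependence of susceptibility on prevalence follows from Bayes' rule: P(case | genotype) = f_case * K / (f_case * K + f_control * (1 - K)), where f_case and f_control are the genotype frequencies in cases and controls and K is the prevalence. The sketch below computes this point value and shows how a range of plausible prevalence values maps to a range of susceptibilities. It is a minimal illustration with hypothetical function names, and it does not reproduce the interval estimators described in the abstract above, which also account for sampling error in the frequencies.

```python
def susceptibility(freq_in_cases, freq_in_controls, prevalence):
    """P(case | genotype) from genotype frequencies and prevalence (Bayes' rule)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    case_part = freq_in_cases * prevalence
    control_part = freq_in_controls * (1.0 - prevalence)
    return case_part / (case_part + control_part)


def susceptibility_range(freq_in_cases, freq_in_controls, prev_low, prev_high):
    """Susceptibility at the two ends of an assumed prevalence range.

    Susceptibility increases with prevalence, so the endpoints of the
    prevalence range give the endpoints of the susceptibility range.
    """
    return (
        susceptibility(freq_in_cases, freq_in_controls, prev_low),
        susceptibility(freq_in_cases, freq_in_controls, prev_high),
    )


if __name__ == "__main__":
    # Hypothetical genotype frequencies and prevalence bounds.
    print(susceptibility_range(0.30, 0.20, 0.01, 0.05))
```

When the genotype is equally frequent in cases and controls, the formula returns the prevalence itself, which is a quick sanity check on any implementation.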
Pleiotropic Effects of CSF Levels of Alzheimer’s Disease Proteins
Cerebrospinal fluid (CSF) analytes harbor potential as diagnostic biomarkers for Alzheimer’s Disease (AD). Quantitative measures of CSF proteins comprise a set of often highly correlated endophenotypes that have previously shown promise in genetic analyses (Cruchaga et al., 2013; Kauwe et al., 2014). Pleiotropic impact of genetic variations on this set may provide additional insights into AD pathology at its earliest stages. To determine which specific endophenotypes are pleiotropic, one can employ methods based on the reverse regression of genotype on phenotypes. Recently, we proposed a method based on functional linear models (Vsevolozhskaya et al., 2016) that utilizes reverse regression and simultaneously evaluates all variants within a genetic region for an association with multiple correlated phenotypes. Here we apply our novel methodology to explore pleiotropic effects of CSF analytes using Alzheimer’s Disease Neuroimaging Initiative (ADNI) data.
Haplotype associations with quantitative traits in the presence of complex multilocus and heterogeneous effects
In genetic mapping of complex traits, scored haplotypes are likely to represent only a subset of all causal polymorphisms. At the extreme of this scenario, observed polymorphisms are not themselves functional and are only linked to causal ones via linkage disequilibrium (LD). We demonstrate that, due to such incomplete knowledge regarding the underlying genetic mechanism, the variance of a trait may differ between the scored haplotypes. Thus, unequal variances between haplotypes may be indicative of additional functional polymorphisms affecting the trait. Methods accounting for such haplotype-specific variance may also provide increased power to detect complex associations. We suggest ways to estimate and test these haplotypic variance contrasts and to incorporate them into the haplotypic tests for association. We further extend this approach to data with unknown gametic phase via likelihood-based simultaneous estimation of haplotypic effects and their frequencies. We find that our approach provides additional power, especially under the following types of models: (a) where scored and unobserved variants interact epistatically with each other; and (b) under heterogeneity models, where multiple unobserved mutations are linked to nonfunctional observed polymorphisms via LD. An illustrative example of the usefulness of the method is discussed, based on an analysis of multilocus effects within the catechol-O-methyl transferase (COMT) gene.
Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models
Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology-driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study
Multi-ethnic GWAS and meta-analysis of sleep quality identify MPP6 as a novel gene that functions in sleep center neurons
Poor sleep quality can have harmful health consequences. Although many aspects of sleep are heritable, the understanding of the genetic factors involved in its physiology remains limited. Here, we performed a genome-wide association study (GWAS) using the Pittsburgh Sleep Quality Index (PSQI) in a multi-ethnic discovery cohort (n = 2868) and found two novel genome-wide loci on chromosomes 2 and 7 associated with global sleep quality. A meta-analysis in 12 independent cohorts (100 000 individuals) replicated the association on chromosome 7 between NPY and MPP6. While NPY is an important sleep gene, we tested for an independent functional role of MPP6. Expression data showed an association of this locus with both NPY and MPP6 mRNA levels in brain tissues. Moreover, knockdown of an orthologue of MPP6 in Drosophila melanogaster sleep center neurons resulted in decreased sleep duration. With convergent evidence, we describe a new locus impacting human variability in sleep quality through known NPY and novel MPP6 sleep genes. Peer reviewed