A geometric interpretation of the permutation p-value and its application in eQTL studies
Permutation p-values have been widely used to assess the significance of
linkage or association in genetic studies. However, their application in
large-scale studies is hindered by a heavy computational burden. We propose a
geometric interpretation of permutation p-values, and based on this geometric
interpretation, we develop an efficient permutation p-value estimation method
in the context of regression with binary predictors. An application to a study
of gene expression quantitative trait loci (eQTL) shows that our method
provides reliable estimates of permutation p-values while requiring less than
5% of the computational time of direct permutations. In fact, our
method takes constant time to estimate permutation p-values, no matter how
small the p-value. Our method enables a study of the relationship between
nominal p-values and permutation p-values over a wide range, and provides a
geometric perspective on the effective number of independent tests.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS298 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
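For reference, the direct-permutation baseline that this abstract compares against can be sketched as follows. This is not the authors' geometric estimator; it is a minimal sketch, assuming a single expression trait y, a set of 0/1-coded SNPs, and the maximum absolute Pearson correlation across SNPs as the test statistic (for a fixed sample size, |r| is a monotone function of the regression |t| for each SNP). The function names are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_abs_r(snps, y):
    """Largest |r| between expression y and any 0/1-coded SNP."""
    return max(abs(pearson_r(x, y)) for x in snps)

def permutation_p_value(snps, y, n_perm=1000, seed=0):
    """Direct permutation p-value for the max-|r| statistic across SNPs.

    Permuting y (not the SNPs) keeps the correlation among SNPs intact."""
    rng = random.Random(seed)
    observed = max_abs_r(snps, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # small tolerance so exact ties are not lost to floating-point rounding
        if max_abs_r(snps, y_perm) >= observed - 1e-12:
            exceed += 1
    # add-one estimate: never returns exactly 0
    return (exceed + 1) / (n_perm + 1)
```

The cost grows linearly with n_perm, and the smallest reportable p-value is 1/(n_perm + 1), which is why very small permutation p-values are expensive to estimate this way.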
Consistent Testing for Recurrent Genomic Aberrations
Genomic aberrations, such as somatic copy number alterations, are frequently
observed in tumor tissue. Recurrent aberrations, occurring in the same region
across multiple subjects, are of interest because they may highlight genes
associated with tumor development or progression. A number of tools have been
proposed to assess the statistical significance of recurrent DNA copy number
aberrations, but their statistical properties have not been carefully studied.
Cyclic shift testing, a permutation procedure using independent random shifts
of genomic marker observations on the genome, has been proposed to identify
recurrent aberrations, and is potentially useful for a wider variety of
purposes, including identifying regions with methylation aberrations or regions
overrepresented in disease association studies. For data following a
countable-state Markov model, we prove the asymptotic validity of cyclic shift
p-values under a fixed sample size regime as the number of observed markers
tends to infinity. We illustrate cyclic shift testing for a variety of data
types, producing biologically relevant findings for three publicly available
datasets.
Comment: 35 pages, 7 figures
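A minimal sketch of cyclic shift testing follows, assuming each subject's profile is a list of marker values on a single circular genome, the per-marker statistic is the sum across subjects, and the null distribution is built from the maximum summed value after independently rotating each subject's profile. These choices and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import random

def column_sums(rows):
    """Per-marker sum across subjects (rows = subjects, columns = markers)."""
    return [sum(col) for col in zip(*rows)]

def cyclic_shift(row, k):
    """Rotate one subject's marker sequence by k positions, wrapping around."""
    return row[k:] + row[:k]

def cyclic_shift_p_values(rows, n_shifts=1000, seed=0):
    """Per-marker p-values against the null distribution of the maximum
    summed value after shifting every subject by an independent random offset."""
    rng = random.Random(seed)
    m = len(rows[0])
    observed = column_sums(rows)
    null_max = []
    for _ in range(n_shifts):
        shifted = [cyclic_shift(r, rng.randrange(m)) for r in rows]
        null_max.append(max(column_sums(shifted)))
    return [(1 + sum(t >= s for t in null_max)) / (n_shifts + 1)
            for s in observed]
```

Rotating a whole profile keeps each subject's within-profile dependence (for example, Markov structure along the genome) while breaking the alignment of aberrations across subjects, which is what the null hypothesis of no recurrence requires.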
A statistical framework for testing functional categories in microarray data
Ready access to emerging databases of gene annotation and functional pathways
has shifted assessments of differential expression in DNA microarray studies
from single genes to groups of genes with shared biological function. This
paper takes a critical look at existing methods for assessing the differential
expression of a group of genes (functional category), and provides some
suggestions for improved performance. We begin by presenting a general
framework, in which the set of genes in a functional category is compared to
the complementary set of genes on the array. The framework includes tests for
overrepresentation of a category within a list of significant genes, and
methods that consider continuous measures of differential expression. Existing
tests are divided into two classes. Class 1 tests assume gene-specific measures
of differential expression are independent, despite overwhelming evidence of
positive correlation. Analytic and simulated results are presented that
demonstrate that Class 1 tests are strongly anti-conservative in practice. Class 2
tests account for gene correlation, typically through array permutation, which by
construction has proper Type I error control for the induced null. However,
both Class 1 and Class 2 tests use a null hypothesis that all genes have the
same degree of differential expression. We introduce a more sensible and
general (Class 3) null under which the profile of differential expression is
the same within the category and complement. Under this broader null, Class 2
tests are shown to be conservative. We propose standard bootstrap methods for
testing against the Class 3 null and demonstrate they provide valid Type I
error control and more power than array permutation in simulated datasets and
real microarray experiments.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS146 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
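As a concrete example of a Class 1 test in this framework, the classic one-sided hypergeometric overrepresentation test treats the genes on the array as independent draws. The sketch below (function name hypothetical) computes its p-value; per the abstract, such tests are strongly anti-conservative when gene-level statistics are positively correlated, so it is shown for illustration rather than as a recommended method.

```python
from math import comb

def overrepresentation_p_value(n_genes, n_category, n_significant, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_genes, n_category, n_significant):
    the chance that a random list of n_significant genes contains at least
    n_overlap genes from a category of size n_category."""
    total = comb(n_genes, n_significant)
    upper = min(n_category, n_significant)
    hits = sum(comb(n_category, k) * comb(n_genes - n_category, n_significant - k)
               for k in range(n_overlap, upper + 1))
    return hits / total
```

Independence enters through the hypergeometric model itself: every subset of n_significant genes is treated as equally likely, which correlated genes violate.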
An Empirical Bayes Approach for Multiple Tissue eQTL Analysis
Expression quantitative trait loci (eQTL) analyses, which identify genetic
markers associated with the expression of a gene, are an important tool in the
understanding of diseases in human and other populations. While most eQTL
studies to date consider the connection between genetic variation and
expression in a single tissue, complex, multi-tissue data sets are now being
generated by the GTEx initiative. These data sets have the potential to improve
the findings of single tissue analyses by borrowing strength across tissues,
and the potential to elucidate the genotypic basis of differences between
tissues.
In this paper we introduce and study a multivariate hierarchical Bayesian
model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL directly models the
vector of correlations between expression and genotype across tissues. It
explicitly captures patterns of variation in the presence or absence of eQTLs,
as well as the heterogeneity of effect sizes across tissues. Moreover, the
model is applicable to complex designs in which the set of donors can (i) vary
from tissue to tissue, and (ii) exhibit incomplete overlap between tissues. The
MT-eQTL model is marginally consistent, in the sense that the model for a
subset of tissues can be obtained from the full model via marginalization.
Fitting of the MT-eQTL model is carried out via empirical Bayes, using an
approximate EM algorithm. Inferences concerning eQTL detection and the
configuration of eQTLs across tissues are derived from adaptive thresholding of
local false discovery rates and maximum a posteriori estimation, respectively.
We investigate the MT-eQTL model through a simulation study, and rigorously
establish the FDR control of the local FDR testing procedure under mild
assumptions appropriate for dependent data.
Comment: accepted by Biostatistics
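The adaptive local-FDR thresholding step can be sketched as follows, assuming the common rule of rejecting the k hypotheses with the smallest local FDR values, where k is the largest number whose running mean local FDR stays at or below a target level alpha. The abstract does not spell out the rule, so treat this as an assumption; the MT-eQTL model fitting that would produce the local FDRs is not shown, and the function name is hypothetical.

```python
def lfdr_reject(lfdr, alpha=0.05):
    """Indices of rejected hypotheses under adaptive local-FDR thresholding.

    Sorting ascending makes the running mean nondecreasing, so the last rank
    whose running mean is <= alpha gives the largest admissible k."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    total = 0.0
    k = 0
    for rank, i in enumerate(order, start=1):
        total += lfdr[i]
        if total / rank <= alpha:
            k = rank
    return sorted(order[:k])
```

The running mean of the selected local FDRs estimates the FDR of the rejection set, which is why the cutoff adapts to the data instead of being a fixed local-FDR value.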
Two Jupiter-Mass Planets Orbiting HD 154672 and HD 205739
We report the detection of the first two planets from the N2K Doppler planet
search program at the Magellan telescopes. The first planet has a mass of M sin
i = 4.96 M_Jup and is orbiting the G3 IV star HD154672 with an orbital period
of 163.9 days. The second planet is orbiting the F7 V star HD205739 with an
orbital period of 279.8 days and has a mass of M sin i = 1.37 M_Jup. Both
planets are in eccentric orbits, with eccentricities e = 0.61 and e = 0.27,
respectively. Both stars are metal rich and appear to be chromospherically
inactive, based on inspection of their Ca II H and K lines. Finally, the best
Keplerian model fit to HD205739b shows a trend of 0.0649 m/s/day, suggesting
the presence of an additional outer body in that system.
Comment: 16 pages, 5 figures, accepted for publication in A
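The Keplerian fit mentioned above can be illustrated with the standard single-planet radial-velocity model plus a linear trend term. In this sketch the semi-amplitude K, argument of periastron omega, time of periastron t_peri and systemic velocity gamma are hypothetical inputs (the abstract does not give them); period and eccentricity can be taken from the abstract. For scale, a trend of 0.0649 m/s/day accumulates to roughly 23.7 m/s over a year.

```python
import math

def solve_kepler(mean_anomaly, e, tol=1e-12, max_iter=50):
    """Solve Kepler's equation E - e*sin(E) = M for the eccentric anomaly E
    using Newton's method (valid for 0 <= e < 1)."""
    E = mean_anomaly if e < 0.8 else math.pi  # common starting guesses
    for _ in range(max_iter):
        step = (E - e * math.sin(E) - mean_anomaly) / (1 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E

def radial_velocity(t, period, t_peri, e, omega, K, gamma=0.0, slope=0.0):
    """Keplerian radial velocity (m/s) at time t (days), plus a linear trend."""
    M = (2 * math.pi * (t - t_peri) / period) % (2 * math.pi)  # mean anomaly
    E = solve_kepler(M, e)
    # true anomaly from eccentric anomaly
    nu = 2 * math.atan2(math.sqrt(1 + e) * math.sin(E / 2),
                        math.sqrt(1 - e) * math.cos(E / 2))
    return K * (math.cos(nu + omega) + e * math.cos(omega)) + gamma + slope * t

if __name__ == "__main__":
    # Period and eccentricity of HD 154672 b from the abstract; K and omega
    # are hypothetical placeholders, not fitted values.
    for t in (0.0, 40.0, 80.0, 120.0, 160.0):
        print(t, radial_velocity(t, period=163.9, t_peri=0.0, e=0.61,
                                 omega=1.0, K=100.0))
```

A trend like the one reported for HD 205739 b would enter through a nonzero slope (in m/s/day), fitted alongside the orbital parameters.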
Computational tools for discovery and interpretation of expression quantitative trait loci
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in eQTL discovery, or in using eQTLs to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
Estimating Odds Ratios in Genome Scans: An Approximate Conditional Likelihood Approach
In modern whole-genome scans, the use of stringent thresholds to control the genome-wide testing error distorts the estimation process, producing estimated effect sizes that may be on average far greater in magnitude than the true effect sizes. We introduce a method, based on the estimate of genetic effect and its standard error as reported by standard statistical software, to correct for this bias in case-control association studies. Our approach is widely applicable, is far easier to implement than competing approaches, and may often be applied to published studies without access to the original data. We evaluate the performance of our approach via extensive simulations for a range of genetic models, minor allele frequencies, and genetic effect sizes. Compared to the naive estimation procedure, our approach reduces the bias and the mean squared error, especially for modest effect sizes. We also develop a principled method to construct confidence intervals for the genetic effect that acknowledges the conditioning on statistical significance. Our approach is described in the specific context of odds ratios and logistic modeling but is more widely applicable. Application to recently published data sets demonstrates the relevance of our approach to modern genome scans.
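The bias correction can be sketched from the reported estimate and standard error alone. This minimal version, assuming the log odds ratio estimate is approximately normal and that a SNP is reported only when |beta_hat / se| exceeds a threshold c (5.45 here, roughly a two-sided 5e-8 level, also an assumption), maximizes the likelihood of Z = beta_hat / se conditional on |Z| > c. It illustrates the conditional-likelihood idea and is not necessarily the exact estimator in the paper; the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via erfc (accurate in the far tails)."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def conditional_log_lik(mu, z, c):
    """Log-likelihood (up to a constant) of Z ~ N(mu, 1) given |Z| > c,
    using P(|Z| > c) = Phi(mu - c) + Phi(-mu - c)."""
    return -0.5 * (z - mu) ** 2 - math.log(norm_cdf(mu - c) + norm_cdf(-mu - c))

def corrected_log_odds(beta_hat, se, c=5.45, grid=20001):
    """Bias-corrected log odds ratio from a reported estimate and its SE.

    Maximizes the conditional likelihood on a grid between 0 and z, which
    assumes the correction only shrinks the estimate toward zero."""
    z = beta_hat / se
    lo, hi = min(0.0, z), max(0.0, z)
    candidates = (lo + (hi - lo) * i / (grid - 1) for i in range(grid))
    best_mu = max(candidates, key=lambda mu: conditional_log_lik(mu, z, c))
    return best_mu * se
```

For example, math.exp(corrected_log_odds(0.6, 0.1)) is the corrected odds ratio for a SNP reported with log odds ratio 0.6 and standard error 0.1. Just above the threshold the correction is large; far above it, the denominator is close to 1 and the estimate barely moves.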
Convergence of sample eigenvalues, eigenvectors, and principal component scores for ultra-high dimensional data
The development of high-throughput biomedical technologies has led to increased interest in the analysis of high-dimensional data where the number of features is much larger than the sample size. In this paper, we investigate principal component analysis under the ultra-high dimensional regime, where both the number of features and the sample size increase as the ratio of the two quantities also increases. We bridge the existing results from the finite and the high-dimension low sample size regimes, embedding the two regimes in a more general framework. We also numerically demonstrate the universal application of the results from the finite regime.
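Computing the sample eigenvalues studied here stays cheap even when the number of features p is far larger than the sample size n, because the p x p sample covariance and the n x n Gram matrix of the centered data share their nonzero eigenvalues. A minimal sketch using power iteration, assuming the 1/n covariance convention (function names hypothetical):

```python
import math

def center_columns(X):
    """Subtract each feature's (column's) sample mean; rows are samples."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def leading_sample_eigenvalue(X, n_iter=500):
    """Largest eigenvalue of the p x p sample covariance (1/n) Xc^T Xc.

    Works on the n x n Gram matrix (1/n) Xc Xc^T instead, which has the same
    nonzero eigenvalues, so forming it costs O(n^2 p) rather than O(n p^2)."""
    Xc = center_columns(X)
    n = len(Xc)
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) / n for j in range(n)]
         for i in range(n)]
    # Centering puts the all-ones vector in G's null space, so start from a
    # non-constant vector (assumed not orthogonal to the top eigenvector).
    v = [float(i + 1) for i in range(n)]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    lam = 0.0
    for _ in range(n_iter):  # power iteration on the PSD Gram matrix
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))  # ||G v|| with ||v|| = 1
        if lam == 0.0:
            return 0.0  # all samples identical after centering
        v = [x / lam for x in w]
    return lam
```

Power iteration converges slowly when the top two eigenvalues are close; a full eigendecomposition of the small Gram matrix is the usual alternative when all n eigenvalues are needed.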