Multiple testing for SNP-SNP interactions
Most genetic diseases are complex, i.e. associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered simultaneously. However, another less well-known multiple testing problem arises within a fixed subset of SNPs when the logic expression is chosen optimally. In this article, we propose a general asymptotic approach for deriving the distribution of the maximally selected chi-square statistic in various situations. We show how this result can be used for testing logic expressions - in particular SNP-SNP interaction patterns - while controlling for multiple comparisons. Simulations show that our method provides multiple testing adjustment when the logic expression is chosen so as to maximize the statistic. Its benefit is demonstrated through an application to a real dataset from a large population-based study considering allergy and asthma in KORA. An implementation of our method is available from the Comprehensive R Archive Network (CRAN) as R package 'SNPmaxsel'.
Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing
We consider the problems of hypothesis testing and model comparison under a
flexible Bayesian linear regression model whose formulation is closely
connected with the linear mixed effect model and the parametric models for SNP
set analysis in genetic association studies. We derive a class of analytic
approximate Bayes factors and illustrate their connections with a variety of
frequentist test statistics, including the Wald statistic and the variance
component score statistic. Taking advantage of Bayesian model averaging and
hierarchical modeling, we demonstrate some distinct advantages and
flexibilities in the approaches utilizing the derived Bayes factors in the
context of genetic association studies. We demonstrate our proposed methods
using real or simulated numerical examples in applications of single SNP
association testing, multi-locus fine-mapping and SNP set association testing.
Simultaneous SNP identification in association studies with missing data
Association testing aims to discover the underlying relationship between
genotypes (usually Single Nucleotide Polymorphisms, or SNPs) and phenotypes
(attributes, or traits). The typically large data sets used in association
testing often contain missing values. Standard statistical methods either
impute the missing values using relatively simple assumptions, or delete them,
or both, which can generate biased results. Here we describe the Bayesian
hierarchical model BAMD (Bayesian Association with Missing Data). BAMD is a
Gibbs sampler, in which missing values are multiply imputed based upon all of
the available information in the data set. We estimate the parameters and prove
that updating one SNP at each iteration preserves the ergodic property of the
Markov chain, and at the same time improves computational speed. We also
implement a model selection option in BAMD, which enables potential detection
of SNP interactions. Simulations show that unbiased estimates of SNP effects
are recovered with missing genotype data. Also, we validate associations
between SNPs and a carbon isotope discrimination phenotype that were previously
reported using a family based method, and discover an additional SNP associated
with the trait. BAMD is available as an R-package from
http://cran.r-project.org/package=BAMD. Comment: Published at http://dx.doi.org/10.1214/11-AOAS516 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Uniform Convergence Rate of the SNP Density Estimator and Testing for Similarity of Two Unknown Densities
This paper studies the uniform convergence rate of the truncated SNP (semi-nonparametric) density estimator. Using the uniform convergence rate result we obtain, we propose a test statistic for testing the equivalence of two unknown densities, where the two densities are estimated using the SNP estimator and the supports of the densities are possibly unbounded. Keywords: SNP Density Estimator, Uniform Convergence Rate, Comparison of Two Densities
Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data
We propose a resampling-based fast variable selection technique for selecting
important Single Nucleotide Polymorphisms (SNP) in multi-marker mixed effect
models used in twin studies. Due to computational complexity, current practice
includes testing the effect of one SNP at a time, commonly termed 'single
SNP association analysis'. Joint modeling of genetic variants within a gene or
pathway may have better power to detect the relevant genetic variants, hence we
adapt our recently proposed framework of e-values to address this. In this
paper, we propose a computationally efficient approach for single SNP detection
in families while utilizing information on multiple SNPs simultaneously. We
achieve this through improvements in two aspects. First, unlike other model
selection techniques, our method only requires training a model with all
possible predictors. Second, we utilize a fast and scalable bootstrap procedure
that only requires Monte-Carlo sampling to obtain bootstrapped copies of the
estimated vector of coefficients. Using this bootstrap sample, we obtain the
e-value for each SNP, and select SNPs having e-values below a threshold. We
illustrate through numerical studies that our method is more effective in
detecting SNPs associated with a trait than either single-marker analysis using
family data or model selection methods that ignore the familial dependency
structure. We also use the e-values to perform gene-level analysis in nuclear
families and detect several SNPs that have been implicated to be associated
with alcohol consumption.
Integrated probability of coronary heart disease subject to the -308 tumor necrosis factor-alpha SNP: a Bayesian meta-analysis
We present a meta-analysis of independent studies on the potential
implication in the occurrence of coronary heart disease (CHD) of the
single-nucleotide polymorphism (SNP) at the -308 position of the tumor necrosis
factor alpha (TNF-alpha) gene. We use Bayesian analysis to integrate
independent data sets and to infer statistically robust measurements of
correlation. Bayesian hypothesis testing indicates that there is no preference
for the hypothesis that the -308 TNF-alpha SNP is related to the occurrence of
CHD, in the Caucasian or in the Asian population, over the null hypothesis. As
a measure of correlation, we use the probability of occurrence of CHD
conditional on the presence of the SNP, derived as the posterior probability of
the Bayesian meta-analysis. The conditional probability indicates that CHD is
not more likely to occur when the SNP is present, which suggests that the -308
TNF-alpha SNP is not implicated in the occurrence of CHD. Comment: 21 pages, 7 figures. Published in PeerJ (2015).
Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases
Copy number variants (CNVs) account for more polymorphic base pairs in the
human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass
genes as well as noncoding DNA, making these polymorphisms good candidates for
functional variation. Consequently, most modern genome-wide association studies
test CNVs along with SNPs, after inferring copy number status from the data
generated by high-throughput genotyping platforms. Here we give an overview of
CNV genomics in humans, highlighting patterns that inform methods for
identifying CNVs. We describe how genotyping signals are used to identify CNVs
and provide an overview of existing statistical models and methods used to
infer location and carrier status from such data, especially the most commonly
used methods exploring hybridization intensity. We compare the power of such
methods with the alternative method of using tag SNPs to identify CNV carriers.
As such methods are only powerful when applied to common CNVs, we describe two
alternative approaches that can be informative for identifying rare CNVs
contributing to disease risk. We focus particularly on methods identifying de
novo CNVs and show that such methods can be more powerful than case-control
designs. Finally we present some recommendations for identifying CNVs
contributing to common complex disorders. Comment: Published at http://dx.doi.org/10.1214/09-STS304 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
Clinical application of high throughput molecular screening techniques for pharmacogenomics.
Genetic analysis is one of the fastest-growing areas of clinical diagnostics. Fortunately, as our knowledge of clinically relevant genetic variants rapidly expands, so does our ability to detect these variants in patient samples. Increasing demand for genetic information may necessitate the use of high throughput diagnostic methods as part of clinically validated testing. Here we provide a general overview of our current and near-future abilities to perform large-scale genetic testing in the clinical laboratory. First we review in detail molecular methods used for high throughput mutation detection, including techniques able to monitor thousands of genetic variants for a single patient or to genotype a single genetic variant for thousands of patients simultaneously. These methods are analyzed in the context of pharmacogenomic testing in the clinical laboratories, with a focus on tests that are currently validated as well as those that hold strong promise for widespread clinical application in the near future. We further discuss the unique economic and clinical challenges posed by pharmacogenomic markers. Our ability to detect genetic variants frequently outstrips our ability to accurately interpret them in a clinical context, carrying implications both for test development and introduction into patient management algorithms. These complexities must be taken into account prior to the introduction of any pharmacogenomic biomarker into routine clinical testing.
Detecting epistasis via Markov bases
Rapid research progress in genotyping techniques has allowed large
genome-wide association studies. Existing methods often focus on determining
associations between single loci and a specific phenotype. However, a
particular phenotype is usually the result of complex relationships between
multiple loci and the environment. In this paper, we describe a two-stage
method for detecting epistasis by combining the traditionally used single-locus
search with a search for multiway interactions. Our method is based on an
extended version of Fisher's exact test. To perform this test, a Markov chain
is constructed on the space of multidimensional contingency tables using the
elements of a Markov basis as moves. We test our method on simulated data and
compare it to a two-stage logistic regression method and to a fully Bayesian
method, showing that we are able to detect the interacting loci when other
methods fail to do so. Finally, we apply our method to a genome-wide data set
consisting of 685 dogs and identify epistasis associated with canine hair
length for four pairs of SNPs.
