Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases
Copy number variants (CNVs) account for more polymorphic base pairs in the
human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass
genes as well as noncoding DNA, making these polymorphisms good candidates for
functional variation. Consequently, most modern genome-wide association studies
test CNVs along with SNPs, after inferring copy number status from the data
generated by high-throughput genotyping platforms. Here we give an overview of
CNV genomics in humans, highlighting patterns that inform methods for
identifying CNVs. We describe how genotyping signals are used to identify CNVs
and provide an overview of existing statistical models and methods used to
infer location and carrier status from such data, especially the most commonly
used methods exploring hybridization intensity. We compare the power of such
methods with the alternative method of using tag SNPs to identify CNV carriers.
As such methods are only powerful when applied to common CNVs, we describe two
alternative approaches that can be informative for identifying rare CNVs
contributing to disease risk. We focus particularly on methods identifying de
novo CNVs and show that such methods can be more powerful than case-control
designs. Finally, we present some recommendations for identifying CNVs
contributing to common complex disorders.
Comment: Published at http://dx.doi.org/10.1214/09-STS304 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Inference of Population History using Coalescent HMMs: Review and Outlook
Studying how diverse human populations are related is of historical and
anthropological interest, in addition to providing a realistic null model for
testing for signatures of natural selection or disease associations.
Furthermore, understanding the demographic histories of other species is
playing an increasingly important role in conservation genetics. A number of
statistical methods have been developed to infer population demographic
histories using whole-genome sequence data, with recent advances focusing on
allowing for more flexible modeling choices, scaling to larger data sets, and
increasing statistical power. Here we review coalescent hidden Markov models, a
powerful class of population genetic inference methods that can effectively
utilize linkage disequilibrium information. We highlight recent advances, give
advice for practitioners, point out potential pitfalls, and present possible
future research directions.
Comment: 12 pages, 2 figures
A Bayesian Method for Detecting and Characterizing Allelic Heterogeneity and Boosting Signals in Genome-Wide Association Studies
The standard paradigm for the analysis of genome-wide association studies
involves carrying out association tests at both typed and imputed SNPs. These
methods will not be optimal for detecting the signal of association at SNPs
that are not currently known or in regions where allelic heterogeneity occurs.
We propose a novel association test, complementary to the SNP-based approaches,
that attempts to extract further signals of association by explicitly modeling
and estimating both unknown SNPs and allelic heterogeneity at a locus. At each
site we estimate the genealogy of the case-control sample by taking advantage
of the HapMap haplotypes across the genome. Allelic heterogeneity is modeled by
allowing more than one mutation on the branches of the genealogy. Our use of
Bayesian methods allows us to assess directly the evidence for a causative SNP
not well correlated with known SNPs and for allelic heterogeneity at each
locus. Using simulated data and real data from the WTCCC project, we show that
our method (i) produces a significant boost in signal and accurately identifies
the form of the allelic heterogeneity in regions where it is known to exist,
(ii) can suggest new signals that are not found by testing typed or imputed
SNPs and (iii) can provide more accurate estimates of effect sizes in regions
of association.
Comment: Published at http://dx.doi.org/10.1214/09-STS311 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Multiple Testing for Neuroimaging via Hidden Markov Random Field
Traditional voxel-level multiple testing procedures in neuroimaging, mostly
p-value based, often ignore the spatial correlations among neighboring voxels
and thus suffer from substantial loss of power. We extend the
local-significance-index based procedure originally developed for the hidden
Markov chain models, which aims to minimize the false nondiscovery rate subject
to a constraint on the false discovery rate, to three-dimensional neuroimaging
data using a hidden Markov random field model. A generalized
expectation-maximization algorithm for maximizing the penalized likelihood is
proposed for estimating the model parameters. Extensive simulations show that
the proposed approach is more powerful than conventional false discovery rate
procedures. We apply the method to the comparison between mild cognitive
impairment, a disease status with increased risk of developing Alzheimer's or
another dementia, and normal controls in the FDG-PET imaging study of the
Alzheimer's Disease Neuroimaging Initiative.
Comment: A MATLAB package implementing the proposed FDR procedure is available
with this paper at the Biometrics website on Wiley Online Library
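As a rough illustration of the local-significance-index (LIS) idea in the abstract above, the following sketch applies a step-up rule: sort the LIS values in increasing order and reject the largest set whose running mean stays at or below the target FDR level. It assumes the LIS values (the estimated posterior probability that each voxel is null) have already been computed by some fitted model. The function name `lis_step_up`, the example values, and the level `alpha` are hypothetical; this is not the paper's MATLAB package.

```python
def lis_step_up(lis, alpha=0.05):
    """Return the indices rejected by an LIS step-up rule at nominal FDR level alpha.

    `lis` holds one value per hypothesis (voxel): the estimated posterior
    probability that the hypothesis is null. Estimating these values is
    assumed to have been done elsewhere.
    """
    # Rank hypotheses from most to least likely to be non-null.
    order = sorted(range(len(lis)), key=lambda i: lis[i])

    running_sum = 0.0
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # The running mean of the smallest LIS values estimates the FDR
        # of rejecting exactly those hypotheses.
        if running_sum / rank <= alpha:
            n_reject = rank

    return sorted(order[:n_reject])


# Hypothetical LIS values for six voxels.
lis_values = [0.01, 0.90, 0.03, 0.40, 0.02, 0.75]
print(lis_step_up(lis_values, alpha=0.10))  # prints [0, 2, 4]
```

Because the sorted values are non-decreasing, their running mean is also non-decreasing, so the rule reduces to finding a single cut-off rank.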
A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNVs) and
identifying their significant associations with mRNA transcript abundance. We
show performance on simulated data and illustrate an application to data from a
genomic study on human cancer cell lines.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
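To make the hidden Markov model component of the abstract above more concrete, here is a minimal sketch that decodes latent copy number states (loss, neutral, gain) from CGH-style log-ratios. It deliberately swaps in plain Viterbi decoding with Gaussian emissions in place of the paper's hierarchical Bayesian model and MCMC search. The state means, standard deviation, and transition probability are assumed values chosen only for illustration.

```python
import math


def viterbi_copy_number(log_ratios, means=(-0.5, 0.0, 0.5), sd=0.2, stay=0.98):
    """Most likely state path (0=loss, 1=neutral, 2=gain) under a toy Gaussian HMM.

    All parameters are illustrative assumptions, not estimates from any data set.
    """
    if not log_ratios:
        return []

    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [
        [math.log(stay if i == j else switch) for j in range(n_states)]
        for i in range(n_states)
    ]

    def log_emit(x, s):
        # Log density of a Gaussian with mean means[s] and standard deviation sd.
        return -0.5 * ((x - means[s]) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n_states) + log_emit(log_ratios[0], s) for s in range(n_states)]
    backpointers = []

    for x in log_ratios[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit(x, s))
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical log-ratios with a short gained segment in the middle.
probes = [0.02, -0.05, 0.01, 0.48, 0.55, 0.51, 0.47, 0.03, -0.02]
print(viterbi_copy_number(probes))  # prints [1, 1, 1, 2, 2, 2, 2, 1, 1]
```

The high self-transition probability encodes the dependence between adjacent probes, so a single outlying probe is not called as a separate segment.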
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2
figures