Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes
Causal inference approaches in systems genetics exploit quantitative trait
loci (QTL) genotypes to infer causal relationships among phenotypes. The
genetic architecture of each phenotype may be complex, and poorly estimated
genetic architectures may compromise the inference of causal relationships
among phenotypes. Existing methods assume QTLs are known or inferred without
regard to the phenotype network structure. In this paper we develop a
QTL-driven phenotype network method (QTLnet) to jointly infer a causal
phenotype network and associated genetic architecture for sets of correlated
phenotypes. Randomization of alleles during meiosis and the unidirectional
influence of genotype on phenotype allow the inference of QTLs causal to
phenotypes. Causal relationships among phenotypes can be inferred using these
QTL nodes, enabling us to distinguish among phenotype networks that would
otherwise be distribution equivalent. We jointly model phenotypes and QTLs
using homogeneous conditional Gaussian regression models, and we derive a
graphical criterion for distribution equivalence. We validate the QTLnet
approach in a simulation study. Finally, we illustrate with simulated data and
a real example how QTLnet can be used to infer both direct and indirect effects
of QTLs and phenotypes that co-map to a genomic region.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS288 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
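The abstract's claim that a QTL node can separate otherwise distribution-equivalent phenotype networks can be illustrated with a small simulation. This is a minimal sketch, not the QTLnet method itself: the effect sizes, the binary genotype coding, and the helper names `pearson`, `partial_corr`, and `simulate` are assumptions made for illustration. Data are generated from Q -> Y1 -> Y2, where Q and Y2 should be marginally correlated but approximately uncorrelated once Y1 is conditioned on. Under the reversed edge Y2 -> Y1, with Q still acting on Y1, Q and Y2 would instead be marginally independent.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=5000, seed=1):
    """Simulate Q -> Y1 -> Y2 with hypothetical effect sizes."""
    rng = random.Random(seed)
    # Binary genotype, as in a backcross (assumed coding: 0 or 1).
    q = [rng.choice([0.0, 1.0]) for _ in range(n)]
    y1 = [1.0 * g + rng.gauss(0, 1) for g in q]   # QTL acts on Y1
    y2 = [0.8 * v + rng.gauss(0, 1) for v in y1]  # Y1 acts on Y2
    return q, y1, y2


q, y1, y2 = simulate()
r_marginal = pearson(q, y2)           # clearly non-zero under Q -> Y1 -> Y2
r_given_y1 = partial_corr(q, y2, y1)  # close to zero under Q -> Y1 -> Y2
print(f"corr(Q, Y2) = {r_marginal:.3f}, corr(Q, Y2 | Y1) = {r_given_y1:.3f}")
```

Without the Q node, the models Y1 -> Y2 and Y2 -> Y1 imply the same joint Gaussian distribution for the two phenotypes, which is why the genotype is needed to orient the edge.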
Cluster detection and risk estimation for spatio-temporal health data
In epidemiological disease mapping one aims to estimate the spatio-temporal
pattern in disease risk and identify high-risk clusters, allowing health
interventions to be appropriately targeted. Bayesian spatio-temporal models are
used to estimate smoothed risk surfaces, but this is contrary to the aim of
identifying groups of areal units that exhibit elevated risks compared with
their neighbours. Therefore, in this paper we propose a new Bayesian
hierarchical modelling approach for simultaneously estimating disease risk and
identifying high-risk clusters in space and time. Inference for this model is
based on Markov chain Monte Carlo simulation, using the freely available R
package CARBayesST that has been developed in conjunction with this paper. Our
methodology is motivated by two case studies, the first of which assesses whether
there is a relationship between Public Health Districts and colon cancer
clusters in Georgia, while the second looks at the impact of the smoking ban in
public places in England on cardiovascular disease clusters.
Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data
Since most analysis software for genome-wide association studies (GWAS)
currently exploits only unrelated individuals, there is a need for efficient
applications that can handle general pedigree data or mixtures of both
population and pedigree data. Even data sets thought to consist of only
unrelated individuals may include cryptic relationships that can lead to false
positives if not discovered and controlled for. In addition, family designs
possess compelling advantages. They are better equipped to detect rare
variants, control for population stratification, and facilitate the study of
parent-of-origin effects. Pedigrees selected for extreme trait values often
segregate a single gene with strong effect. Finally, many pedigrees are
available as an important legacy from the era of linkage analysis.
Unfortunately, pedigree likelihoods are notoriously hard to compute. In this
paper we re-examine the computational bottlenecks and implement ultra-fast
pedigree-based GWAS analysis. Kinship coefficients can either be based on
explicitly provided pedigrees or automatically estimated from dense markers.
Our strategy (a) works for random sample data, pedigree data, or a mix of both;
(b) entails no loss of power; (c) allows for any number of covariate
adjustments, including correction for population stratification; (d) allows for
testing SNPs under additive, dominant, and recessive models; and (e)
accommodates both univariate and multivariate quantitative traits. On a typical
personal computer (6 CPU cores at 2.67 GHz), analyzing a univariate HDL
(high-density lipoprotein) trait from the San Antonio Family Heart Study
(935,392 SNPs on 1357 individuals in 124 pedigrees) takes less than 2 minutes
and 1.5 GB of memory. Complete multivariate QTL analysis of the three
time-points of the longitudinal HDL multivariate trait takes less than 5
minutes and 1.5 GB of memory.
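Item (d) above refers to testing SNPs under additive, dominant, and recessive models. The following is a minimal sketch of the conventional genotype codings behind those three models, assuming genotypes are stored as minor-allele counts (0, 1, or 2). The function name `encode_genotype` is hypothetical, and the abstract does not state that the authors' software uses exactly this coding.

```python
def encode_genotype(minor_allele_count, model):
    """Map a minor-allele count (0, 1, or 2) to a regression predictor.

    additive : predictor is the allele count itself (0, 1, 2)
    dominant : 1 if at least one copy of the minor allele is present
    recessive: 1 only if both copies are the minor allele
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "additive":
        return float(minor_allele_count)
    if model == "dominant":
        return 1.0 if minor_allele_count >= 1 else 0.0
    if model == "recessive":
        return 1.0 if minor_allele_count == 2 else 0.0
    raise ValueError(f"unknown model: {model}")


# Encode the same three genotypes under each model.
genotypes = [0, 1, 2]
encoded = {
    model: [encode_genotype(g, model) for g in genotypes]
    for model in ("additive", "dominant", "recessive")
}
print(encoded)
```

The encoded column would then enter the trait model as a covariate alongside any adjustment terms. How the kinship structure is handled is not shown here.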
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018) and the reconstruction of conditional independence
networks for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely
occur in genetics and genomics, for example as genotype data and genotype-phenotype
data. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from classical multiple testing approaches. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF file.
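The statement that adjusting for all other variables controls spurious pairwise associations can be shown with a small worked example. This is a sketch of the general idea for Gaussian data only, not the netgwas algorithm: the covariance matrix below is a hypothetical one implied by a chain X -> Y -> Z with unit-variance noise and edge weights of 0.8, and the helper names `invert` and `partial_correlations` are illustrative. Partial correlations given all remaining variables are read off the inverse covariance (precision) matrix.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables."""
    prec = invert(cov)
    n = len(cov)
    return [
        [
            1.0 if i == j
            else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Covariance implied by X -> Y -> Z (hypothetical weights 0.8, unit noise).
cov = [
    [1.0, 0.8, 0.64],
    [0.8, 1.64, 1.312],
    [0.64, 1.312, 2.0496],
]
marginal_xz = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])
partial = partial_correlations(cov)
print(f"marginal corr(X, Z) = {marginal_xz:.3f}")
print(f"partial corr(X, Z | Y) = {partial[0][2]:.3f}")
```

A pairwise test on X and Z alone would report an association, whereas the conditional view attributes it entirely to Y, so no X-Z edge appears in the undirected graph.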