10,958 research outputs found
A Strategy analysis for genetic association studies with known inbreeding
Background: Association studies consist in identifying the genetic variants which are related to a specific disease through the use of statistical multiple hypothesis testing or segregation analysis in pedigrees. This type of studies has been very successful in the case of Mendelian monogenic disorders while it has been less successful in identifying genetic variants related to complex diseases where the insurgence depends on the interactions between different genes and the environment. The current technology allows to genotype more than a million of markers and this number has been rapidly increasing in the last years with the imputation based on templates sets and whole genome sequencing. This type of data introduces a great amount of noise in the statistical analysis and usually requires a great number of samples. Current methods seldom take into account gene-gene and gene-environment interactions which are fundamental especially in complex diseases. In this paper we propose to use a non-parametric additive model to detect the genetic variants related to diseases which accounts for interactions of unknown order. Although this is not new to
the current literature, we show that in an isolated population, where the most related subjects share also most of their genetic code, the use of additive models may be improved if the available genealogical tree is taken into account. Specifically, we form a sample of cases and controls with the highest inbreeding by means of the Hungarian method, and estimate the set of genes/environmental variables, associated with the disease, by means of Random Forest.
Results: We have evidence, from statistical theory, simulations and two applications, that we build a suitable
procedure to eliminate stratification between cases and controls and that it also has enough precision in
identifying genetic variants responsible for a disease. This procedure has been successfully used for the betathalassemia, which is a well known Mendelian disease, and also to the common asthma where we have identified
candidate genes that underlie to the susceptibility of the asthma. Some of such candidate genes have been also found related to common asthma in the current literature.
Conclusions: The data analysis approach, based on selecting the most related cases and controls along with the Random Forest model, is a powerful tool for detecting genetic variants associated to a disease in isolated
populations. Moreover, this method provides also a prediction model that has accuracy in estimating the unknown disease status and that can be generally used to build kit tests for a wide class of Mendelian diseases
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated to quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated to differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and softwares to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.Comment: minor revision with typos corrected; review article; 24 pages, 2
figure
Unsupervised empirical Bayesian multiple testing with external covariates
In an empirical Bayesian setting, we provide a new multiple testing method,
useful when an additional covariate is available, that influences the
probability of each null hypothesis being true. We measure the posterior
significance of each test conditionally on the covariate and the data, leading
to greater power. Using covariate-based prior information in an unsupervised
fashion, we produce a list of significant hypotheses which differs in length
and order from the list obtained by methods not taking covariate-information
into account. Covariate-modulated posterior probabilities of each null
hypothesis are estimated using a fast approximate algorithm. The new method is
applied to expression quantitative trait loci (eQTL) data.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS158 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Nonparametric false discovery rate control for identifying simultaneous signals
It is frequently of interest to jointly analyze multiple sequences of
multiple tests in order to identify simultaneous signals, defined as features
tested in multiple studies whose test statistics are non-null in each. In many
problems, however, the null distributions of the test statistics may be
complicated or even unknown, and there do not currently exist any procedures
that can be employed in these cases. This paper proposes a new nonparametric
procedure that can identify simultaneous signals across multiple studies even
without knowing the null distributions of the test statistics. The method is
shown to asymptotically control the false discovery rate, and in simulations
had excellent power and error control. In an analysis of gene expression and
histone acetylation patterns in the brains of mice exposed to a conspecific
intruder, it identified genes that were both differentially expressed and next
to differentially accessible chromatin. The proposed method is available in the
R package github.com/sdzhao/ssa
- …