Mapping the genetic architecture of gene expression in human liver
Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. 
By using an integrative genomics approach, we highlight how the gene RPS26 and not ERBB3 is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. We also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels in the process. © 2008 Schadt et al.
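As a rough illustration of what a single SNP-expression association test involves, the sketch below regresses one transcript's expression level on genotype dosage (0, 1 or 2 copies of one allele). The abstract does not say which test statistic the authors used, so the additive linear model, the function name `eqtl_association`, and the toy numbers are all assumptions made for illustration.

```python
def eqtl_association(dosages, expression):
    """Fit expression = intercept + slope * dosage by ordinary least squares.

    Illustrative additive model only; not the authors' actual pipeline.
    Returns (slope, intercept, r_squared).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    slope = sxy / sxx                      # expression change per allele copy
    intercept = mean_e - slope * mean_g
    r_squared = sxy ** 2 / (sxx * syy)     # share of variance explained
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical toy data: six samples, two per genotype class.
    toy_dosages = [0, 0, 1, 1, 2, 2]
    toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    print(eqtl_association(toy_dosages, toy_expression))
```

In a genome-wide scan this test would be repeated for every SNP-transcript pair, so a multiple-testing correction would be needed before calling any association significant.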
Understand Biology Using Single Cell RNA-Sequencing
This dissertation summarizes the development of experimental and analytical tools for single cell RNA sequencing (scRNA-Seq), including 1) scPLATE-Seq, a FACS- and plate-based scRNA-Seq platform, which is accurate, robust, fully automated and cost-efficient; 2) metaVIPER, an algorithm for transcriptional regulator activity inference based on scRNA-Seq profiles; and 3) iterClust, a statistical framework for iterative clustering analysis, especially suitable for dissecting the hierarchy of heterogeneity among single cells. Further, this dissertation summarizes biological questions answered by combining these tools, including 1) understanding inter- and intra-tumor heterogeneity of human glioblastoma; 2) elucidating regulators of β-cell de-differentiation in type-2 diabetes; and 3) developing novel therapeutics targeting cell-state regulators of breast cancer stem cells.
Dissecting heterogeneous cell populations across drug and disease conditions with PopAlign
Single-cell measurement techniques can now probe gene expression in heterogeneous cell populations from the human body across a range of environmental and physiological conditions. However, new mathematical and computational methods are required to represent and analyze gene expression changes that occur in complex mixtures of single cells as they respond to signals, drugs, or disease states. Here, we introduce a mathematical modeling platform, PopAlign, that automatically identifies subpopulations of cells within a heterogeneous mixture, and tracks gene expression and cell abundance changes across subpopulations by constructing and comparing probabilistic models. We apply PopAlign to analyze the impact of 42 different immunomodulatory compounds on a heterogeneous population of donor-derived human immune cells as well as patient-specific disease signatures in multiple myeloma. PopAlign scales to comparisons involving tens to hundreds of samples, enabling large-scale studies of natural and engineered cell populations as they respond to drugs, signals or physiological change
Inferring Gene Regulatory Networks from Time Series Microarray Data
Innovations and improvements in high-throughput genomic technologies, such as DNA microarrays, make it possible for biologists to measure dependencies and regulatory relationships among genes simultaneously on a genome-wide scale and provide genetic information. An important objective of functional genomics is to understand the mechanisms controlling the expression of these genes and to encode that knowledge in a gene regulatory network (GRN). To achieve this, computational and statistical algorithms are needed.
Inference of GRNs is a very challenging task for computational biologists because the parameters have redundant degrees of freedom. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean networks, differential equations and Bayesian networks. There is no so-called golden method that gives the best performance on every data set. The research goal is to improve inference accuracy and reduce computational complexity.
One of the problems in reconstructing GRNs is how to deal with high-dimensional, short time-course gene expression data. In this work, some existing inference algorithms are compared; their limitation is that they suffer from either low inference accuracy or high computational complexity. To overcome such difficulties, a new approach based on a state space model and the Expectation-Maximization (EM) algorithm is proposed to model the dynamic system of gene regulation and infer gene regulatory networks. In our model, the GRN is represented by a state space model that incorporates noise and can capture a wider variety of biological aspects, such as hidden or missing variables. An EM algorithm is used to estimate the parameters based on the given state space functions; the gene interaction matrix is then derived by decomposing the observation matrix using singular value decomposition and is used to infer the GRN. The new model is validated on synthetic data sets before being applied to real biological data sets. The results reveal that the developed model can infer gene regulatory networks from large-scale gene expression data and significantly reduce computational time complexity without losing much inference accuracy compared to a dynamic Bayesian network.
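The abstract above does not give the model equations. For orientation only, a generic linear-Gaussian state space model of the kind commonly fitted with EM can be written as follows; the symbols $A$, $C$, $Q$ and $R$ are assumptions introduced here, not notation taken from the work itself.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R),
\end{aligned}
```

Here $y_t$ would be the vector of observed expression levels at time $t$ and $x_t$ a vector of hidden states. In this generic setting the E-step estimates the hidden states given the current parameters (typically with a Kalman smoother), and the M-step re-estimates $A$, $C$, $Q$ and $R$. How the gene interaction matrix is obtained from the singular value decomposition of the observation matrix is not specified in the abstract, so it is not reproduced here.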
Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling
Approximately unbiased tests based on bootstrap probabilities are considered
for the exponential family of distributions with unknown expectation parameter
vector, where the null hypothesis is represented as an arbitrary-shaped region
with smooth boundaries. This problem has been discussed previously in Efron and
Tibshirani [Ann. Statist. 26 (1998) 1687-1718], and a corrected p-value with
second-order asymptotic accuracy is calculated by the two-level bootstrap of
Efron, Halloran and Holmes [Proc. Natl. Acad. Sci. U.S.A. 93 (1996)
13429-13434] based on the ABC bias correction of Efron [J. Amer. Statist.
Assoc. 82 (1987) 171-185]. Our argument is an extension of their asymptotic
theory, where the geometry, such as the signed distance and the curvature of
the boundary, plays an important role. We give another calculation of the
corrected p-value without finding the "nearest point" on the boundary to the
observation, which is required in the two-level bootstrap and is an
implementational burden in complicated problems. The key idea is to alter the
sample size of the replicated dataset from that of the observed dataset. The
frequency of the replicates falling in the region is counted for several sample
sizes, and then the p-value is calculated by looking at the change in the
frequencies along the changing sample sizes. This is the multiscale bootstrap
of Shimodaira [Systematic Biology 51 (2002) 492-508], which is third-order
accurate for the multivariate normal model. Here we introduce a newly devised
multistep-multiscale bootstrap, calculating a third-order accurate p-value for
the exponential family of distributions.
Comment: Published at http://dx.doi.org/10.1214/009053604000000823 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
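The idea of counting replicates at several sample sizes can be sketched for the multivariate normal case. The code below is a minimal illustration of the single-step multiscale bootstrap, not the multistep method the abstract introduces. It assumes an identity-covariance normal model, simulates replicates with covariance I/r in place of resampling n' = r·n observations, and fits the commonly used relation z_r = d·√r + c/√r by ordinary (unweighted) least squares. The function names, the scale grid and the replicate count are all illustrative choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def bootstrap_probability(y, in_region, r, n_rep, rng):
    """Fraction of replicates y* ~ N(y, I / r) that fall inside the region."""
    sd = r ** -0.5
    hits = sum(
        1 for _ in range(n_rep)
        if in_region([rng.gauss(mu, sd) for mu in y])
    )
    return hits / n_rep


def multiscale_bootstrap(y, in_region, scales=(0.5, 0.75, 1.0, 1.25, 1.5),
                         n_rep=20000, seed=0):
    """Return (au, bp, d, c) from a least-squares fit of z_r = d*sqrt(r) + c/sqrt(r).

    au = 1 - Phi(d - c) is the corrected p-value and bp = 1 - Phi(d + c)
    is the fitted ordinary bootstrap probability at r = 1.
    """
    rng = random.Random(seed)
    a11 = a12 = a22 = b1 = b2 = 0.0
    for r in scales:
        bp_r = bootstrap_probability(y, in_region, r, n_rep, rng)
        if bp_r <= 0.0 or bp_r >= 1.0:
            continue  # z-value undefined at this scale; skip it
        z = -STD_NORMAL.inv_cdf(bp_r)
        root = r ** 0.5
        # Accumulate the 2x2 normal equations for the regressors sqrt(r), 1/sqrt(r).
        a11 += r
        a12 += 1.0
        a22 += 1.0 / r
        b1 += z * root
        b2 += z / root
    det = a11 * a22 - a12 * a12
    if a12 < 2 or det <= 0.0:
        raise ValueError("need at least two distinct usable scales")
    d = (b1 * a22 - b2 * a12) / det   # signed-distance term
    c = (a11 * b2 - a12 * b1) / det   # curvature term
    au = 1.0 - STD_NORMAL.cdf(d - c)
    bp = 1.0 - STD_NORMAL.cdf(d + c)
    return au, bp, d, c


if __name__ == "__main__":
    # Hypothetical example: the null region is the unit disc, observed point (2, 0).
    inside_disc = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
    print(multiscale_bootstrap([2.0, 0.0], inside_disc))
```

For a flat boundary the curvature term should be close to zero and the corrected value should agree with the ordinary one-sided normal p-value, which gives a simple sanity check for the sketch.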
Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets
Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with greater confidence, and place less reliance on a single dataset. However, combining datasets directly can be difficult as experiments are often conducted on different microarray platforms, and in different laboratories leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation
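The difference between the two post-learning aggregation ideas can be sketched on edge confidences alone. The code below assumes each dataset has already produced a confidence in [0, 1] for each directed edge. The abstract does not state how confidences are combined or thresholded, so the simple mean, the 0.5 threshold and both function names are assumptions for illustration, not the authors' method.

```python
def consensus_edges(edge_confidences, threshold=0.5):
    """Edges whose confidence reaches the threshold in every dataset."""
    per_dataset = [
        {edge for edge, conf in confs.items() if conf >= threshold}
        for confs in edge_confidences
    ]
    return set.intersection(*per_dataset) if per_dataset else set()


def meta_analysis_edges(edge_confidences, threshold=0.5):
    """Edges whose mean confidence across datasets reaches the threshold.

    An edge missing from a dataset counts as confidence 0 there.
    Returns a dict mapping each retained edge to its mean confidence.
    """
    if not edge_confidences:
        return {}
    all_edges = set().union(*edge_confidences)  # union of the dict keys
    n = len(edge_confidences)
    means = {
        edge: sum(confs.get(edge, 0.0) for confs in edge_confidences) / n
        for edge in all_edges
    }
    return {edge: m for edge, m in means.items() if m >= threshold}


if __name__ == "__main__":
    # Hypothetical confidences from two datasets over three candidate edges.
    dataset_1 = {("A", "B"): 0.9, ("B", "C"): 0.6, ("A", "C"): 0.2}
    dataset_2 = {("A", "B"): 0.8, ("B", "C"): 0.3, ("A", "C"): 0.9}
    print(consensus_edges([dataset_1, dataset_2]))
    print(meta_analysis_edges([dataset_1, dataset_2]))
```

In this toy case the consensus rule keeps only the edge that is supported in both datasets, while the averaging rule also keeps an edge that is strongly supported in one dataset and weakly in the other.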
Efficient reverse-engineering of a developmental gene regulatory network
Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential.
Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to discover whether there are rules or regularities governing development and evolution of complex multi-cellular organisms.
Funding: The laboratory of Johannes Jaeger and this study in particular was funded by the MEC-EMBL agreement for the EMBL/CRG Research Unit in Systems Biology, by Grant 153 (MOPDEV) of the ERANet: ComplexityNET program, by SGR Grant 406 from the Catalan funding agency AGAUR, by grant BFU2009-10184 from the Spanish Ministry of Science, and by European Commission grant FP7-KBBE-2011-5/289434 (BioPreDyn). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The effect of noise on dynamics and the influence of biochemical systems
Understanding a complex system requires integration and collective analysis of data from many
levels of organisation. Predictive modelling of biochemical systems is particularly challenging
because the data are plagued by noise operating at each and every level. Inevitably
we have to decide whether we can reliably infer the structure and dynamics of biochemical systems
from present data. Here we approach this problem from many fronts by analysing the interplay
between deterministic and stochastic dynamics in a broad collection of biochemical models.
In a classical mathematical model we first illustrate how this interplay can be described in
surprisingly simple terms; we furthermore demonstrate the advantages of a statistical point of view
also for more complex systems. We then investigate strategies for the integrated analysis of models
characterised by different organisational levels, and trace the propagation of noise through such
systems. We use this approach to uncover, for the first time, the dynamics of metabolic adaptation
of a plant pathogen throughout its life cycle and discuss the ecological implications.
Finally, we investigate how reliably we can infer model parameters of biochemical models.
We develop a novel sensitivity/inferability analysis framework that is generally applicable to a
large fraction of current mathematical models of biochemical systems. By using this framework to
quantify the effect of parametric variation on system dynamics, we provide practical guidelines as
to when and why certain parameters are easily estimated while others are much harder to infer. We
highlight the limitations on parameter inference due to model structure and qualitative dynamical
behaviour, and identify candidate elements of control in biochemical pathways most likely to be
subject to regulation.