Search CORE

10,642 research outputs found

Recommended from our members

Temporal Bayesian classifiers for modelling muscular dystrophy expression data

Author: Hoen PAC't
Liu X
Tucker A
Vinciotti V
Publication venue: 'IOS Press'
Publication date: 01/01/2006
Field of study

The analysis of microarray data from time-series experiments requires specialised algorithms, which take the temporal ordering of the data into account. In this paper we explore a new architecture of Bayesian classifier that can be used to understand how biological mechanisms differ with respect to time. We show that this classifier improves the classification of microarray data and at the same time ensures that the models can easily be analysed by biologists by incorporating time transparently. In this paper we focus on data that has been generated to explore different types of muscular dystrophy

Brunel University Research Archive

Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks

Author: Baumbach
Breeze
Bulyk
Christopher A. Penfold
Cooke
David L. Wild
Goda
Katherine J. Denby
Kilian
Kimbrough
Klemm
Liu
Lopato
Marbach
Marbach
Matys
Mitsuda
Ou
Park
Penfold
Prill
Rasmussen
Roth
Stegle
Tsutsui
Vicky Buchanan-Wollaston
Werhli
Werhli
Yamaguchi-Shinozaki
Äijö
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/06/2012
Field of study

Motivation: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series were generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different, but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionary related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. Results: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one hybrid and microarray time series data to infer potential transcriptional switches in Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. Availability: The methods outlined in this article have been implemented in Matlab and are available on request

Warwick Research Archives Portal Repository

Data-driven modelling of biological multi-scale processes

Author: Hasenauer Jan
Hross Sabrina
Jagiella Nick
Theis Fabian J.
Publication venue
Publication date: 01/01/2015
Field of study

Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and the implication of that on multi-scale modelling. Based upon this overview over state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we considered the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods.Comment: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers

arXiv.org e-Print Archive

PuSH

Bayesian models for syndrome- and gene-specific probabilities of novel variant pathogenicity

Author: Balding DJ
Cook SA
Ruklisa D
Walsh R
Ware JS
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

BACKGROUND: With the advent of affordable and comprehensive sequencing technologies, access to molecular genetics for clinical diagnostics and research applications is increasing. However, variant interpretation remains challenging, and tools that close the gap between data generation and data interpretation are urgently required. Here we present a transferable approach to help address the limitations in variant annotation. METHODS: We develop a network of Bayesian logistic regression models that integrate multiple lines of evidence to evaluate the probability that a rare variant is the cause of an individual's disease. We present models for genes causing inherited cardiac conditions, though the framework is transferable to other genes and syndromes. RESULTS: Our models report a probability of pathogenicity, rather than a categorisation into pathogenic or benign, which captures the inherent uncertainty of the prediction. We find that gene- and syndrome-specific models outperform genome-wide approaches, and that the integration of multiple lines of evidence performs better than individual predictors. The models are adaptable to incorporate new lines of evidence, and results can be combined with familial segregation data in a transparent and quantitative manner to further enhance predictions. Though the probability scale is continuous, and innately interpretable, performance summaries based on thresholds are useful for comparisons. Using a threshold probability of pathogenicity of 0.9, we obtain a positive predictive value of 0.999 and sensitivity of 0.76 for the classification of variants known to cause long QT syndrome over the three most important genes, which represents sufficient accuracy to inform clinical decision-making. A web tool APPRAISE [http://www.cardiodb.org/APPRAISE] provides access to these models and predictions. CONCLUSIONS: Our Bayesian framework provides a transparent, flexible and robust framework for the analysis and interpretation of rare genetic variants. Models tailored to specific genes outperform genome-wide approaches, and can be sufficiently accurate to inform clinical decision-making

Springer - Publisher Connector

Spiral - Imperial College Digital Repository

Discovering transcriptional modules by Bayesian data integration

Author: Antoniak
Bar-Joseph
Bernard J. de la Cruz
Bähler
Cho
Dahl
Datta
David L. Wild
Eisen
Falcon
Ferguson
Fritsch
Gasch
Gerber
Geweke
Harbison
Ideker
Ihmels
Jim E. Griffin
Kundaje
Lee
Liu
Liu
Medvedovic
Medvedovic
Qin
Rasmussen
Rasmussen
Reid
Richard S. Savage
Savage
Segal
Segal
Teh
Teh
Wild
Yao
Yeung
Zoubin Ghahramani
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010
Field of study

Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs

Warwick Research Archives Portal Repository

A Latent Variable Approach to Multivariate Quantitative Trait Loci

Author: Bob O&#x27
Mikko J. Sillanp&#xe4
Outi Savolainen
P&#xe4
Publication venue
Publication date: 07/01/2010
Field of study

A novel approach based on latent variable modelling is presented for the analysis of multivariate quantitative and qualitative trait loci. The approach is general in the sense that it enables the joint analysis of many kinds of quantitative and qualitative traits (including count data and censored traits) in a single modelling framework. In the framework, the observations are modelled as functions of latent variables, which are then affected by quantitative trait loci. Separating the analysis in this way means that measurement errors in the phenotypic observations can be included easily in the model, providing robust inferences. The performance of the method is illustrated using two real multivariate datasets, from barley and Scots pine

Populations in statistical genetic modelling and inference

Author: Lawson Daniel John
Publication venue
Publication date: 04/06/2013
Field of study

What is a population? This review considers how a population may be defined in terms of understanding the structure of the underlying genetics of the individuals involved. The main approach is to consider statistically identifiable groups of randomly mating individuals, which is well defined in theory for any type of (sexual) organism. We discuss generative models using drift, admixture and spatial structure, and the ancestral recombination graph. These are contrasted with statistical models for inference, principle component analysis and other `non-parametric' methods. The relationships between these approaches are explored with both simulated and real-data examples. The state-of-the-art practical software tools are discussed and contrasted. We conclude that populations are a useful theoretical construct that can be well defined in theory and often approximately exist in practice

arXiv.org e-Print Archive

CiteSeerX