23 research outputs found
Proceedings of The Tenth International Workshop on Ontology Matching (OM-2015)
International audience. No abstract.
Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. We also discuss two classic approaches to inferring causal structures and compare them with contemporary methods through a conceptual categorization. Finally, we survey global and local evaluation measures for assessing the performance of inference algorithms.
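As an illustration of what a global evaluation measure can look like, the sketch below scores a predicted edge set against a reference network using precision, recall and F1. The abstract does not say which measures the paper surveys, so this choice, the function name `edge_scores` and the toy edges are assumptions made for the example.

```python
def edge_scores(predicted, reference):
    """Return (precision, recall, f1) for two collections of directed edges."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)  # edges present in both networks
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical reference network and inferred network (regulator, target).
reference = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
predicted = {("A", "B"), ("B", "C"), ("D", "A")}

precision, recall, f1 = edge_scores(predicted, reference)
print(precision, recall, f1)
```

Edges are treated as directed here, so a predicted edge with the wrong orientation counts as an error.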
Graphical modelling of biological pathways
Biological pathways underlie the basic functions of a living cell. They are complex diagrams featuring genes, proteins and other small molecules, showing how these work together to achieve a particular biological effect. From a technical point of view, they are networks represented as graphs in which genes are the nodes and their connections are the edges.
The main research objective of this thesis is to develop a framework for simulating effects of gene silencing.
To this end, we propose a three-step approach. First, we refine the structure of a pathway via our CK2 algorithm. Next, we assess the uncertainty in the refined structure. Finally, we simulate gene silencing through intervention analysis in causal graphical models. The proposed approach showed promising results when applied to the problem of predicting the effect of the knockdown of the nkd gene in Drosophila melanogaster.
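The idea of simulating silencing as an intervention can be illustrated with a minimal sketch. It assumes a linear structural model on a directed acyclic graph, where each gene's expected level is an intercept plus a weighted sum of its parents, and it models silencing gene g as the intervention do(g = 0), which ignores g's own regulators. This is not the CK2 algorithm or the thesis's model. The gene names, weights and the function `expected_levels` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def expected_levels(parents, intercepts, silenced=()):
    """Expected expression levels in a linear causal model.

    parents: {gene: {parent_gene: weight}}; silenced genes are clamped to 0.
    """
    graph = {gene: set(ps) for gene, ps in parents.items()}
    levels = {}
    # static_order() yields each gene after all of its parents.
    for gene in TopologicalSorter(graph).static_order():
        if gene in silenced:
            levels[gene] = 0.0  # intervention: incoming edges are ignored
        else:
            levels[gene] = intercepts.get(gene, 0.0) + sum(
                weight * levels[parent]
                for parent, weight in parents.get(gene, {}).items()
            )
    return levels


# Toy pathway: g1 activates g2 and g3, g2 represses g3 (all values assumed).
parents = {"g1": {}, "g2": {"g1": 0.5}, "g3": {"g1": 1.0, "g2": -2.0}}
intercepts = {"g1": 2.0, "g2": 1.0, "g3": 4.0}

baseline = expected_levels(parents, intercepts)
knockdown = expected_levels(parents, intercepts, silenced={"g2"})
print(baseline["g3"], knockdown["g3"])
```

Comparing `baseline` with `knockdown` gives the predicted downstream effect of the silencing under these assumptions.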
Ontology Matching: OM-2018: Proceedings of the ISWC Workshop
International audience. No abstract.
Uncertainty in Neural Networks; Bayesian Ensembles, Priors & Prediction Intervals
The breakout success of deep neural networks (NNs) in the 2010s marked a new era in the quest to build artificial intelligence (AI). With NNs as the building block of these systems, excellent performance has been achieved on narrow, well-defined tasks where large amounts of data are available.
However, these systems lack certain capabilities that are important for broad use in real-world applications. One such capability is the communication of uncertainty in an NN's predictions and decisions. In applications such as healthcare recommendation or heavy machinery prognostics, it is vital that AI systems be aware of and express their uncertainty, which creates safer, more cautious, and ultimately more useful systems.
This thesis explores how to engineer NNs to communicate robust uncertainty estimates on their predictions, whilst minimising the impact on usability. One way to encourage uncertainty estimates to be robust is to adopt the Bayesian framework, which offers a principled approach to handling uncertainty. Two of the major contributions in this thesis relate to Bayesian NNs (BNNs).
Specifying appropriate priors is an important step in any Bayesian model, yet it is not clear how to do this in BNNs. The first contribution shows that the connection between BNNs and Gaussian Processes (GPs) provides an effective lens to study BNN priors. NN architectures are derived which mirror the combining of GP kernels to create priors tailored to a task.
The second major contribution is a novel way to perform approximate Bayesian inference in BNNs using a modified version of ensembling. Novel analysis improves the understanding of a technique known as randomised MAP sampling. This technique is shown to be particularly effective when strong correlations exist between parameters, making it well suited to NNs.
The third major contribution of the thesis is a non-Bayesian technique that trains an NN to directly output prediction intervals for regression tasks through a tailored objective function. This advances over related works that were incompatible with gradient descent and ignored one source of uncertainty.
EPSRC, Alan Turing Institute
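The abstract does not spell out the tailored objective. As a rough illustration of what a prediction-interval objective has to balance, the sketch below computes two commonly used quantities: the fraction of targets that fall inside their intervals (coverage) and the average interval width. The function name `interval_quality` and the toy numbers are assumptions, not taken from the thesis.

```python
def interval_quality(lower, upper, targets):
    """Return (coverage, mean_width) for a set of prediction intervals."""
    n = len(targets)
    # Coverage: share of targets lying inside [lower, upper].
    covered = sum(lo <= y <= hi for lo, y, hi in zip(lower, targets, upper))
    # Mean width: narrower intervals are more informative at equal coverage.
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width


# Hypothetical interval bounds and observed regression targets.
lower = [0.0, 1.0, 2.0, 3.0]
upper = [2.0, 2.0, 4.0, 5.0]
targets = [1.0, 2.5, 3.0, 4.0]

coverage, mean_width = interval_quality(lower, upper, targets)
print(coverage, mean_width)
```

A good interval method aims for coverage close to the nominal level while keeping the mean width small.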
Bayesian inference for protein signalling networks
Cellular response to a changing chemical environment is mediated by a complex system of interactions involving molecules such as genes, proteins and metabolites. In particular, genetic and epigenetic variation ensure that cellular response is often highly specific to individual cell types, or to different patients in the clinical setting. Conceptually, cellular systems may be characterised as networks of interacting components together with biochemical parameters specifying rates of reaction. Taken together, the network and parameters form a predictive model of cellular dynamics which may be used to simulate the effect of hypothetical drug regimens.
In practice, however, both network topology and reaction rates remain partially or entirely unknown, depending on individual genetic variation and environmental conditions. Prediction under parameter uncertainty is a classical statistical problem. Yet, doubly uncertain prediction, where both parameters and the underlying network topology are unknown, leads to highly non-trivial probability distributions which currently require gross simplifying assumptions to analyse. Recent advances in molecular assay technology now permit high-throughput data-driven studies of cellular dynamics. This thesis sought to develop novel statistical methods in this context, focussing primarily on the problems of (i) elucidating biochemical network topology from assay data and (ii) prediction of dynamical response to therapy when both network and parameters are uncertain.
Parametric classification in domains of characters, numerals, punctuation, typefaces and image qualities
This thesis contributes to the Optical Font Recognition (OFR) problem by developing a classifier system that differentiates ten typefaces using a single English character ‘e’. First, the features used in the classifier system are carefully selected after a thorough typographical study of global font features and previous related experiments. These features are modeled by multivariate normal laws so that parameter estimation can be used in learning. Then, the classifier system is built on six independent schemes, each performing typeface classification using a different method. The results show remarkable performance in the field of font recognition. Finally, the classifiers are applied to lowercase characters, uppercase characters, digits, punctuation and degraded images.
Genetic association of high-dimensional traits
Over the past ten years, more than 4,000 genome-wide association studies (GWAS) have helped to shed light on the genetic architecture of complex traits and diseases. In recent years, phenotyping of the samples has often gone beyond single traits and it has become common to record multi- to high-dimensional phenotypes for individuals. Whilst these rich datasets offer the potential to analyse complex trait structures and pleiotropic effects at a genome-wide level, novel analytic challenges arise. This thesis summarises my research into genetic associations for high-dimensional phenotype data.
First, I developed a novel and computationally efficient approach for multivariate analysis of high-dimensional phenotypes based on linear mixed models, combined with bootstrapping (LiMMBo). Both in simulation studies and on real data, I demonstrate the statistical validity of LiMMBo and that it can scale to hundreds of phenotypes. I show the gain in power of multivariate analyses for high-dimensional phenotypes compared to univariate approaches, and illustrate that LiMMBo allows for detecting pleiotropy in a large number of phenotypic traits.
Aside from their computational challenges in GWAS, the true dimensionality of very high-dimensional phenotypes is often unknown and lies hidden in high-dimensional space. Retaining maximum power for association studies of such phenotype data relies on using an appropriate phenotype representation. I systematically analysed twelve unsupervised dimensionality reduction methods based on their performance in finding a robust phenotype representation in simulated data of different structure and size. I propose a stability criterion for choosing low-dimensional phenotype representations and demonstrate that stable phenotypes can recover genetic associations.
Finally, I analysed genetic variants for associations to high-dimensional cardiac phenotypes based on MRI data from 1,500 healthy individuals. I used an unsupervised approach to extract a low-dimensional representation of cardiac wall thickness and conducted a GWAS on this representation. In addition, I investigated genetic associations to a trabeculation phenotype generated from a supervised feature extraction approach on the cardiac MRI data.
In summary, this thesis highlights and overcomes some of the challenges in performing genetic association studies on high-dimensional phenotypes. It describes new approaches for phenotype processing, and genotype to phenotype mapping for high-dimensional datasets, as well as providing new insights in the genetic structure of cardiac morphology in humans.
European Molecular Biology Laboratory