
    Model-driven analysis of gene expression control

    During this PhD, I worked on three different aspects of the broad field of experimental and theoretical analysis of gene regulation. The first part, "Quantifying the strength of miRNA-target interactions", addresses the problem of predicting the mRNA targets of miRNAs. I show that biochemical measurements of miRNA-mRNA interactions can be used to optimise the parameter inference of a pre-existing model of miRNA target prediction. This model, named MIRZA, predicts miRNA-mRNA binding using 25 energy parameters that describe the miRNA-mRNA hybrid structure, with 2 base-pairing parameters for the AU and GC pairs, 3 configuration parameters for the symmetric and asymmetric loops, and 21 positional parameters for the 21 nucleotides of the miRNA sequence. MIRZA was built to infer these parameters from Argonaute protein CLIP data, which captures potential targets of miRNAs. When precise measurements were published of the kinetic constants of binding between an mRNA target and a set of systematically mutated miRNA sequences, we reasoned that such data could be used to improve the parameter inference of the MIRZA model. After showing that the predictions of the existing model for the set of measured miRNA-mRNA pairs correlate highly with the binding energies calculated from the measurements, I used simulations as a proof of principle of the inference procedure and to design the measurements that would be needed to infer the parameters of the MIRZA model. Staying in the field of miRNAs, in "Single cell mRNA profiling reveals the hierarchical response of miRNA targets to miRNA induction", I developed an approach to infer miRNA targets from scRNA-seq data of cells that express the miRNA at different levels.
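The inference idea can be illustrated with a deliberately simplified, hypothetical model. Measured kinetic constants give a binding free energy through K_d = k_off / k_on and ΔG = RT ln K_d; if that energy is assumed to be a sum of per-position contributions over the miRNA positions that still pair after mutation, the positional parameters can be recovered by least squares from simulated mutants. This is a sketch under that additivity assumption only: MIRZA's own model, which also includes base-pair and loop parameters, is not reproduced here, and the function names `delta_g_from_kinetics` and `simulate_and_fit` are illustrative.

```python
import numpy as np

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 310.15    # 37 degrees C, in kelvin


def delta_g_from_kinetics(k_on, k_off):
    """Binding free energy in kcal/mol from K_d = k_off / k_on (k_on in 1/(M*s), k_off in 1/s)."""
    return R * T * np.log(k_off / k_on)


def simulate_and_fit(n_mutants=200, n_pos=21, noise=0.1, seed=0):
    """Simulate a hypothetical additive positional energy model and recover its parameters."""
    rng = np.random.default_rng(seed)
    # Hypothetical "true" per-position contributions (kcal/mol) for a 21-nt miRNA.
    true_params = rng.uniform(-1.5, -0.2, size=n_pos)
    # Each row marks which miRNA positions still pair with the target after mutation.
    paired = rng.integers(0, 2, size=(n_mutants, n_pos)).astype(float)
    # Simulated "measured" energies: additive contributions plus Gaussian measurement noise.
    energies = paired @ true_params + rng.normal(0.0, noise, size=n_mutants)
    # Ordinary least-squares estimate of the positional parameters.
    fitted, *_ = np.linalg.lstsq(paired, energies, rcond=None)
    return true_params, fitted
```

Rerunning `simulate_and_fit` with fewer mutants or more noise shows how the number and design of measurements limit how well the parameters can be recovered, which is the kind of question such simulations can answer before an experiment is run.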
A miRNA can target several hundred different mRNAs and is present in the cell in limited quantities, implying that the interaction of a target mRNA with a specific miRNA depends on the miRNA concentration and on the interactions of the miRNA with its other targets. In other words, since miRNA binding is exclusive, mRNA targets compete for the same miRNA pool. Therefore, the concentrations of the thereby coupled mRNAs depend not only on the miRNA concentration but also on the concentration of every competing mRNA targeted by the same miRNA. To study this, HEK 293 cell lines were constructed to inducibly express a miRNA (hsa-miR-199a) as well as the mRNA encoding a green fluorescent protein. Expressed from the same promoter as the miRNA, this mRNA allows the miRNA concentration to be monitored. The study aimed not only to determine the parameters of individual miRNA-mRNA interactions, but also to assess the degree to which mRNAs act in a competitive manner to influence each other's expression. scRNA-seq was chosen to provide the resolution needed to reach these goals. The effect of the miRNA on a bound target is to increase its decay rate; hence the expression levels of the targets depend on the miRNA concentration and on their binding energies. To gain insight into the target binding energies, we constructed a model considering the mRNA transcription rate, the miRNA-mRNA binding and unbinding rates, the mRNA decay rates in the bound and unbound states, and the free and bound concentrations of miRNA. We showed that the model can be factored in terms of the miRNA concentrations in individual cells and the miRNA-mRNA interaction parameters, and we solved the model to obtain estimates of the miRNA-mRNA interaction parameters, which we showed explain the mRNA levels in cells more accurately than the sequence-based, computationally predicted interaction energies.
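The competition argument can be made concrete with a minimal steady-state sketch of a model of this type, assuming mass-action kinetics and a conserved total miRNA pool (when a bound mRNA decays, its miRNA is released). For target i with transcription rate α_i, free and bound decay rates δ_i and δ'_i, and binding and unbinding rates k_on and k_off, the steady state gives m_i = α_i / (δ_i + μ k_on δ'_i / (k_off + δ'_i)) and c_i = k_on m_i μ / (k_off + δ'_i), where μ is the free miRNA; μ is then found by bisection on μ + Σ c_i = M_total. The names `Target` and `steady_state` and all parameter values are hypothetical; this is not the model code used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Target:
    alpha: float        # transcription rate
    delta_free: float   # decay rate of the unbound mRNA
    delta_bound: float  # decay rate of the miRNA-bound mRNA
    k_on: float         # miRNA-mRNA binding rate constant
    k_off: float        # unbinding rate


def steady_state(targets, mirna_total, tol=1e-12):
    """Return (free miRNA, free mRNA levels, bound complex levels) at steady state.

    Assumes the total miRNA pool is conserved (miRNA is recycled when a bound mRNA decays).
    """
    def levels(mu):
        free, bound = [], []
        for t in targets:
            m = t.alpha / (t.delta_free + mu * t.k_on * t.delta_bound / (t.k_off + t.delta_bound))
            free.append(m)
            bound.append(t.k_on * m * mu / (t.k_off + t.delta_bound))
        return free, bound

    # mu + sum(bound) increases monotonically with mu, so bisection on [0, total] converges.
    lo, hi = 0.0, mirna_total
    while hi - lo > tol * max(1.0, mirna_total):
        mu = 0.5 * (lo + hi)
        _, bound = levels(mu)
        if mu + sum(bound) > mirna_total:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    free, bound = levels(mu)
    return mu, free, bound
```

Adding a second target to the list lowers the free miRNA and therefore raises the free level of the first target, which is the coupling between competing mRNAs described above.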
Finally, in "Bayesian inference of the gene expression states from single-cell RNA-seq data", I carried out fundamental technical work on the normalisation of count data obtained in scRNA-seq experiments. As introduced above, multiple strategies have been developed with the aim of reducing the high level of noise present in such data and estimating a 'true' biological state of expression for each gene in each cell. While the project aimed to reconstruct the Waddington landscape of regulator activity from single-cell gene expression measurements, at the start of the project we realised that the literature offers no satisfactory solution to gene expression normalisation in single cells. We therefore tackled this problem with a Bayesian model that considers each gene independently and infers a posterior probability of gene expression in each cell. Our model assumes a log-normal distribution of gene expression across cells, plus Poisson noise caused by the stochastic process of gene expression and by the sampling introduced by mRNA capture in experimental protocols. These normalised gene expression values are the basis of a motif-activity-response-based approach for inferring the activity of TFs and miRNAs in individual cells and for reconstructing the underlying landscape. The application of this normalisation algorithm to landscape reconstruction is presented in the last part, "Realizing Waddington’s metaphor: Inferring regulatory landscapes from single-cell gene expression data". There I present the mathematical principles needed to formally define a landscape following Waddington's 1957 idea, and I propose two applications of the landscape. First, I show that it defines cell types as local minima; second, for cells undergoing differentiation, I show how the landscape can be used to find developmental paths and the transcription factors associated with the differentiation process.
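The per-cell posterior described above can be sketched numerically for one gene, assuming the log-normal prior's mean μ and standard deviation σ are already known and the Poisson rate is the cell's total count multiplied by the expression frequency e^x. The posterior over the log-expression x is evaluated on a grid; estimating μ and σ from all cells, as a full method would, is left out. The function name `posterior_log_expression` is hypothetical and this is not the thesis implementation.

```python
import numpy as np
from math import lgamma


def posterior_log_expression(n, total, mu, sigma, grid=None):
    """Posterior mean and sd of log-expression x for one gene in one cell.

    Prior: x ~ Normal(mu, sigma^2) across cells (log-normal expression).
    Likelihood: n ~ Poisson(total * exp(x)), with total = the cell's captured counts.
    """
    if grid is None:
        grid = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 4001)
    rate = total * np.exp(grid)
    log_post = (-0.5 * ((grid - mu) / sigma) ** 2             # log prior, up to a constant
                + n * np.log(rate) - rate - lgamma(n + 1))     # Poisson log-likelihood
    w = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    w /= w.sum()                           # normalise on the uniform grid
    mean = float(np.sum(w * grid))
    sd = float(np.sqrt(np.sum(w * (grid - mean) ** 2)))
    return mean, sd
```

A cell with zero counts gets a posterior below the prior mean but with a wide spread, whereas a cell with many counts gets a sharp posterior close to log(n / total), pulled slightly toward the prior; this shrinkage is what distinguishes a posterior estimate from simply dividing counts by library size.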

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, have become widely used in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful and helpful for their own research.

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Refactoring the UrQMD model for many-core architectures

    Ultrarelativistic Quantum Molecular Dynamics (UrQMD) is a physics model that describes the transport, collision, scattering, and decay of nuclear particles. The UrQMD framework has been in use for nearly 20 years since its first development. During this period, computing aspects, code design, and computational efficiency have been of minor interest. Nowadays an additional issue arises: the run time of the framework no longer decreases with new hardware generations. Current development in computing hardware is mainly focused on parallelism. Scientific applications in particular can achieve a high degree of parallelisation thanks to the superposition principle. This thesis shows how modern design criteria and algorithm redesign can be applied to physics frameworks. The redesign, with a special emphasis on many-core architectures, allows for significant improvements in execution speed. The most time-consuming part of UrQMD is a newly introduced relativistic hydrodynamic phase. The algorithm used to simulate the hydrodynamic evolution is SHASTA (SHarp And Smooth Transport Algorithm). Since the sequential form of SHASTA has been successfully applied in various simulation frameworks for heavy-ion collisions, its potential for parallelisation is analysed. Two different implementations of SHASTA are presented. The first is an improved sequential implementation. By applying a more concise design and avoiding unnecessary memory copies, the execution time was reduced to half that of the FORTRAN version, and memory usage was reduced by 80% compared with the original version. The second implementation concentrates fully on many-core architectures and deviates significantly from the classical implementation. Contrary to the sequential implementation, it follows a recalculate-instead-of-look-up paradigm. In this way, execution was accelerated by up to a factor of 460 on GPUs.
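To show why an algorithm of this family parallelises well, here is a minimal one-dimensional sketch of the Boris–Book flux-corrected transport step that SHASTA is built on, assuming constant-velocity linear advection on a periodic grid with Courant number ε = v·dt/dx, |ε| ≤ 0.5. A monotone low-order stage (transport plus diffusion with coefficient 1/8 + ε²/2) is followed by antidiffusion with coefficient 1/8, limited so that no new extrema appear. This is not the relativistic, multi-dimensional UrQMD hydrodynamics code, and the function name `fct_step` is hypothetical.

```python
import numpy as np


def fct_step(rho, eps):
    """One Boris-Book flux-corrected transport step for constant-velocity 1-D advection.

    rho: cell values on a periodic grid; eps = v*dt/dx, assumed |eps| <= 0.5.
    """
    rp = np.roll(rho, -1)  # rho_{j+1}
    rm = np.roll(rho, 1)   # rho_{j-1}
    # Low-order stage: transport plus diffusion with coefficient 1/8 + eps^2/2 (monotone).
    td = rho - 0.5 * eps * (rp - rm) + (0.125 + 0.5 * eps**2) * (rp - 2.0 * rho + rm)
    delta = np.roll(td, -1) - td  # Delta_{j+1/2} = td_{j+1} - td_j
    raw = 0.125 * delta           # antidiffusive flux f_{j+1/2} with coefficient 1/8
    s = np.sign(delta)
    # Boris-Book limiter: the corrected flux may not create or deepen extrema.
    corrected = s * np.maximum(
        0.0,
        np.minimum.reduce([s * np.roll(delta, 1), np.abs(raw), s * np.roll(delta, -1)]),
    )
    # Each cell needs only its neighbours, so every cell can be updated in parallel.
    return td - corrected + np.roll(corrected, 1)
```

Because every cell's new value depends only on a few neighbouring values, a GPU version can assign one thread per cell and recompute intermediates such as `delta` locally rather than storing and re-reading them, which is one way to read the recalculate-instead-of-look-up idea.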
Additionally, a stability analysis of the UrQMD model is presented. Using metaprogramming, UrQMD is compiled and executed in a massively parallel setup. The resulting simulation data of all parallel UrQMD instances were then gathered and analysed, demonstrating that UrQMD is highly stable with respect to the uncertainty of experimental data. As a further application of modern programming paradigms, a prototypical implementation of the worldline formalism is presented. This formalism allows for a direct calculation of Feynman integrals and therefore constitutes an interesting enhancement for the UrQMD model. Its massively parallel implementation on GPUs is examined.