271 research outputs found

    Bayesian inference for structured additive regression models for large-scale problems with applications to medical imaging

    In applied statistics, regression models with high-dimensional coefficient vectors can arise that cannot be estimated on ordinary computer systems. Among other settings, this applies to the analysis of digital images when spatio-temporal dependencies are taken into account, as is common in bio-medical research. This thesis formulates a procedure that makes it possible to fit regression models with high-dimensional coefficients and non-normal responses using only moderate computational resources. To this end, the limitations of current inference strategies for structured additive regression models when applied to high-dimensional problems are first demonstrated, and possible ways of circumventing them are discussed. Based on this, an algorithm is formulated whose strengths and weaknesses are then analyzed in simulation studies. The procedure is furthermore applied to three different areas of bio-medical imaging, from which it can be concluded that the algorithm is a promising candidate for answering high-dimensional questions.
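
    The thesis itself is not reproduced here, but the computational core of Bayesian inference for structured additive regression is a Gaussian update of a penalised coefficient block. The sketch below shows such an update in Python for an invented sparse design matrix and a random-walk penalty; the sizes, variances, and single-draw structure are assumptions for illustration, not the algorithm developed in the thesis.

```python
# Minimal sketch: one Gibbs-style draw of a penalised-spline coefficient block in a
# Gaussian structured additive model. All sizes and variances are illustrative.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

n, k = 5000, 200                       # observations, basis functions
X = sp.random(n, k, density=0.05, random_state=0, format="csc")  # sparse design matrix
y = rng.normal(size=n)                 # stand-in Gaussian response
tau2, sigma2 = 1.0, 1.0                # smoothing and error variances, held fixed here

# Second-order random-walk penalty K = D'D (the usual P-spline prior precision).
D = sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(k - 2, k))
K = D.T @ D

# Full conditional of the coefficients: N(Q^{-1} X'y / sigma2, Q^{-1}),
# with precision Q = X'X / sigma2 + K / tau2.
Q = (X.T @ X / sigma2 + K / tau2).toarray()   # k x k, small enough to factor densely
L = np.linalg.cholesky(Q)
mean = np.linalg.solve(Q, X.T @ y / sigma2)
beta = mean + np.linalg.solve(L.T, rng.normal(size=k))  # one draw from N(mean, Q^{-1})
print(beta[:5])
```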

    Revisiting Hybridization Kinetics with Improved Elementary Step Simulation

    Nucleic acid strands, which react by forming and breaking Watson-Crick base pairs, can be designed to form complex nanoscale structures or devices. Controlling such systems requires accurate predictions of the reaction rate and of the folding pathways of interacting strands. Simulators such as Multistrand model these kinetic properties using continuous-time Markov chains (CTMCs), whose states and transitions correspond to secondary structures and elementary base pair changes, respectively. The transient dynamics of a CTMC are determined by a kinetic model, which assigns transition rates to pairs of states, and the rate of a reaction can be estimated using the mean first passage time (MFPT) of its CTMC. However, use of Multistrand is limited by its slow runtime, particularly on rare events, and the quality of its rate predictions is compromised by a poorly calibrated and simplistic kinetic model. The former limitation can be addressed by constructing truncated CTMCs, which only include a small subset of states and transitions, selected either manually or through simulation. As a first step to address the latter limitation, Bayesian posterior inference in an Arrhenius-type kinetic model was performed in earlier work, using a small experimental dataset of DNA reaction rates and a fixed set of manually truncated CTMCs, which we refer to as Assumed Pathway (AP) state spaces. In this work we extend this approach by introducing a new prior model that is directly motivated by the physical meaning of the parameters and that is compatible with experimental measurements of elementary rates, and by using a larger dataset of 1105 reactions as well as larger truncated state spaces obtained from the recently introduced stochastic Pathway Elaboration (PE) method. We assess the quality of the resulting posterior distribution over kinetic parameters, as well as the quality of the posterior reaction rates predicted using AP and PE state spaces. Finally, we use the newly parameterised PE state spaces and Multistrand simulations to investigate the strong variation of helix hybridization reaction rates in a dataset of Hata et al. While we find strong evidence for the nucleation-zippering model of hybridization, in the classical sense that the rate-limiting phase is composed of elementary steps reaching a small "nucleus" of critical stability, the strongly sequence-dependent structure of the trajectory ensemble up to nucleation appears to be much richer than assumed in the model by Hata et al. In particular, rather than being dominated by the collision probability of nucleation sites, the trajectory segment between first binding and nucleation tends to visit numerous secondary structures involving misnucleation and hairpins, and has a sizeable effect on the probability of overcoming the nucleation barrier.
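
    Once a (truncated) CTMC generator is in hand, the MFPT mentioned above reduces to a linear solve over the transient states. The following toy sketch uses a hand-made 4-state rate matrix, not a Multistrand state space, purely to make the quantity concrete.

```python
# Toy sketch: mean first passage time (MFPT) to an absorbing state of a small CTMC.
import numpy as np

# Generator (rate) matrix Q for a 4-state toy chain; state 3 is the target state.
# Off-diagonal entries are transition rates; each row sums to zero.
Q = np.array([
    [-2.0,  1.0,  1.0,  0.0],
    [ 0.5, -1.5,  0.0,  1.0],
    [ 0.5,  0.0, -1.0,  0.5],
    [ 0.0,  0.0,  0.0,  0.0],   # target treated as absorbing
])

target = 3
transient = [s for s in range(Q.shape[0]) if s != target]
Qt = Q[np.ix_(transient, transient)]        # generator restricted to transient states

# Hitting-time equations for a CTMC: Qt @ t = -1, with t = 0 at the target state.
t = np.linalg.solve(Qt, -np.ones(len(transient)))
for s, mfpt in zip(transient, t):
    print(f"MFPT from state {s} to state {target}: {mfpt:.3f}")
```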

    Probabilistic reasoning and inference for systems biology

    One of the important challenges in systems biology is reasoning and hypothesis testing under uncertainty, when available knowledge may be incomplete and the experimental data may contain substantial noise. In this thesis we develop methods of probabilistic reasoning and inference that operate consistently within an environment of uncertain knowledge and data. Mechanistic mathematical models are used to describe hypotheses about biological systems. We consider both deductive model-based reasoning and model inference from data. The main contributions are a novel modelling approach using continuous time Markov chains that enables deductive derivation of model behaviours and their properties, and the application of Bayesian inferential methods to solve the inverse problem of model inference and comparison, given uncertain knowledge and noisy data. In the first part of the thesis, we consider both individual- and population-based techniques for modelling biochemical pathways using continuous time Markov chains, and demonstrate why the latter is the most appropriate. We illustrate a new approach, based on symbolic intervals of concentrations, with an example portion of the ERK signalling pathway. We demonstrate that the resulting model approximates the same dynamic system as traditionally defined using ordinary differential equations. The advantage of the new approach is quantitative logical analysis; we formulate a number of biologically significant queries in the temporal logic CSL and use probabilistic symbolic model checking to investigate their veracity. In the second part of the thesis, we consider the inverse problem of model inference and testing of alternative hypotheses, when models are defined by non-linear ordinary differential equations and the experimental data is noisy and sparse. We compare and evaluate a number of statistical techniques, and implement an effective Bayesian inferential framework for systems biology based on Markov chain Monte Carlo methods and estimation of marginal likelihoods by annealing-melting integration. We illustrate the framework with two case studies, one of which involves an open problem concerning the mediation of ERK phosphorylation in the ERK pathway.
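
    As a point of reference for the CTMC view of a biochemical pathway taken above, the sketch below simulates one trajectory of a toy reversible phosphorylation reaction with Gillespie's algorithm. The species and rate constants are invented for illustration and are unrelated to the ERK models analysed in the thesis.

```python
# Toy Gillespie simulation of the reversible reaction E <-> Ep as a CTMC trajectory.
import numpy as np

rng = np.random.default_rng(1)
k_on, k_off = 0.8, 0.3            # assumed rate constants
state = np.array([100, 0])        # molecule counts of [E, Ep]
t, t_end = 0.0, 10.0
events = 0

while True:
    rates = np.array([k_on * state[0], k_off * state[1]])  # propensities of E->Ep, Ep->E
    total = rates.sum()
    if total == 0.0:
        break
    dt = rng.exponential(1.0 / total)                       # waiting time to the next event
    if t + dt > t_end:
        break
    t += dt
    move = rng.choice(2, p=rates / total)                   # which reaction fires
    state = state + (np.array([-1, 1]) if move == 0 else np.array([1, -1]))
    events += 1

print(f"{events} events simulated; final counts E={state[0]}, Ep={state[1]}")
```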

    Differential geometric MCMC methods and applications

    This thesis presents novel Markov chain Monte Carlo methodology that exploits the natural representation of a statistical model as a Riemannian manifold. The methods developed provide generalisations of the Metropolis-adjusted Langevin algorithm and the Hybrid Monte Carlo algorithm for Bayesian statistical inference, and resolve many shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high-dimensional and exhibit strong correlation structure. The performance of these Riemannian manifold Markov chain Monte Carlo algorithms is rigorously assessed by performing Bayesian inference on logistic regression models, log-Gaussian Cox point process models, stochastic volatility models, and both parameter- and model-level inference of dynamical systems described by nonlinear differential equations.
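
    The full position-dependent geometry is beyond the scope of an abstract, but the structure these methods generalise can be seen in a preconditioned Metropolis-adjusted Langevin step. The sketch below uses a fixed metric G on an invented two-dimensional Gaussian target; in the Riemannian manifold algorithms of the thesis the metric varies with position, which this simplification deliberately omits.

```python
# Simplified sketch: MALA with a fixed preconditioning metric G on a toy 2-D Gaussian target.
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[2.0, 1.5], [1.5, 2.0]])         # precision matrix of the toy target

def log_target(x):
    return -0.5 * x @ P @ x

def grad_log_target(x):
    return -P @ x

G = P                                           # fixed metric (here chosen as the true precision)
G_inv = np.linalg.inv(G)
eps = 0.5                                       # step size

def log_q(x_to, mean):
    # Proposal log-density N(mean, eps^2 G^{-1}), up to a constant that cancels.
    d = x_to - mean
    return -0.5 * d @ np.linalg.solve(eps**2 * G_inv, d)

def mala_step(x):
    mean_fwd = x + 0.5 * eps**2 * (G_inv @ grad_log_target(x))
    prop = rng.multivariate_normal(mean_fwd, eps**2 * G_inv)
    mean_rev = prop + 0.5 * eps**2 * (G_inv @ grad_log_target(prop))
    log_alpha = (log_target(prop) + log_q(x, mean_rev)
                 - log_target(x) - log_q(prop, mean_fwd))
    return prop if np.log(rng.uniform()) < log_alpha else x

x = np.zeros(2)
samples = []
for _ in range(2000):
    x = mala_step(x)
    samples.append(x)
print("sample covariance:\n", np.cov(np.array(samples).T))
```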

    Uncertainty Quantification of geochemical and mechanical compaction in layered sedimentary basins

    In this work we propose an uncertainty quantification methodology for the evolution of sedimentary basins under mechanical and geochemical compaction processes, which we model as a coupled, time-dependent, non-linear, one-dimensional (depth-only) system of PDEs with uncertain parameters. While in previous works (Formaggia et al., 2013; Porta et al., 2014) we assumed a simplified depositional history with only one material, in this work we consider multi-layered basins, in which each layer is characterized by a different material and hence by different properties. This setting requires several improvements with respect to our earlier works, concerning both the deterministic solver and the stochastic discretization. On the deterministic side, we replace the previous fixed-point iterative solver with a more efficient Newton solver at each step of the time discretization. On the stochastic side, the multi-layered structure gives rise to discontinuities in the dependence of the state variables on the uncertain parameters, which require appropriate treatment for surrogate modeling techniques, such as sparse grids, to be effective. To this end we propose a methodology that relies on a change of coordinate system to align the discontinuities of the target function within the random parameter space. The reference coordinate system is built by exploiting physical features of the problem at hand: we employ the locations of the material interfaces, which depend smoothly on the random parameters and are therefore amenable to sparse-grid polynomial approximation. We showcase the capabilities of our numerical methodology through two synthetic test cases. In particular, we show that it reproduces with high accuracy the multi-modal probability density functions displayed by target state variables (e.g., porosity).
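
    The key observation above, that interface locations depend smoothly on the random parameters even though the state variables do not, can be illustrated with a one-dimensional toy surrogate. Every function and number below is invented for illustration and merely stands in for the basin model; a polynomial interpolant plays the role of the sparse-grid surrogate.

```python
# 1-D illustration: a quantity with a parameter-dependent jump interpolates poorly,
# while the jump (interface) location itself is smooth and interpolates well.
import numpy as np

def interface_depth(y):
    # Hypothetical smooth dependence of a material-interface depth on a parameter y.
    return 0.5 + 0.3 * np.sin(y)

def porosity_at_fixed_depth(y, depth=0.6):
    # Discontinuous in y: the observation depth lies above or below the interface.
    return np.where(depth < interface_depth(y), 0.4, 0.1)

nodes = np.cos(np.pi * np.arange(9) / 8)      # 9 Chebyshev-type nodes on [-1, 1]
grid = np.linspace(-1.0, 1.0, 401)            # dense grid for measuring the error

def surrogate(f):
    coeffs = np.polynomial.chebyshev.chebfit(nodes, f(nodes), deg=len(nodes) - 1)
    return np.polynomial.chebyshev.chebval(grid, coeffs)

err_porosity  = np.max(np.abs(surrogate(porosity_at_fixed_depth) - porosity_at_fixed_depth(grid)))
err_interface = np.max(np.abs(surrogate(interface_depth) - interface_depth(grid)))
print(f"max error, discontinuous porosity : {err_porosity:.3f}")    # large, oscillatory
print(f"max error, smooth interface depth : {err_interface:.2e}")   # tiny
```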

    Gaussian Process Modelling for Uncertainty Quantification in Convectively-Enhanced Dissolution Processes in Porous Media

    Numerical groundwater flow and dissolution models of physico-chemical processes in deep aquifers are usually subject to uncertainty in one or more of the model input parameters. This uncertainty is propagated through the equations and needs to be quantified and characterised in order to rely on the model outputs. In this paper we present a Gaussian process emulation method as a tool for performing uncertainty quantification in mathematical models for convection and dissolution processes in porous media. One of the advantages of this method is its ability to significantly reduce the computational cost of an uncertainty analysis, while yielding accurate results, compared to classical Monte Carlo methods. We apply the methodology to a model of convectively-enhanced dissolution processes occurring during carbon capture and storage. In this model, standard Gaussian process emulation fails due to the presence of multiple branches of solutions emanating from a bifurcation point, i.e., two equilibrium states exist rather than one. To overcome this issue we use a classifier as a precursor to the Gaussian process emulation, after which we are able to successfully perform a full uncertainty analysis in the vicinity of the bifurcation point.
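
    A schematic of the two-stage construction described above (classify the solution branch first, then emulate each branch with its own Gaussian process) is sketched below on synthetic data with scikit-learn. The branch structure, the logistic-regression classifier, and all numbers are assumptions made for illustration and differ from the carbon-storage model and classifier used in the paper.

```python
# Schematic two-stage emulation: label runs by solution branch, then fit one GP per branch.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Synthetic "simulator": beyond a bifurcation at x = 0.5, runs land on one of two branches.
X = rng.uniform(0.0, 1.0, size=(200, 1))
upper = (X[:, 0] > 0.5) & (rng.uniform(size=200) > 0.5)
y = np.where(upper, 1.0 + X[:, 0], 1.0 - X[:, 0]) + 0.01 * rng.normal(size=200)

# Stage 1: a classifier assigns each run to a branch from its input/output pair.
features = np.column_stack([X[:, 0], y])
clf = LogisticRegression().fit(features, upper)
labels = clf.predict(features)

# Stage 2: fit a separate Gaussian process emulator on each branch.
emulators = {}
for b in (False, True):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4)
    emulators[b] = gp.fit(X[labels == b], y[labels == b])

x_new = np.array([[0.8]])
for b, gp in emulators.items():
    mean, sd = gp.predict(x_new, return_std=True)
    print(f"branch {'upper' if b else 'lower'}: {mean[0]:.3f} +/- {sd[0]:.3f}")
```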

    Identification of the conductivity of a material when it depends on the pressure to which it is subjected

    Depto. de Análisis Matemático y Matemática Aplicada; Instituto de Matemática Interdisciplinar (IMI); Fac. de Ciencias Matemáticas