
    Model-order selection in statistical shape models

    Statistical shape models enhance machine learning algorithms by providing prior information about deformation. A Point Distribution Model (PDM) is a popular landmark-based statistical shape model for segmentation. It requires choosing a model order, which determines how much of the variation seen in the training data is accounted for by the PDM. A good choice of the model order depends on the number of training samples and the noise level in the training data set. Yet the most common approach for choosing the model order simply keeps a predetermined percentage of the total shape variation. In this paper, we present a technique for choosing the model order based on information-theoretic criteria, and we show empirical evidence that the model order chosen by this technique provides a good trade-off between over- and underfitting. Comment: To appear in the 2018 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 17-20, 2018, Aalborg, Denmark.
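
    The paper's specific information-theoretic criterion is not reproduced here. As a minimal sketch, the code below contrasts the common keep-95%-of-variance heuristic with a generic BIC computed on a probabilistic-PCA likelihood as a stand-in; the shape matrix, landmark counts and the bic helper are illustrative assumptions.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        n_shapes, n_coords = 40, 2 * 32            # e.g. 32 landmarks in 2-D, stacked as (x1, y1, ...)
        X = rng.normal(size=(n_shapes, n_coords))  # placeholder for aligned training shapes

        # Common heuristic: keep enough modes to explain 95% of the total shape variation.
        full = PCA().fit(X)
        order_95 = int(np.searchsorted(np.cumsum(full.explained_variance_ratio_), 0.95)) + 1

        # Information-criterion alternative: BIC over the probabilistic-PCA likelihood.
        def bic(q):
            loglik = PCA(n_components=q).fit(X).score(X) * n_shapes    # total log-likelihood
            n_par = n_coords * q - q * (q - 1) / 2 + q + 1 + n_coords  # rough parameter count
            return -2 * loglik + n_par * np.log(n_shapes)

        order_ic = min(range(1, 16), key=bic)
        print(order_95, order_ic)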

    Doctor of Philosophy in Computing

    An important area of medical imaging research is studying anatomical diffeomorphic shape changes and detecting their relationship to disease processes. For example, neurodegenerative disorders change the shape of the brain, so identifying differences between healthy control subjects and patients affected by these diseases can help with understanding the disease processes. Previous research has proposed a variety of mathematical approaches for statistical analysis of geometrical brain structure in three-dimensional (3D) medical imaging, including atlas building, brain variability quantification, and regression. The critical component of these statistical models is that the geometrical structure is represented by transformations rather than by the actual image data. Although such statistical models effectively provide a way to analyze shape variation, none of them has a truly probabilistic interpretation. This dissertation contributes a novel Bayesian framework of statistical shape analysis for generic manifold data and its application to shape variability and brain magnetic resonance imaging (MRI). After carefully defining distributions on manifolds, we build Bayesian models for analyzing the intrinsic variability of manifold data, covering the mean point, principal modes, and parameter estimation. Because there is no closed-form solution for Bayesian inference of these models on manifolds, we develop a Markov chain Monte Carlo method to sample the hidden variables from the distribution. The main advantages of these Bayesian approaches are that they provide parameter estimation and automatic dimensionality reduction for analyzing generic manifold-valued data, such as diffeomorphisms. Modeling the mean point of a group of images in a Bayesian manner allows the regularity parameter to be learned from the data directly rather than set manually, which eliminates the effort of cross-validation for parameter selection. In population studies, our Bayesian model of principal modes analysis (1) automatically extracts low-dimensional, second-order statistics of manifold data variability and (2) gives a better geometric data fit than nonprobabilistic models. To make this Bayesian framework computationally more efficient for high-dimensional diffeomorphisms, this dissertation presents an algorithm, FLASH (finite-dimensional Lie algebras for shooting), that dramatically speeds up diffeomorphic image registration. Instead of formulating diffeomorphisms as a continuous variational problem, FLASH defines a completely new discrete reparameterization of diffeomorphisms in a low-dimensional bandlimited velocity space, which makes Bayesian inference via sampling on the space of diffeomorphisms feasible in practice. The entire Bayesian framework in this dissertation is used for statistical analysis of shape data and brain MRIs. It has the potential to improve hypothesis testing, classification, and mixture models.
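
    None of the dissertation's models are reproduced here. As a toy illustration of MCMC over a manifold-valued hidden variable, the sketch below runs a random-walk Metropolis sampler for the intrinsic mean of points on the unit sphere, with a Gaussian-like likelihood in geodesic distance; the data, the sigma scale and the proposal step size are made-up choices.

        import numpy as np

        rng = np.random.default_rng(1)

        def to_sphere(v):
            return v / np.linalg.norm(v)

        def geodesic(p, q):
            return np.arccos(np.clip(p @ q, -1.0, 1.0))

        # synthetic observations scattered around a "true" direction on S^2
        true_mean = to_sphere(np.array([1.0, 1.0, 0.2]))
        data = np.array([to_sphere(true_mean + 0.2 * rng.normal(size=3)) for _ in range(50)])

        sigma = 0.3                        # assumed likelihood scale
        def log_post(mu):                  # uniform prior on the sphere
            d = np.array([geodesic(mu, x) for x in data])
            return -0.5 * np.sum(d ** 2) / sigma ** 2

        mu = to_sphere(rng.normal(size=3))
        cur = log_post(mu)
        samples = []
        for _ in range(5000):
            prop = to_sphere(mu + 0.05 * rng.normal(size=3))   # random-walk step, projected back
            lp = log_post(prop)
            if np.log(rng.uniform()) < lp - cur:
                mu, cur = prop, lp
            samples.append(mu)

        # crude summary: chordal mean of the samples projected back to the sphere
        posterior_mean_dir = to_sphere(np.mean(samples[1000:], axis=0))
        print(geodesic(posterior_mean_dir, true_mean))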

    Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution

    Background: Accurate model comparison requires extensive computation, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood is estimated individually, so the resulting Bayes factor suffers from errors associated with each of these independent estimation processes.

    Results: We assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, both of which construct a direct path between two competing models and thereby eliminate the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator based on an adaptation of the recently introduced stepping-stone sampling algorithm, and we set out to determine appropriate settings for accurately calculating such Bayes factors, using context-dependent evolutionary models as an example. While we show that modest effort is required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process.

    Conclusions: We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as the extension that exploits more samples. Our proposed approach to Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational effort to approximate the true value of the (log) Bayes factor than the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
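
    Stepping-stone estimation of a log marginal likelihood can be illustrated on a toy conjugate normal model where the exact answer is available for comparison; a log Bayes factor would then be the difference of two such estimates. In the sketch below the model, the beta schedule and the sample sizes are made-up choices, and the power posterior is sampled directly because the toy model is conjugate (a real phylogenetic application would use MCMC instead).

        import numpy as np
        from scipy.stats import multivariate_normal
        from scipy.special import logsumexp

        rng = np.random.default_rng(2)
        n, tau2 = 30, 1.0                       # model: y_i ~ N(theta, 1), theta ~ N(0, tau2)
        y = rng.normal(loc=0.5, size=n)

        def sample_power_posterior(beta, size):
            # conjugate model: p(theta | y, beta) is normal, so sample it directly
            prec = beta * n + 1.0 / tau2
            mean = beta * y.sum() / prec
            return rng.normal(mean, np.sqrt(1.0 / prec), size=size)

        def loglik(theta):
            theta = np.atleast_1d(theta)[:, None]
            return -0.5 * ((y[None, :] - theta) ** 2).sum(axis=1) - 0.5 * n * np.log(2 * np.pi)

        betas = np.linspace(0.0, 1.0, 33) ** 3.0           # concentrate steps near the prior
        log_ml = 0.0
        for b0, b1 in zip(betas[:-1], betas[1:]):
            ll = loglik(sample_power_posterior(b0, 2000))
            log_ml += logsumexp((b1 - b0) * ll) - np.log(ll.size)   # one "stepping stone"

        exact = multivariate_normal(mean=np.zeros(n),
                                    cov=np.eye(n) + tau2 * np.ones((n, n))).logpdf(y)
        print(log_ml, exact)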

    Prediction of n-octanol-water partition coefficient for polychlorinated biphenyls from theoretical molecular descriptors

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 133 polychlorinated biphenyls to their n-octanol-water partition coefficients (log Kow). Molecular descriptors were derived solely from the 3D structures of the molecules, and the genetic algorithm-partial least squares (GA-PLS) method was applied as a variable selection tool. The partial least squares (PLS) method was used to select the best descriptors, and the selected descriptors were used as input neurons in a neural network model. These descriptors are the Balaban index (J), XY shadow (SXY), Kier shape index of order 3 (3κ), Wiener index (W), and maximum valency of a C atom (VmaxC). The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows log Kow to be estimated for molecules not yet synthesized. The root mean square errors of the ANN-predicted partition coefficients for the training, test, and external validation sets were 0.063, 0.112, and 0.126, respectively, whereas the corresponding values for the PLS model were 0.230, 0.164, and 0.297. Comparison of these values and other statistical parameters for the two models reveals the superiority of the ANN over the PLS model.
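
    The paper's descriptor values and networks are not reproduced here. The sketch below only illustrates the modelling step on a random placeholder descriptor matrix: a PLS regression versus a small feed-forward network, compared by RMSE on a held-out split; the coefficients used to simulate log Kow are arbitrary.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.metrics import mean_squared_error

        rng = np.random.default_rng(3)
        X = rng.normal(size=(133, 5))       # 133 congeners x 5 descriptors (placeholder values)
        log_kow = X @ np.array([0.8, 0.3, -0.5, 0.6, 0.2]) + 0.1 * rng.normal(size=133)

        X_tr, X_te, y_tr, y_te = train_test_split(X, log_kow, test_size=0.25, random_state=0)

        pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
        ann = make_pipeline(StandardScaler(),
                            MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000,
                                         random_state=0)).fit(X_tr, y_tr)

        for name, model in [("PLS", pls), ("ANN", ann)]:
            rmse = np.sqrt(mean_squared_error(y_te, np.ravel(model.predict(X_te))))
            print(name, round(rmse, 3))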

    Exact Dimensionality Selection for Bayesian PCA

    We present a Bayesian model selection approach to estimating the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression for the marginal likelihood, which allows an optimal number of components to be inferred. We also propose a heuristic, based on the expected shape of the marginal likelihood curve, for choosing the hyperparameters. In non-asymptotic settings, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.
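
    The paper's closed-form normal-gamma marginal likelihood is not reproduced here. As a rough stand-in, the sketch below compares candidate dimensionalities of a probabilistic PCA model using BIC and using Minka's Laplace-approximation criterion (scikit-learn's n_components='mle'); the synthetic data and the parameter count are illustrative assumptions.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(4)
        n, d, true_q = 500, 20, 5
        latent = rng.normal(size=(n, true_q))
        W = rng.normal(size=(true_q, d))
        X = latent @ W + 0.3 * rng.normal(size=(n, d))    # synthetic data with 5 real components

        def bic(q):
            loglik = PCA(n_components=q).fit(X).score(X) * n   # probabilistic-PCA likelihood
            n_par = d * q - q * (q - 1) / 2 + q + 1 + d        # rough parameter count
            return -2 * loglik + n_par * np.log(n)

        q_bic = min(range(1, d), key=bic)
        q_minka = PCA(n_components='mle', svd_solver='full').fit(X).n_components_
        print(q_bic, q_minka)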

    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a great deal of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering practical advantages such as automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (the target function to be optimized, which defines the type of regression setting). In this review article, we highlight the most recent methodological developments in statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.
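
    As a minimal illustration of the intrinsic variable selection described above, the sketch below implements component-wise L2-boosting with simple linear base-learners: at each iteration only the single best-fitting covariate is updated by a small step. The data, step length and stopping iteration are arbitrary choices for the example.

        import numpy as np

        rng = np.random.default_rng(5)
        n, p = 200, 10
        X = rng.normal(size=(n, p))
        y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # only 2 informative covariates

        nu, m_stop = 0.1, 250               # step length and number of boosting iterations
        beta = np.zeros(p)
        offset = y.mean()                   # start from the empirical mean
        fit = np.full(n, offset)

        for _ in range(m_stop):
            u = y - fit                                     # negative gradient of the squared-error loss
            coefs = X.T @ u / (X ** 2).sum(axis=0)          # fit each single-covariate base-learner
            rss = ((u[:, None] - X * coefs) ** 2).sum(axis=0)
            j = int(np.argmin(rss))                         # best-fitting component is selected
            beta[j] += nu * coefs[j]                        # shrunken update of that coefficient only
            fit = offset + X @ beta

        print(np.round(beta, 2))            # uninformative coefficients stay at or near zero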

    Constraining scalar resonances with top-quark pair production at the LHC

    Constraints on models that predict resonant top-quark pair production at the LHC are provided via a reinterpretation of the Standard Model (SM) particle-level measurement of the top-antitop invariant-mass distribution, $m(t\bar{t})$. We make use of state-of-the-art Monte Carlo event simulation to perform a direct comparison with measurements of $m(t\bar{t})$ in the semi-leptonic channels, considering both the boosted and the resolved regime of the hadronic top decays. A simplified model describing various scalar resonances decaying into top quarks is considered, including CP-even and CP-odd, color-singlet and color-octet states, and the excluded regions in the respective parameter spaces are provided. Comment: 34 pages, 17 figures.
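
    Purely as a toy illustration of reinterpreting a binned particle-level spectrum (none of the numbers below come from the paper), the sketch scans a signal-strength parameter mu for a hypothetical resonant contribution added to an SM prediction and flags the values excluded by a simple chi-square criterion.

        import numpy as np

        bins_gev = np.array([400, 500, 600, 700, 800, 1000, 1200])   # invented bin edges, for context
        measured = np.array([820., 460., 230., 110., 70., 25.])      # invented "data" per bin
        sm_pred  = np.array([800., 450., 235., 115., 68., 24.])      # invented SM prediction
        sigma    = np.array([40., 25., 15., 10., 8., 4.])            # invented total uncertainty
        signal   = np.array([2., 5., 30., 12., 4., 1.])              # invented resonance shape at mu = 1

        def chi2(mu):
            r = measured - (sm_pred + mu * signal)
            return np.sum((r / sigma) ** 2)

        mus = np.linspace(0.0, 5.0, 501)
        chi2_vals = np.array([chi2(m) for m in mus])
        delta = chi2_vals - chi2_vals.min()
        excluded = mus[delta > 3.84]                  # roughly 95% CL for one parameter
        print("excluded above mu ~", excluded.min() if excluded.size else None)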

    A Unified Framework of Constrained Regression

    Generalized additive models (GAMs) play an important role in modeling and understanding complex relationships in modern applied statistics. They allow for flexible, data-driven estimation of covariate effects. Yet researchers often have a priori knowledge of certain effects, which may be monotonic or periodic (cyclic) or may need to fulfill boundary conditions. We propose a unified framework to incorporate these constraints for both univariate and bivariate effect estimates and for varying coefficients. As the framework is based on component-wise boosting methods, variables can be selected intrinsically, and effects can be estimated for a wide range of different distributional assumptions. Bootstrap confidence intervals for the effect estimates are derived to assess the models. We present three case studies from the environmental sciences to illustrate the proposed seamless modeling framework. All discussed constrained effect estimates are implemented in the comprehensive R package mboost for model-based boosting. Comment: This is a preliminary version of the manuscript. The final publication is available at http://link.springer.com/article/10.1007/s11222-014-9520-
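
    The paper's implementation lives in the R package mboost; as a rough Python analogue only (not mboost's constrained P-spline base-learners), the sketch below fits a boosted regression in which one covariate's effect is constrained to be monotonically increasing via scikit-learn's monotonic_cst option, on synthetic data.

        import numpy as np
        from sklearn.ensemble import HistGradientBoostingRegressor

        rng = np.random.default_rng(6)
        n = 400
        x1 = rng.uniform(0, 4, n)                    # effect assumed monotone a priori
        x2 = rng.normal(size=n)                      # unconstrained covariate
        y = np.log1p(x1) + np.sin(x2) + 0.2 * rng.normal(size=n)
        X = np.column_stack([x1, x2])

        constrained = HistGradientBoostingRegressor(monotonic_cst=[1, 0],   # +1: increasing in x1
                                                    random_state=0).fit(X, y)
        unconstrained = HistGradientBoostingRegressor(random_state=0).fit(X, y)

        # the constrained fit cannot decrease along x1 when x2 is held fixed
        grid = np.column_stack([np.linspace(0, 4, 50), np.zeros(50)])
        print(np.all(np.diff(constrained.predict(grid)) >= -1e-9))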