7 research outputs found

    A Robust Solution to Variational Importance Sampling of Minimum Variance

    Importance sampling is a Monte Carlo method where samples are obtained from an alternative proposal distribution. This can be used to focus the sampling process on the relevant parts of the space, thus reducing the variance. Selecting the proposal that leads to the minimum variance can be formulated as an optimization problem and solved, for instance, by the use of a variational approach. Variational inference selects, from a given family, the distribution which minimizes the divergence to the distribution of interest. The Rényi projection of order 2 leads to the importance sampling estimator of minimum variance, but its computation is very costly. In this study with discrete distributions that factorize over probabilistic graphical models, we propose and evaluate an approximate projection method onto fully factored distributions. Our evaluation suggests that a proposal distribution mixing the information projection with the approximate Rényi projection of order 2 could be interesting from a practical perspective.
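    The variance-reduction idea in this abstract can be illustrated with a minimal sketch on a small discrete distribution. It is not the paper's approximate projection method: the target `p`, the integrand `f`, and the helper names are hypothetical, and the proposal `q_star` proportional to `p[x] * f[x]` is the textbook unconstrained minimum-variance choice for a positive integrand, not a fully factored projection.

```python
import random


def importance_sampling_estimate(p, q, f, n_samples, rng):
    """Estimate sum_x p[x] * f[x] by drawing x ~ q and weighting by p[x] / q[x]."""
    states = list(range(len(p)))
    draws = rng.choices(states, weights=q, k=n_samples)
    return sum(f[x] * p[x] / q[x] for x in draws) / n_samples


def estimator_variance(p, q, f):
    """Exact single-sample variance of the importance sampling estimator under q."""
    mean = sum(pi * fi for pi, fi in zip(p, f))
    second_moment = sum(
        (pi * fi) ** 2 / qi for pi, qi, fi in zip(p, q, f) if qi > 0
    )
    return second_moment - mean ** 2


# Hypothetical toy problem: three states, strictly positive integrand.
p = [0.7, 0.2, 0.1]
f = [1.0, 2.0, 20.0]
true_mean = sum(pi * fi for pi, fi in zip(p, f))

# Unconstrained minimum-variance proposal: q*(x) proportional to p(x) * f(x).
q_star = [pi * fi / true_mean for pi, fi in zip(p, f)]

var_naive = estimator_variance(p, p, f)        # sampling directly from p
var_optimal = estimator_variance(p, q_star, f)  # sampling from q*

estimate = importance_sampling_estimate(p, q_star, f, 100, random.Random(0))
```

    With `q_star`, every weighted sample equals the true mean, so the estimator has zero variance up to floating-point error, whereas sampling from `p` itself leaves a large variance. In practice `q_star` is unavailable because it requires the quantity being estimated, which is why a projection onto a tractable family is of interest.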

    Modeling three sources of uncertainty in assisted reproductive technologies with probabilistic graphical models

    Embryo selection is a critical step in assisted reproduction: good selection criteria are expected to increase the probability of inducing a pregnancy. Machine learning techniques have been applied for implantation prediction or embryo quality assessment, which embryologists can use to make a decision about embryo selection. However, this is a highly uncertain real-world problem, and current proposals do not always model all the sources of uncertainty. We present a novel probabilistic graphical model that accounts for three different sources of uncertainty: the standard embryo and cycle viability, and a third one that represents any unknown factor that can drive a treatment to failure in otherwise perfect conditions. We derive a parametric learning method based on the Expectation-Maximization strategy, which accounts for uncertainty issues. We empirically analyze the model on a real database consisting of 604 cycles (3125 embryos) carried out at Hospital Donostia (Spain). Embryologists followed the protocol of the Spanish Association for Reproduction Biology Studies (ASEBIR), based on morphological features, for embryo selection. Our model predictions are correlated with the ASEBIR protocol, which validates our model. The benefits of accounting for the different sources of uncertainty and the importance of the cycle characteristics are shown. Considering only transferred embryos, our model does not further discriminate them as implanted or failed, suggesting that the ASEBIR protocol could be understood as a thorough summary of the available morphological features.

    A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data

    Over the last decade, hundreds of thousands of volunteers have contributed to science by collecting or analyzing data. This public participation in science, also known as citizen science, has contributed to significant discoveries and led to publications in major scientific journals. However, little attention has been paid to data quality issues. In this work we argue that being able to determine the accuracy of data obtained by crowdsourcing is a fundamental question and we point out that, for many real-life scenarios, mathematical tools and processes for the evaluation of data quality are missing. We propose a probabilistic methodology for the evaluation of the accuracy of labeling data obtained by crowdsourcing in citizen science. The methodology builds on an abstract probabilistic graphical model formalism, which is shown to generalize some already existing label aggregation models. We show how to make practical use of the methodology through a comparison of data obtained from different citizen science communities analyzing the earthquake that took place in Albania in 2019.

    To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators

    We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of seminaive Bayesian classifiers. Altogether, 16 model selection and weighing schemes, 58 benchmark data sets, and various statistical tests are employed. This paper's main contributions are threefold. First, it formally presents each scheme's definition, rationale, and time complexity and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme's classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms which have an immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.
