    A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification

    Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios. Labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of queries. Our shorter queries only require "yes" or "no" responses. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach, using Monte Carlo Sampling and Black Box Variational Inference, which provides the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time-series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available.
    Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages
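    The idea of updating a class posterior from "yes"/"no" answers can be illustrated with a minimal sketch. This is a deliberate simplification, not the paper's model: it assumes a single hypothetical labeler described by two rates (`tpr`, the chance of answering "yes" when the asked class is the true one, and `fpr`, the chance of answering "yes" otherwise) instead of full confusion matrices, and it uses exact Bayes updates rather than variational inference. All names and values are illustrative.

```python
def update_posterior(prior, asked_class, answer_yes, tpr=0.9, fpr=0.1):
    """Bayes update of class probabilities after one yes/no query.

    prior       -- list of class probabilities (sums to 1)
    asked_class -- index k of the query "is this object of class k?"
    answer_yes  -- True if the labeler answered "yes"
    tpr, fpr    -- assumed labeler rates (hypothetical values)
    """
    unnormalized = []
    for true_class, p in enumerate(prior):
        # Probability of a "yes" answer if true_class were the real class.
        p_yes = tpr if true_class == asked_class else fpr
        likelihood = p_yes if answer_yes else 1.0 - p_yes
        unnormalized.append(p * likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three classes, uniform prior, three made-up answers about one object.
posterior = [1.0 / 3.0] * 3
for asked_class, answer_yes in [(0, True), (1, False), (0, True)]:
    posterior = update_posterior(posterior, asked_class, answer_yes)
```

    Each answer only reweights the classes; a noisier labeler (rates closer to 0.5) moves the posterior less, which is one way to see why modeling labelers' failures matters.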

    Model Selection in Overlapping Stochastic Block Models

    Networks are a commonly used mathematical model to describe the rich set of interactions between objects of interest. Many clustering methods have been developed in order to partition such structures, among which several rely on underlying probabilistic models, typically mixture models. The relevant hidden structure may however show overlapping groups in several applications. The Overlapping Stochastic Block Model (2011) has been developed to take this phenomenon into account. Nevertheless, the problem of the choice of the number of classes in the inference step is still open. To tackle this issue, we consider the proposed model in a Bayesian framework and develop a new criterion based on a non-asymptotic approximation of the marginal log-likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm, and demonstrate its efficiency by running it on both simulated and real data.
    Comment: article

    Statistical modelling of summary values leads to accurate Approximate Bayesian Computations

    Approximate Bayesian Computation (ABC) methods rely on asymptotic arguments, implying that parameter inference can be systematically biased even when sufficient statistics are available. We propose to construct the ABC accept/reject step from decision theoretic arguments on a suitable auxiliary space. This framework, referred to as ABC*, fully specifies which test statistics to use, how to combine them, how to set the tolerances and how long to simulate in order to obtain accuracy properties on the auxiliary space. Akin to maximum-likelihood indirect inference, regularity conditions establish when the ABC* approximation to the posterior density is accurate on the original parameter space in terms of the Kullback-Leibler divergence and the maximum a posteriori point estimate. Fundamentally, escaping asymptotic arguments requires knowledge of the distribution of test statistics, which we obtain through modelling the distribution of summary values, data points on a summary level. Synthetic examples and an application to time series data of influenza A (H3N2) infections in the Netherlands illustrate ABC* in action.
    Comment: Videos can be played with Acrobat Reader. Manuscript under review and not accepted
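    For readers unfamiliar with the accept/reject step mentioned above, the following is a minimal sketch of plain rejection ABC. It is the textbook baseline, not the ABC* construction proposed in the paper: the summary statistic, the tolerance and the number of draws are chosen by hand here, which is exactly what ABC* aims to specify. The toy Gaussian problem and all names are assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary,
                  tolerance, n_draws, rng):
    """Plain rejection ABC.

    Draw parameters from the prior, simulate data, and keep the draws
    whose simulated summary lies within `tolerance` of the observed one.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, rng)
        if abs(summary(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
```

    The accepted draws approximate the posterior only as the tolerance shrinks, which is the asymptotic dependence the abstract refers to.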

    Application of Bayesian graphs to SN Ia data analysis and compression

    Bayesian graphical models are an efficient tool for modelling complex data and deriving self-consistent expressions of the posterior distribution of model parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia supernova (SN Ia) luminosity distance measurements from the joint light-curve analysis (JLA) data set. In contrast to the $\chi^2$ approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with $\chi^2$ analysis results, we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters with a maximal 6$\sigma$ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only 2.4$\sigma$. Systematic offsets on the cosmological parameters remain small, but may increase by combining constraints from complementary cosmological probes. The bias of the $\chi^2$ analysis is due to neglecting the parameter-dependent log-determinant of the data covariance, which gives more statistical weight to larger values of the standardization parameters. We find a similar effect on compressed distance modulus data. To this end, we implement a fully consistent compression method of the JLA data set that uses a Gaussian approximation of the posterior distribution for fast generation of compressed data. Overall, the results of our analysis emphasize the need for a fully consistent Bayesian statistical approach in the analysis of future large SN Ia data sets.
    Comment: 14 pages, 13 figures, 5 tables. Submitted to MNRAS. Compression utility available at https://gitlab.com/congma/libsncompress/ and example cosmology code with machine-readable version of Tables A1 & A2 at https://gitlab.com/congma/sn-bayesian-model-example/ v2: corrected typo in author's name. v3: 15 pages, incl. corrections, matches the accepted version
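    The role of the log-determinant can be made explicit with the standard multivariate Gaussian likelihood. The notation here is generic and assumed for illustration, not taken from the paper: $\mathbf{r}(\theta)$ is the vector of residuals between data and model, $\mathbf{C}(\theta)$ the data covariance matrix, $\theta$ the full parameter set, and $N$ the number of data points.

```latex
-2 \ln \mathcal{L}(\theta)
  = \underbrace{\mathbf{r}(\theta)^{\mathsf{T}} \, \mathbf{C}(\theta)^{-1} \, \mathbf{r}(\theta)}_{\chi^2(\theta)}
  + \ln \det \mathbf{C}(\theta)
  + N \ln 2\pi
```

    When $\mathbf{C}$ does not depend on $\theta$, the last two terms are constants and minimizing $\chi^2$ is equivalent to maximizing the likelihood. When $\mathbf{C}$ does depend on $\theta$, a $\chi^2$-only fit can lower its objective simply by inflating the covariance, and it is the $\ln \det \mathbf{C}(\theta)$ term that penalizes this.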

    Policy Makers Priors and Inflation Density Forecasts

    This paper models an inflation forecast density framework that closely resembles actual policy makers' behaviour regarding the determination of the modal point, the uncertainty and the asymmetry in inflation forecasts. The framework combines policy makers' prior information about these parameters with a standard parametric density estimation technique using Bayesian theory. The combination crucially hinges on an information-theoretic utility function capturing the policy maker's gains from performing the forecast exercise.
    Keywords: Monetary Policy, Inflation Targeting, Bayesian Methods

    Evaluating Process-Based Integrated Assessment Models of Climate Change Mitigation

    Process-based integrated assessment models (IAMs) analyse transformation pathways to mitigate climate change. Confidence in models is established by testing their structural assumptions and comparing their behaviour against observations as well as other models. Climate model evaluation is concerted, and prominently reported in a dedicated chapter in the IPCC WG1 assessments. By comparison, evaluation of process-based IAMs tends to be less visible and more dispersed among modelling teams, with the exception of model inter-comparison projects. We contribute the first comprehensive analysis of process-based IAM evaluation, drawing on a wide range of examples across eight different evaluation methods testing both structural and behavioural validity. For each evaluation method, we compare its application to process-based IAMs with its application to climate models, noting similarities and differences, and seeking useful insights for strengthening the evaluation of process-based IAMs. We find that each evaluation method has distinctive strengths and limitations, as well as constraints on its application. We develop a systematic evaluation framework combining multiple methods that should be embedded within the development and use of process-based IAMs.

    Bayesian phylogeography of influenza A/H3N2 for the 2014-15 season in the United States using three frameworks of ancestral state reconstruction

    Ancestral state reconstructions in Bayesian phylogeography of virus pandemics have been improved by utilizing a Bayesian stochastic search variable selection (BSSVS) framework. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of genetic, geographic, demographic, and environmental predictors of interest to the virus and incorporating BSSVS to estimate the posterior inclusion probabilities of each predictor. Although the latter appears to enhance the biological validity of ancestral state reconstruction, there has yet to be a comparison of phylogenies created by the two methods. In this paper, we compare these two methods, while also using a primitive method without BSSVS, and highlight the differences in phylogenies created by each. We test six coalescent priors and six random sequence samples of H3N2 influenza during the 2014–15 flu season in the U.S. We show that the GLMs yield significantly greater root state posterior probabilities than the two alternative methods under five of the six priors, and significantly greater Kullback-Leibler divergence values than the two alternative methods under all priors. Furthermore, the GLMs strongly implicate temperature and precipitation as driving forces of this flu season and nearly unanimously identified a single root state, which exhibits the most tropical climate during a typical flu season in the U.S. The GLM, however, appears to be highly susceptible to sampling bias compared with the other methods, which casts doubt on whether its reconstructions should be favored over those created by alternate methods. We report that a BSSVS approach with a Poisson prior demonstrates less bias toward sample size under certain conditions than the GLMs or primitive models, and believe that the connection between reconstruction method and sampling bias warrants further investigation.
    The article is published at http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.100538
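    As a reminder of what a Kullback-Leibler divergence value measures in this setting, here is a minimal sketch for discrete distributions. It assumes the divergence is taken from a prior over root states to the corresponding posterior; the abstract does not say which distributions were compared, and the four-state probabilities below are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same states.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of states")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical root state posterior concentrated on one of four locations,
# compared with a uniform prior over the same locations.
root_posterior = [0.85, 0.05, 0.05, 0.05]
uniform_prior = [0.25, 0.25, 0.25, 0.25]
kl = kl_divergence(root_posterior, uniform_prior)
```

    A larger value means the data moved the posterior further from the prior, so a reconstruction that concentrates probability on a single root state scores higher than one that stays close to uniform.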