A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification
Crowdsourcing has become widely used in supervised scenarios where training
sets are scarce and difficult to obtain. Most crowdsourcing models in the
literature assume labelers can provide answers to full questions. In
classification contexts, full questions require a labeler to discern among all
possible classes. Unfortunately, discernment is not always easy in realistic
scenarios. Labelers may not be experts in differentiating all classes. In this
work, we provide a full probabilistic model for a shorter type of query. Our
shorter queries only require "yes" or "no" responses. Our model estimates a
joint posterior distribution of matrices related to labelers' confusions and
the posterior probability of the class of every object. We developed an
approximate inference approach using Monte Carlo sampling and Black Box Variational Inference, for which we derive the necessary gradients. We built two realistic crowdsourcing scenarios to test our model.
The first scenario queries for irregular astronomical time-series. The second
scenario relies on the image classification of animals. We achieved results
that are comparable with those of full query crowdsourcing. Furthermore, we
show that modeling labelers' failures plays an important role in estimating
true classes. Finally, we provide the community with two real datasets obtained
from our crowdsourcing experiments. All our code is publicly available.
Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages
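The inference machinery this abstract describes can be illustrated with a minimal, self-contained sketch of the score-function gradient estimator that Black Box Variational Inference rests on. The one-parameter model below — a single latent labeler-quality parameter `z` whose sigmoid gives the probability of a correct yes/no answer — is a hypothetical stand-in for the paper's full confusion-matrix model, not its actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z, y):
    """Hypothetical log p(y, z): yes/no answers y are Bernoulli with
    success probability sigmoid(z); z has a standard normal prior."""
    p = 1.0 / (1.0 + np.exp(-z))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) - 0.5 * z ** 2

def bbvi_step(mu, log_sigma, y, n_samples=200):
    """One BBVI step: Monte Carlo score-function gradients of the ELBO
    for a Gaussian variational family q(z) = N(mu, exp(log_sigma)^2)."""
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, n_samples)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    f = np.array([log_joint(zi, y) for zi in z]) - log_q
    f -= f.mean()                              # baseline: reduces gradient variance
    score_mu = (z - mu) / sigma ** 2           # d log q / d mu
    score_ls = ((z - mu) / sigma) ** 2 - 1.0   # d log q / d log_sigma
    return (score_mu * f).mean(), (score_ls * f).mean()

y = rng.binomial(1, 0.8, 50)                   # synthetic yes/no responses
mu, log_sigma = 0.0, 0.0
for _ in range(500):
    g_mu, g_ls = bbvi_step(mu, log_sigma, y)
    mu += 0.02 * g_mu
    log_sigma += 0.02 * g_ls
```

The estimator needs only evaluations of the log-joint, which is what makes the approach "black box": no model-specific gradient derivations are required beyond the score of the variational family itself.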
Model Selection in Overlapping Stochastic Block Models
Networks are a commonly used mathematical model to describe the rich set of
interactions between objects of interest. Many clustering methods have been
developed in order to partition such structures, among which several rely on
underlying probabilistic models, typically mixture models. The relevant hidden
structure may however show overlapping groups in several applications. The
Overlapping Stochastic Block Model (2011) has been developed to take this
phenomenon into account. Nevertheless, the problem of the choice of the number
of classes in the inference step is still open. To tackle this issue, we
consider the proposed model in a Bayesian framework and develop a new criterion
based on a non-asymptotic approximation of the marginal log-likelihood. We
describe how the criterion can be computed through a variational Bayes EM
algorithm, and demonstrate its efficiency by running it on both simulated and
real data.
Comment: article
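Choosing the number of classes by comparing a fitted model's penalized likelihood across candidate values can be sketched on a toy mixture. The sketch below uses the asymptotic BIC as a stand-in for the paper's non-asymptotic variational criterion, and a 1-D Gaussian mixture fitted by EM as a stand-in for the overlapping block model:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy 1-D data with two well-separated latent groups
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])

def fit_gmm(x, k, n_iter=100):
    """EM for a 1-D Gaussian mixture with k components;
    returns the maximized log-likelihood."""
    n = len(x)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread-out initial means
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)  # E-step: responsibilities
        nk = r.sum(axis=0)                          # M-step: update parameters
        w, mu = nk / n, (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

def bic(x, k):
    # a 1-D k-component Gaussian mixture has 3k - 1 free parameters
    return -2.0 * fit_gmm(x, k) + (3 * k - 1) * np.log(len(x))

best_k = min(range(1, 5), key=lambda k: bic(x, k))  # smallest BIC wins
```

The paper's criterion replaces the asymptotic penalty above with a variational, non-asymptotic approximation of the marginal log-likelihood, but the selection loop — fit for each candidate number of classes, score, pick the best — has the same shape.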
Statistical modelling of summary values leads to accurate Approximate Bayesian Computations
Approximate Bayesian Computation (ABC) methods rely on asymptotic arguments,
implying that parameter inference can be systematically biased even when
sufficient statistics are available. We propose to construct the ABC
accept/reject step from decision theoretic arguments on a suitable auxiliary
space. This framework, referred to as ABC*, fully specifies which test
statistics to use, how to combine them, how to set the tolerances and how long
to simulate in order to obtain accuracy properties on the auxiliary space. Akin
to maximum-likelihood indirect inference, regularity conditions establish when
the ABC* approximation to the posterior density is accurate on the original
parameter space in terms of the Kullback-Leibler divergence and the maximum a
posteriori point estimate. Fundamentally, escaping asymptotic arguments
requires knowledge of the distribution of test statistics, which we obtain
through modelling the distribution of summary values, data points on a summary
level. Synthetic examples and an application to time series data of influenza A
(H3N2) infections in the Netherlands illustrate ABC* in action.
Comment: Videos can be played with Acrobat Reader. Manuscript under review and not accepted
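For contrast with ABC*, the plain ABC rejection sampler that the decision-theoretic construction refines can be written in a few lines. The prior, summary statistic, and tolerance below are ad-hoc choices — precisely the arbitrariness that ABC* is designed to remove:

```python
import numpy as np

rng = np.random.default_rng(2)

# observed data: 100 draws from a unit-variance Gaussian with unknown mean
x_obs = rng.normal(3.0, 1.0, 100)
s_obs = x_obs.mean()                               # summary statistic

def abc_rejection(s_obs, n_prop=50_000, tol=0.05):
    """Plain ABC rejection sampling for the mean of a unit-variance
    Gaussian, flat prior on [0, 6]. ABC* would instead derive the test
    statistic, tolerance, and simulation length from decision theory."""
    theta = rng.uniform(0.0, 6.0, n_prop)          # draws from the prior
    sims = rng.normal(theta[:, None], 1.0, (n_prop, 100))
    s_sim = sims.mean(axis=1)                      # simulated summaries
    keep = np.abs(s_sim - s_obs) < tol             # accept/reject step
    return theta[keep]

post = abc_rejection(s_obs)                        # approximate posterior draws
```

Here the accepted `theta` values approximate the posterior; the quality of that approximation depends on the hand-picked tolerance and statistic, which is the gap the paper's framework closes.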
Application of Bayesian graphs to SN Ia data analysis and compression
Bayesian graphical models are an efficient tool for modelling complex data
and derive self-consistent expressions of the posterior distribution of model
parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia
supernova (SN Ia) luminosity distance measurements from the joint light-curve
analysis (JLA) data set. In contrast to the χ² approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with χ² analysis results, we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters, with a maximal 6σ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only 2.4σ.
Systematic offsets on the cosmological parameters remain small, but may
increase by combining constraints from complementary cosmological probes. The
bias of the χ² analysis is due to neglecting the parameter-dependent
log-determinant of the data covariance, which gives more statistical weight to
larger values of the standardization parameters. We find a similar effect on
compressed distance modulus data. To this end, we implement a fully consistent
compression method of the JLA data set that uses a Gaussian approximation of
the posterior distribution for fast generation of compressed data. Overall, the
results of our analysis emphasize the need for a fully consistent Bayesian
statistical approach in the analysis of future large SN Ia data sets.Comment: 14 pages, 13 figures, 5 tables. Submitted to MNRAS. Compression
utility available at https://gitlab.com/congma/libsncompress/ and example
cosmology code with machine-readable version of Tables A1 & A2 at
https://gitlab.com/congma/sn-bayesian-model-example/ v2: corrected typo in
author's name. v3: 15 pages, incl. corrections, matches the accepted versio
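The mechanism the abstract identifies — dropping the parameter-dependent log-determinant of the covariance gives more statistical weight to larger parameter values — can be seen in a deliberately simplified toy model where the fitted parameter is a Gaussian variance (a stand-in for the JLA covariance's dependence on the standardization parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, 1000)           # data with true variance 1

thetas = np.linspace(0.2, 5.0, 200)      # candidate variances

# full Gaussian log-likelihood: -0.5 * (n*log(theta) + sum(y^2)/theta)
full = -0.5 * (len(y) * np.log(thetas) + (y ** 2).sum() / thetas)
# "chi^2-only" objective: the log-determinant term n*log(theta) is dropped
chi2_only = -0.5 * (y ** 2).sum() / thetas

theta_full = thetas[full.argmax()]       # close to the true variance
theta_chi2 = thetas[chi2_only.argmax()]  # pushed to the largest candidate
```

Without the log-determinant, the quadratic term alone is monotonically decreasing in the variance, so the objective always prefers the largest covariance on offer — the same direction of bias the paper reports for the standardization parameters.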
Policy Makers' Priors and Inflation Density Forecasts
This paper models an inflation forecast density framework that closely resembles actual policy makers' behaviour in determining the modal point, the uncertainty, and the asymmetry of inflation forecasts. The framework combines policy makers' prior information about these parameters with a standard parametric density estimation technique using Bayesian theory. The combination crucially hinges on an information-theoretic utility function describing the gains of the policy maker from performing the forecast exercise.
Keywords: Monetary Policy, Inflation Targeting, Bayesian Methods
Evaluating Process-Based Integrated Assessment Models of Climate Change Mitigation
Process-based integrated assessment models (IAMs) analyse transformation pathways to mitigate climate change. Confidence in models is established by testing their structural assumptions and comparing their behaviour against observations as well as other models. Climate model evaluation is concerted, and prominently reported in a dedicated chapter of the IPCC WG1 assessments. By comparison, evaluation of process-based IAMs tends to be less visible and more dispersed among modelling teams, with the exception of model inter-comparison projects. We contribute the first comprehensive analysis of process-based IAM evaluation, drawing on a wide range of examples across eight different evaluation methods testing both structural and behavioural validity. For each evaluation method, we compare its application to process-based IAMs with its application to climate models, noting similarities and differences, and seeking useful insights for strengthening the evaluation of process-based IAMs. We find that each evaluation method has distinctive strengths and limitations, as well as constraints on its application. We develop a systematic evaluation framework combining multiple methods that should be embedded within the development and use of process-based IAMs.
Bayesian phylogeography of influenza A/H3N2 for the 2014-15 season in the United States using three frameworks of ancestral state reconstruction
Ancestral state reconstructions in Bayesian phylogeography of virus pandemics have been improved by utilizing a Bayesian stochastic search variable selection (BSSVS) framework. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of genetic, geographic, demographic, and environmental predictors of interest to the virus and incorporating BSSVS to estimate the posterior inclusion probabilities of each predictor. Although the latter appears to enhance the biological validity of ancestral state reconstruction, there has yet to be a comparison of phylogenies created by the two methods. In this paper, we compare these two methods, while also using a primitive method without BSSVS, and highlight the differences in phylogenies created by each. We test six coalescent priors and six random sequence samples of H3N2 influenza during the 2014–15 flu season in the U.S. We show that the GLMs yield significantly greater root state posterior probabilities than the two alternative methods under five of the six priors, and significantly greater Kullback-Leibler divergence values than the two alternative methods under all priors. Furthermore, the GLMs strongly implicate temperature and precipitation as driving forces of this flu season and nearly unanimously identified a single root state, which exhibits the most tropical climate during a typical flu season in the U.S. The GLM, however, appears to be highly susceptible to sampling bias compared with the other methods, which casts doubt on whether its reconstructions should be favored over those created by alternate methods.
We report that a BSSVS approach with a Poisson prior demonstrates less bias toward sample size under certain conditions than the GLMs or primitive models, and believe that the connection between reconstruction method and sampling bias warrants further investigation.
The article is published at http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.100538
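The Kullback-Leibler divergence used above to compare root-state reconstructions can be computed directly from the discrete posterior distributions. The four-state posteriors below are hypothetical, with the uniform distribution as the reference (the paper's exact reference distribution may differ):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions over the same state set."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# hypothetical root-state posteriors over four candidate U.S. locations:
# a sharply peaked reconstruction vs. an uninformative uniform reference
peaked = [0.85, 0.05, 0.05, 0.05]
uniform = [0.25, 0.25, 0.25, 0.25]
kl = kl_divergence(peaked, uniform)    # larger KL = more decisive root call
```

A reconstruction that concentrates its posterior mass on a single root state scores a high divergence from the uniform reference, which is the sense in which the GLMs' "significantly greater Kullback-Leibler divergence values" indicate more decisive root calls.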