A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification
Crowdsourcing has become widely used in supervised scenarios where training
sets are scarce and difficult to obtain. Most crowdsourcing models in the
literature assume labelers can provide answers to full questions. In
classification contexts, full questions require a labeler to discern among all
possible classes. Unfortunately, discernment is not always easy in realistic
scenarios. Labelers may not be experts in differentiating all classes. In this
work, we provide a full probabilistic model for a shorter type of query. Our
shorter queries only require "yes" or "no" responses. Our model estimates a
joint posterior distribution of matrices related to labelers' confusions and
the posterior probability of the class of every object. We developed an
approximate inference approach using Monte Carlo sampling and Black Box Variational Inference, for which we derive the necessary gradients. We built two realistic crowdsourcing scenarios to test our model.
The first scenario queries for irregular astronomical time-series. The second
scenario relies on the image classification of animals. We achieved results
that are comparable with those of full query crowdsourcing. Furthermore, we
show that modeling labelers' failures plays an important role in estimating
true classes. Finally, we provide the community with two real datasets obtained
from our crowdsourcing experiments. All our code is publicly available.
Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages
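The inference machinery this abstract describes can be illustrated with a minimal, self-contained sketch of the score-function gradient estimator that Black Box Variational Inference rests on. The one-parameter model below — a single latent labeler-quality parameter `z` whose sigmoid gives the probability of a correct yes/no answer — is a hypothetical stand-in for the paper's full confusion-matrix model, not its actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z, y):
    """Hypothetical log p(y, z): yes/no answers y are Bernoulli with
    success probability sigmoid(z); z has a standard normal prior."""
    p = 1.0 / (1.0 + np.exp(-z))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) - 0.5 * z ** 2

def bbvi_step(mu, log_sigma, y, n_samples=200):
    """One BBVI step: Monte Carlo score-function gradients of the ELBO
    for a Gaussian variational family q(z) = N(mu, exp(log_sigma)^2)."""
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, n_samples)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    f = np.array([log_joint(zi, y) for zi in z]) - log_q
    f -= f.mean()                              # baseline: reduces gradient variance
    score_mu = (z - mu) / sigma ** 2           # d log q / d mu
    score_ls = ((z - mu) / sigma) ** 2 - 1.0   # d log q / d log_sigma
    return (score_mu * f).mean(), (score_ls * f).mean()

y = rng.binomial(1, 0.8, 50)                   # synthetic yes/no responses
mu, log_sigma = 0.0, 0.0
for _ in range(500):
    g_mu, g_ls = bbvi_step(mu, log_sigma, y)
    mu += 0.02 * g_mu
    log_sigma += 0.02 * g_ls
```

The estimator needs only evaluations of the log-joint, which is what makes the approach "black box": no model-specific gradient derivations are required beyond the score of the variational family itself.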
Model Selection in Overlapping Stochastic Block Models
Networks are a commonly used mathematical model to describe the rich set of
interactions between objects of interest. Many clustering methods have been
developed in order to partition such structures, among which several rely on
underlying probabilistic models, typically mixture models. The relevant hidden
structure may however show overlapping groups in several applications. The
Overlapping Stochastic Block Model (2011) has been developed to take this
phenomenon into account. Nevertheless, the problem of the choice of the number
of classes in the inference step is still open. To tackle this issue, we
consider the proposed model in a Bayesian framework and develop a new criterion
based on a non-asymptotic approximation of the marginal log-likelihood. We
describe how the criterion can be computed through a variational Bayes EM
algorithm, and demonstrate its efficiency by running it on both simulated and
real data.
Comment: article
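Choosing the number of classes by comparing a fitted model's penalized likelihood across candidate values can be sketched on a toy mixture. The sketch below uses the asymptotic BIC as a stand-in for the paper's non-asymptotic variational criterion, and a 1-D Gaussian mixture fitted by EM as a stand-in for the overlapping block model:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy 1-D data with two well-separated latent groups
x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])

def fit_gmm(x, k, n_iter=100):
    """EM for a 1-D Gaussian mixture with k components;
    returns the maximized log-likelihood."""
    n = len(x)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread-out initial means
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)  # E-step: responsibilities
        nk = r.sum(axis=0)                          # M-step: update parameters
        w, mu = nk / n, (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

def bic(x, k):
    # a 1-D k-component Gaussian mixture has 3k - 1 free parameters
    return -2.0 * fit_gmm(x, k) + (3 * k - 1) * np.log(len(x))

best_k = min(range(1, 5), key=lambda k: bic(x, k))  # smallest BIC wins
```

The paper's criterion replaces the asymptotic penalty above with a variational, non-asymptotic approximation of the marginal log-likelihood, but the selection loop — fit for each candidate number of classes, score, pick the best — has the same shape.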
Statistical modelling of summary values leads to accurate Approximate Bayesian Computations
Approximate Bayesian Computation (ABC) methods rely on asymptotic arguments,
implying that parameter inference can be systematically biased even when
sufficient statistics are available. We propose to construct the ABC
accept/reject step from decision theoretic arguments on a suitable auxiliary
space. This framework, referred to as ABC*, fully specifies which test
statistics to use, how to combine them, how to set the tolerances and how long
to simulate in order to obtain accuracy properties on the auxiliary space. Akin
to maximum-likelihood indirect inference, regularity conditions establish when
the ABC* approximation to the posterior density is accurate on the original
parameter space in terms of the Kullback-Leibler divergence and the maximum a
posteriori point estimate. Fundamentally, escaping asymptotic arguments
requires knowledge of the distribution of test statistics, which we obtain
through modelling the distribution of summary values, data points on a summary
level. Synthetic examples and an application to time series data of influenza A
(H3N2) infections in the Netherlands illustrate ABC* in action.
Comment: Videos can be played with Acrobat Reader. Manuscript under review and not accepted
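For contrast with ABC*, the plain ABC rejection sampler that the decision-theoretic construction refines can be written in a few lines. The prior, summary statistic, and tolerance below are ad-hoc choices — precisely the arbitrariness that ABC* is designed to remove:

```python
import numpy as np

rng = np.random.default_rng(2)

# observed data: 100 draws from a unit-variance Gaussian with unknown mean
x_obs = rng.normal(3.0, 1.0, 100)
s_obs = x_obs.mean()                               # summary statistic

def abc_rejection(s_obs, n_prop=50_000, tol=0.05):
    """Plain ABC rejection sampling for the mean of a unit-variance
    Gaussian, flat prior on [0, 6]. ABC* would instead derive the test
    statistic, tolerance, and simulation length from decision theory."""
    theta = rng.uniform(0.0, 6.0, n_prop)          # draws from the prior
    sims = rng.normal(theta[:, None], 1.0, (n_prop, 100))
    s_sim = sims.mean(axis=1)                      # simulated summaries
    keep = np.abs(s_sim - s_obs) < tol             # accept/reject step
    return theta[keep]

post = abc_rejection(s_obs)                        # approximate posterior draws
```

Here the accepted `theta` values approximate the posterior; the quality of that approximation depends on the hand-picked tolerance and statistic, which is the gap the paper's framework closes.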
Application of Bayesian graphs to SN Ia data analysis and compression
Bayesian graphical models are an efficient tool for modelling complex data
and derive self-consistent expressions of the posterior distribution of model
parameters. We apply Bayesian graphs to perform statistical analyses of Type Ia
supernova (SN Ia) luminosity distance measurements from the joint light-curve
analysis (JLA) data set. In contrast to the χ² approach used in previous studies, the Bayesian inference allows us to fully account for the standard-candle parameter dependence of the data covariance matrix. Comparing with χ² analysis results, we find a systematic offset of the marginal model parameter bounds. We demonstrate that the bias is statistically significant in the case of the SN Ia standardization parameters, with a maximal 6σ shift of the SN light-curve colour correction. In addition, we find that the evidence for a host galaxy correction is now only 2.4σ.
Systematic offsets on the cosmological parameters remain small, but may
increase by combining constraints from complementary cosmological probes. The
bias of the χ² analysis is due to neglecting the parameter-dependent
log-determinant of the data covariance, which gives more statistical weight to
larger values of the standardization parameters. We find a similar effect on
compressed distance modulus data. To this end, we implement a fully consistent
compression method of the JLA data set that uses a Gaussian approximation of
the posterior distribution for fast generation of compressed data. Overall, the
results of our analysis emphasize the need for a fully consistent Bayesian
statistical approach in the analysis of future large SN Ia data sets.Comment: 14 pages, 13 figures, 5 tables. Submitted to MNRAS. Compression
utility available at https://gitlab.com/congma/libsncompress/ and example
cosmology code with machine-readable version of Tables A1 & A2 at
https://gitlab.com/congma/sn-bayesian-model-example/ v2: corrected typo in
author's name. v3: 15 pages, incl. corrections, matches the accepted versio
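The mechanism the abstract identifies — dropping the parameter-dependent log-determinant of the covariance gives more statistical weight to larger parameter values — can be seen in a deliberately simplified toy model where the fitted parameter is a Gaussian variance (a stand-in for the JLA covariance's dependence on the standardization parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, 1000)           # data with true variance 1

thetas = np.linspace(0.2, 5.0, 200)      # candidate variances

# full Gaussian log-likelihood: -0.5 * (n*log(theta) + sum(y^2)/theta)
full = -0.5 * (len(y) * np.log(thetas) + (y ** 2).sum() / thetas)
# "chi^2-only" objective: the log-determinant term n*log(theta) is dropped
chi2_only = -0.5 * (y ** 2).sum() / thetas

theta_full = thetas[full.argmax()]       # close to the true variance
theta_chi2 = thetas[chi2_only.argmax()]  # pushed to the largest candidate
```

Without the log-determinant, the quadratic term alone is monotonically decreasing in the variance, so the objective always prefers the largest covariance on offer — the same direction of bias the paper reports for the standardization parameters.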
Policy Makers' Priors and Inflation Density Forecasts
This paper models an inflation forecast density framework that closely resembles actual policy makers' behaviour in determining the modal point, the uncertainty, and the asymmetry of inflation forecasts. The framework combines policy makers' prior information about these parameters with a standard parametric density estimation technique using Bayesian theory. The combination crucially hinges on an information-theoretic utility function describing the gains of the policy maker from performing the forecast exercise.
Keywords: Monetary Policy, Inflation Targeting, Bayesian Methods
Evaluating Process-Based Integrated Assessment Models of Climate Change Mitigation
Process-based integrated assessment models (IAMs) analyse transformation pathways to mitigate climate change. Confidence in models is established by testing their structural assumptions and comparing their behaviour against observations as well as other models. Climate model evaluation is concerted, and prominently reported in a dedicated chapter of the IPCC WG1 assessments. By comparison, evaluation of process-based IAMs tends to be less visible and more dispersed among modelling teams, with the exception of model inter-comparison projects. We contribute the first comprehensive analysis of process-based IAM evaluation, drawing on a wide range of examples across eight different evaluation methods testing both structural and behavioural validity. For each evaluation method, we compare its application to process-based IAMs with its application to climate models, noting similarities and differences, and seeking useful insights for strengthening the evaluation of process-based IAMs. We find that each evaluation method has distinctive strengths and limitations, as well as constraints on its application. We develop a systematic evaluation framework combining multiple methods that should be embedded within the development and use of process-based IAMs.
Bayesian phylogeography of influenza A/H3N2 for the 2014-15 season in the United States using three frameworks of ancestral state reconstruction
Ancestral state reconstructions in Bayesian phylogeography of virus pandemics have been improved by utilizing a Bayesian stochastic search variable selection (BSSVS) framework. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of genetic, geographic, demographic, and environmental predictors of interest to the virus and incorporating BSSVS to estimate the posterior inclusion probabilities of each predictor. Although the latter appears to enhance the biological validity of ancestral state reconstruction, there has yet to be a comparison of phylogenies created by the two methods. In this paper, we compare these two methods, while also using a primitive method without BSSVS, and highlight the differences in phylogenies created by each. We test six coalescent priors and six random sequence samples of H3N2 influenza during the 2014–15 flu season in the U.S. We show that the GLMs yield significantly greater root state posterior probabilities than the two alternative methods under five of the six priors, and significantly greater Kullback-Leibler divergence values than the two alternative methods under all priors. Furthermore, the GLMs strongly implicate temperature and precipitation as driving forces of this flu season and nearly unanimously identified a single root state, which exhibits the most tropical climate during a typical flu season in the U.S. The GLM, however, appears to be highly susceptible to sampling bias compared with the other methods, which casts doubt on whether its reconstructions should be favored over those created by alternate methods.
We report that a BSSVS approach with a Poisson prior demonstrates less bias toward sample size under certain conditions than the GLMs or primitive models, and believe that the connection between reconstruction method and sampling bias warrants further investigation.
The article is published at http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.100538
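The Kullback-Leibler divergence used above to compare root-state reconstructions can be computed directly from the discrete posterior distributions. The four-state posteriors below are hypothetical, with the uniform distribution as the reference (the paper's exact reference distribution may differ):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions over the same state set."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# hypothetical root-state posteriors over four candidate U.S. locations:
# a sharply peaked reconstruction vs. an uninformative uniform reference
peaked = [0.85, 0.05, 0.05, 0.05]
uniform = [0.25, 0.25, 0.25, 0.25]
kl = kl_divergence(peaked, uniform)    # larger KL = more decisive root call
```

A reconstruction that concentrates its posterior mass on a single root state scores a high divergence from the uniform reference, which is the sense in which the GLMs' "significantly greater Kullback-Leibler divergence values" indicate more decisive root calls.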