Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for exploratory
data analysis are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data.
Comment: In Proceedings of the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19)
Comparison between Suitable Priors for Additive Bayesian Networks
Additive Bayesian networks (ABNs) are a type of graphical model that extends the
usual Bayesian generalized linear model to multiple dependent variables through
the factorisation of the joint probability distribution of the underlying
variables. When fitting an ABN model, the choice of the prior of the parameters
is of crucial importance. If an inadequate prior is used, such as one that is
too weakly informative, data separation and data sparsity lead to issues in
the model selection process. In this work, a simulation study comparing two weakly
informative priors and one strongly informative prior is presented. As a weakly informative prior we
use a zero mean Gaussian prior with a large variance, currently implemented in
the R-package abn. The second prior is a Student's t-distribution,
specifically designed for logistic regressions. Finally, the strongly
informative prior is again Gaussian, with mean equal to the true parameter value and
a small variance. We compare the impact of these priors on the accuracy of the
learned additive Bayesian network as a function of different parameters. We
create a simulation study to illustrate Lindley's paradox based on the prior
choice. We then conclude by highlighting the good performance of the
informative Student's t-prior and the limited impact of Lindley's paradox.
Finally, suggestions for further developments are provided.
Comment: 8 pages, 4 figures
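To make the role of the prior under data separation more concrete, here is a minimal Python sketch. It is not the abn implementation and not the simulation study from the paper. It fits a one-parameter logistic regression by maximum a posteriori (MAP) estimation on a hypothetical, perfectly separated toy dataset, under three priors that loosely mirror those named in the abstract. The data, the Student's t degrees of freedom and scale (7 and 2.5), the "true" slope of 1.0, and the grid search are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy data, perfectly separated: y = 1 exactly when x > 0,
# so the maximum-likelihood slope is unbounded.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]


def log_likelihood(beta):
    """Bernoulli log-likelihood of a logistic model without intercept."""
    total = 0.0
    for x, y in zip(X, Y):
        t = beta * x if y == 1 else -beta * x
        # log(sigmoid(t)), written in a numerically stable form
        total -= max(-t, 0.0) + math.log1p(math.exp(-abs(t)))
    return total


def gaussian_log_prior(beta, mean, sd):
    """Gaussian log-density, up to an additive constant."""
    return -0.5 * ((beta - mean) / sd) ** 2


def student_t_log_prior(beta, df, scale):
    """Zero-centred Student's t log-density, up to an additive constant."""
    return -0.5 * (df + 1.0) * math.log1p((beta / scale) ** 2 / df)


def map_estimate(log_prior, lo=-30.0, hi=30.0, steps=60000):
    """Coarse grid search for the posterior mode (fine for one parameter)."""
    best_beta, best_val = lo, -math.inf
    for i in range(steps + 1):
        beta = lo + (hi - lo) * i / steps
        val = log_likelihood(beta) + log_prior(beta)
        if val > best_val:
            best_beta, best_val = beta, val
    return best_beta


# Assumed prior settings, for illustration only.
weak = map_estimate(lambda b: gaussian_log_prior(b, 0.0, 1000.0))
t_prior = map_estimate(lambda b: student_t_log_prior(b, 7.0, 2.5))
strong = map_estimate(lambda b: gaussian_log_prior(b, 1.0, 0.1))

print(f"weak Gaussian prior:   MAP slope = {weak:.3f}")
print(f"Student's t prior:     MAP slope = {t_prior:.3f}")
print(f"strong Gaussian prior: MAP slope = {strong:.3f}")
```

Under these assumptions, the very wide Gaussian prior barely restrains the slope, which drifts far from zero because the separated data keep pushing it upward. The Student's t prior pulls the estimate back to a moderate value, and the tight Gaussian keeps it close to its assumed mean.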