Comparison between Suitable Priors for Additive Bayesian Networks
Additive Bayesian networks are a class of graphical models that extend the
usual Bayesian generalized linear model to multiple dependent variables through
the factorisation of the joint probability distribution of the underlying
variables. When fitting an ABN model, the choice of the prior of the parameters
is of crucial importance. If an inadequate prior is used, such as one that is
too weakly informative, data separation and data sparsity lead to issues in
the model selection process. In this work, a simulation study comparing two
weakly informative priors and one strongly informative prior is presented. As a
weakly informative prior we use a zero-mean Gaussian prior with a large
variance, currently implemented in
the R-package abn. The second prior is a Student's t-distribution specifically
designed for logistic regression. Finally, the strongly informative prior is
again Gaussian, with mean equal to the true parameter value and
a small variance. We compare the impact of these priors on the accuracy of the
learned additive Bayesian network as a function of different parameters. We
also create a simulation study to illustrate Lindley's paradox based on the
prior choice. We then conclude by highlighting the good performance of the
informative Student's t-prior and the limited impact of Lindley's paradox.
Finally, suggestions for further developments are provided.
Comment: 8 pages, 4 figures
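The data-separation issue described above can be sketched numerically: under complete separation the maximum-likelihood estimate of a logistic slope diverges, so the posterior mode is controlled almost entirely by the prior. The following is a hypothetical illustration only, not the abn package's implementation; the particular settings (Gaussian with sd 10, Student's t with 7 degrees of freedom and scale 2.5) are assumptions chosen for the sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t

# Toy data with complete separation: the MLE of the slope diverges,
# so the fitted coefficient is determined by the prior.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])  # perfectly separated by sign(x)

def neg_log_posterior(beta, log_prior):
    logits = beta[0] + beta[1] * x
    # Bernoulli log-likelihood, written with logaddexp for stability
    loglik = np.sum(y * logits - np.logaddexp(0.0, logits))
    return -(loglik + log_prior(beta))

# Weakly informative: zero-mean Gaussian with large variance (sd = 10)
gauss_weak = lambda b: -0.5 * np.sum(b**2) / 10.0**2
# Heavier-tailed alternative: Student's t, 7 d.f., scale 2.5
t_prior = lambda b: np.sum(student_t.logpdf(b, df=7, scale=2.5))

map_gauss = minimize(neg_log_posterior, np.zeros(2), args=(gauss_weak,)).x
map_t = minimize(neg_log_posterior, np.zeros(2), args=(t_prior,)).x
print("slope under weak Gaussian:", map_gauss[1])
print("slope under Student's t:  ", map_t[1])
```

Both posterior modes are finite despite the separated data, and the tighter-scaled t prior shrinks the slope considerably more than the diffuse Gaussian, which is the kind of prior sensitivity the simulation study quantifies.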
Scalable Recommendation with Poisson Factorization
We develop a Bayesian Poisson matrix factorization model for forming
recommendations from sparse user behavior data. These data are large user/item
matrices where each user has provided feedback on only a small subset of items,
either explicitly (e.g., through star ratings) or implicitly (e.g., through
views or purchases). In contrast to traditional matrix factorization
approaches, Poisson factorization implicitly models each user's limited
attention to consume items. Moreover, because of the mathematical form of the
Poisson likelihood, the model needs only to explicitly consider the observed
entries in the matrix, leading to both scalable computation and good predictive
performance. We develop a variational inference algorithm for approximate
posterior inference that scales up to massive data sets. This is an efficient
algorithm that iterates over the observed entries and adjusts an approximate
posterior over the user/item representations. We apply our method to large
real-world user data containing users rating movies, users listening to songs,
and users reading scientific papers. In all these settings, Bayesian Poisson
factorization outperforms state-of-the-art matrix factorization methods.
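The scalability claim rests on a property of the Poisson log-likelihood: the y·log(rate) terms vanish at zero entries, while the total sum of rates factorizes into per-user and per-item sums. A minimal numerical check of that identity (toy Gamma draws for the latent factors; this is not the paper's inference algorithm):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 80, 5
theta = rng.gamma(0.3, 0.3, size=(n_users, k))  # user preferences
beta = rng.gamma(0.3, 0.3, size=(n_items, k))   # item attributes

rates = theta @ beta.T
y = rng.poisson(rates)  # dense only for this toy verification

def poisson_loglik_sparse(rows, cols, vals, theta, beta):
    # Only observed (nonzero) entries contribute y*log(rate) terms;
    # the total rate sums in closed form: sum_u theta_u . sum_i beta_i
    obs_rate = np.einsum('nk,nk->n', theta[rows], beta[cols])
    return (np.sum(vals * np.log(obs_rate) - gammaln(vals + 1))
            - theta.sum(0) @ beta.sum(0))

rows, cols = np.nonzero(y)
vals = y[rows, cols]
sparse_ll = poisson_loglik_sparse(rows, cols, vals, theta, beta)
dense_ll = np.sum(y * np.log(rates) - rates - gammaln(y + 1))
print(sparse_ll, dense_ll)  # the two computations agree
```

Because the sparse computation touches only the nonzero entries plus two factor sums, its cost scales with the number of observations rather than with the full user-item matrix.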
Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of the real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge including: (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in streaming fashion. Data sets and
source code for the algorithm as well as metrics, with detailed documentation
are available at GraphChallenge.org.
Comment: To be published in the 2017 IEEE High Performance Extreme Computing
Conference (HPEC)
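As a rough illustration of the Bayesian idea behind such a baseline (not the challenge's actual algorithm, which is degree-corrected and uses more sophisticated merge and MCMC moves), one can score a partition with the profile log-likelihood of a plain stochastic block model and improve it by greedy node moves:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy planted-partition graph: two blocks, dense within, sparse between
n, b = 40, 2
truth = np.repeat([0, 1], n // 2)
p = np.where(truth[:, None] == truth[None, :], 0.5, 0.05)
A = np.triu(rng.random((n, n)) < p, 1).astype(int)
A = A + A.T  # symmetric adjacency matrix

def sbm_loglik(A, z, b):
    # Profile log-likelihood of a (non-degree-corrected) Poisson SBM:
    # depends only on block edge counts m[r,s] and block sizes n[r]
    onehot = np.eye(b)[z]
    m = onehot.T @ A @ onehot      # directed edge counts between blocks
    n_r = onehot.sum(0)
    nn = np.outer(n_r, n_r)
    with np.errstate(divide='ignore', invalid='ignore'):
        term = np.where(m > 0, m * np.log(m / nn), 0.0)
    return 0.5 * term.sum()       # halve to undo double counting

z = rng.integers(0, b, size=n)    # random initial partition
ll_start = sbm_loglik(A, z, b)
for _ in range(10):               # greedy sweeps over all nodes
    for v in range(n):
        z[v] = max(range(b), key=lambda r: sbm_loglik(
            A, np.where(np.arange(n) == v, r, z), b))
print(ll_start, sbm_loglik(A, z, b))
```

Each move keeps or improves the score, so the sweep is monotone; the baseline described in the paper replaces this naive O(n^2)-per-evaluation scoring with incremental updates to reach sub-quadratic complexity.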