1,729 research outputs found

Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data

Author: DB Dunson
EC Chi
G Heinrich
MD Hoffman
MI Jordan
O Cappé
TG Kolda
Publication date: 18/08/2015

We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that only depends on the number of nonzeros in the tensor. The model also makes the factors easy to interpret; in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework to diverse real-world applications, such as multiway topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set. Comment: ECML PKDD 201
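The Poisson-to-multinomial reparameterization mentioned in this abstract rests on a standard fact: if an observed count is the sum of independent Poisson latent counts, then conditional on the total, the latent counts are multinomial with probabilities proportional to the Poisson rates. The sketch below illustrates only that allocation step. The function name `allocate_count` and the example rates are hypothetical, and this is not the paper's implementation.

```python
import numpy as np

def allocate_count(y, rates, rng):
    """Split an observed count y across latent components.

    Assumes y is the sum of independent Poisson counts with the given
    rates, so the split given y is multinomial with probabilities
    rates / sum(rates). Names and inputs here are illustrative only.
    """
    rates = np.asarray(rates, dtype=float)
    probs = rates / rates.sum()
    return rng.multinomial(y, probs)

rng = np.random.default_rng(0)
# One nonzero tensor entry with count 10 and three hypothetical components.
latent = allocate_count(10, [0.5, 1.5, 2.0], rng)
```

Because an entry with count zero needs no allocation, a sampler built on this step only has to visit the nonzero entries, which is consistent with the cost claim in the abstract.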

arXiv.org e-Print Archive

Crossref

Fast and scalable non-parametric Bayesian inference for Poisson point processes

Author: Gugushvili Shota
Schauer Moritz
Spreij Peter
van der Meulen Frank
Publication date: 01/01/2020

We study the problem of non-parametric Bayesian estimation of the intensity function of a Poisson point process. The observations are $n$ independent realisations of a Poisson point process on the interval $[0,T]$. We propose two related approaches. In both approaches we model the intensity function as piecewise constant on $N$ bins forming a partition of the interval $[0,T]$. In the first approach the coefficients of the intensity function are assigned independent gamma priors, leading to a closed form posterior distribution. On the theoretical side, we prove that as $n \rightarrow \infty$, the posterior asymptotically concentrates around the "true", data-generating intensity function at an optimal rate for $h$-Hölder regular intensity functions ($0 < h \leq 1$). In the second approach we employ a gamma Markov chain prior on the coefficients of the intensity function. The posterior distribution is no longer available in closed form, but inference can be performed using a straightforward version of the Gibbs sampler. Both approaches scale well with sample size, but the second is much less sensitive to the choice of $N$. Practical performance of our methods is first demonstrated via synthetic data examples. We compare our second method with other existing approaches on the UK coal mining disasters data. Furthermore, we apply it to the US mass shootings data and Donald Trump's Twitter data. Comment: 45 pages, 22 figure
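The closed-form posterior of the first approach follows from gamma-Poisson conjugacy. As a minimal sketch, assume equal-width bins and a Gamma(alpha, beta) prior in the shape-rate parameterization on each bin coefficient; if bin k collects H_k points pooled over all n realisations, its posterior is Gamma(alpha + H_k, beta + n * T / N). The function name, the default hyperparameters, and the toy event times below are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def gamma_posterior(event_times, n, T, N, alpha=1.0, beta=1.0):
    """Posterior (shape, rate) for a piecewise-constant Poisson intensity.

    event_times: event times pooled over n realisations on [0, T].
    Assumes N equal-width bins and independent Gamma(alpha, beta) priors
    (shape-rate form); both choices are illustrative assumptions.
    """
    edges = np.linspace(0.0, T, N + 1)
    counts, _ = np.histogram(event_times, bins=edges)
    width = T / N
    post_shape = alpha + counts      # one shape per bin
    post_rate = beta + n * width     # shared rate: total exposure per bin
    return post_shape, post_rate

# Toy data: 5 events pooled from n = 2 realisations on [0, 4], N = 4 bins.
times = np.array([0.5, 0.7, 1.2, 3.1, 3.9])
shape, rate = gamma_posterior(times, n=2, T=4.0, N=4)
post_mean = shape / rate  # posterior mean of the intensity on each bin
```

The update touches each event once and each bin once, which is in line with the abstract's remark that the approach scales well with sample size.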

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Analysis of variance--why it is more important than ever

Author: Hoijtink Herbert
Hox Joop
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2005

Analysis of variance (ANOVA) is an extremely important method in exploratory and confirmatory data analysis. Unfortunately, in complex problems (e.g., split-plot designs), it is not always easy to set up an appropriate ANOVA. We propose a hierarchical analysis that automatically gives the correct ANOVA comparisons even in complex scenarios. The inferences for all means and variances are performed under a model with a separate batch of effects for each row of the ANOVA table. We connect to classical ANOVA by working with finite-sample variance components: fixed and random effects models are characterized by inferences about existing levels of a factor and new levels, respectively. We also introduce a new graphical display showing inferences about the standard deviations of each batch of effects. We illustrate with two examples from our applied data analysis, first illustrating the usefulness of our hierarchical computations and displays, and second showing how the ideas of ANOVA are helpful in understanding a previously fit hierarchical model. Comment: This paper discussed in: [math.ST/0508526], [math.ST/0508527], [math.ST/0508528], [math.ST/0508529]. Rejoinder in [math.ST/0508530

arXiv.org e-Print Archive

CiteSeerX

Crossref

Statistical models in biogeography

Author: Alvarado Barrantes Ricardo
Publication date: 30/01/2013

We concentrate on the statistical methods used in biogeography for modelling the spatial distribution of bird species. Due to the difficulty of specifying a joint multivariate spatial covariance structure in environmental processes, we factor such a joint distribution into a series of conditional models linked together in a hierarchical framework. We have a process that corresponds to an unobservable map with the actual information about a bird species, and the data correspond to the observations that are connected to that process. Markov chain Monte Carlo (MCMC) simulation approaches are used for models involving multiple levels incorporating dependence structures. We use a Bayesian algorithm for drawing samples from the posterior distribution in order to obtain estimates of the parameters and reconstruct the true map based on data. We present different methods to overcome the problem of calculating the distribution of the Markov random field that is used in the MCMC algorithm. During the analysis it is desirable to delete some of the predictors from the model and only use a subset of covariates in the estimation procedure. We use the method by Kuo & Mallick (1998) (KM) for variable selection and combine it with multiple independent chains, which successfully improves the mixing behaviour. In simulation studies we show the better performance of the pseudolikelihood over other likelihood approximation methods, and the good performance of the KM method with this type of data. We illustrate the application of the methods with the complete analysis of the spatial distribution of two bird species (Sturnella magna and Anas rubripes) based on a real data set. We show the advantages of using the hidden structure and the spatial interaction parameter in the spatial hidden Markov model over other simpler models, like the ordinary logistic model or the autologistic model without observation errors.

Archivio istituzionale della ricerca - Università di Padova

Towards derandomising Markov chain Monte Carlo

Author: Feng Weiming
Guo Heng
Wang Chunyang
Wang Jiaheng
Yin Yitong
Publication date: 07/11/2022

We present a new framework to derandomise certain Markov chain Monte Carlo (MCMC) algorithms. As in MCMC, we first reduce counting problems to sampling from a sequence of marginal distributions. For the latter task, we introduce a method called coupling towards the past that can, in logarithmic time, evaluate one or a constant number of variables from a stationary Markov chain state. Since there are at most logarithmic random choices, this leads to very simple derandomisation. We provide two applications of this framework, namely efficient deterministic approximate counting algorithms for hypergraph independent sets and hypergraph colourings, under local lemma type conditions matching, up to lower order factors, their state-of-the-art randomised counterparts. Comment: 57 page

arXiv.org e-Print Archive

The use of Polarimetric EMISAR for the Mapping and Characterization of the Semi-Natural Environment

Author: Sørensen Stefán Meulengracht
Publication venue: Technical University of Denmark
Publication date: 01/07/2005

Online Research Database In Technology