Clustering Boolean Tensors
Tensor factorizations are computationally hard problems, and in particular,
are often significantly harder than their matrix counterparts. In the case of
Boolean tensor factorizations -- where the input tensor and all the factors are
required to be binary and we use Boolean algebra -- much of that hardness comes
from the possibility of overlapping components. Yet, in many applications we
are perfectly happy to partition at least one of the modes. In this paper we
investigate what consequences this partitioning has on the computational
complexity of Boolean tensor factorizations and present a new algorithm for
the resulting clustering problem. This algorithm can alternatively be seen as a
particularly regularized clustering algorithm that can handle extremely
high-dimensional observations. We analyse our algorithms with the goal of
maximizing the similarity and argue that this is more meaningful than
minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient
0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm
for Boolean tensor clustering achieves high scalability, high similarity, and
good generalization to unseen data on both synthetic and real-world data
sets.
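To make the setting concrete, here is a minimal sketch of a Boolean CP reconstruction in which the third mode is partitioned, together with the similarity measure (number of agreeing entries) that the abstract says is maximized. The function names, the tiny factor matrices, and the pure-Python list representation are assumptions made for illustration; this is not the paper's algorithm, only the objects it operates on.

```python
def boolean_cp(A, B, C):
    """Boolean CP reconstruction of a 3-way binary tensor.

    Entry (i, j, k) is the OR over components r of A[i][r] AND B[j][r] AND C[k][r],
    so 1 + 1 = 1 (Boolean algebra) rather than 2.
    """
    rank = len(A[0])
    return [[[int(any(a[r] and b[r] and c[r] for r in range(rank)))
              for c in C] for b in B] for a in A]


def is_partition(C):
    """True if every row of C has exactly one 1, i.e. C assigns each slice to one cluster."""
    return all(sum(row) == 1 for row in C)


def similarity(X, Y):
    """Number of entries on which two equally sized binary tensors agree."""
    return sum(x == y
               for Xi, Yi in zip(X, Y)
               for Xij, Yij in zip(Xi, Yi)
               for x, y in zip(Xij, Yij))


# Hypothetical rank-2 factors for a 3 x 2 x 3 tensor (illustrative values only).
A = [[1, 0], [1, 1], [0, 1]]   # mode 1: rows may belong to several components
B = [[1, 0], [0, 1]]           # mode 2
C = [[1, 0], [0, 1], [1, 0]]   # mode 3: exactly one 1 per row, so this mode is clustered

X = boolean_cp(A, B, C)
```

Because each row of `C` selects a single component, slices `X[..][..][k]` in the same cluster share one rank-1 binary pattern, which is why no overlap between components can occur along the partitioned mode.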
Multimodal nested sampling: an efficient and robust alternative to MCMC methods for astronomical data analysis
In performing a Bayesian analysis of astronomical data, two difficult
problems often emerge. First, in estimating the parameters of some model for
the data, the resulting posterior distribution may be multimodal or exhibit
pronounced (curving) degeneracies, which can cause problems for traditional
MCMC sampling methods. Second, in selecting between a set of competing models,
calculation of the Bayesian evidence for each model is computationally
expensive. The nested sampling method introduced by Skilling (2004) has
greatly reduced the computational expense of calculating evidences and also
produces posterior inferences as a by-product. This method has been applied
successfully in cosmological applications by Mukherjee et al. (2006), but their
implementation was efficient only for unimodal distributions without pronounced
degeneracies. Shaw et al. (2007) recently introduced a clustered nested
sampling method which is significantly more efficient in sampling from
multimodal posteriors and also determines the expectation and variance of the
final evidence from a single run of the algorithm, hence providing a further
increase in efficiency. In this paper, we build on the work of Shaw et al. and
present three new methods for sampling and evidence evaluation from
distributions that may contain multiple modes and significant degeneracies; we
also present an even more efficient technique for estimating the uncertainty on
the evaluated evidence. These methods lead to a further substantial improvement
in sampling efficiency and robustness, and are applied to toy problems to
demonstrate the accuracy and economy of the evidence calculation and parameter
estimation. Finally, we discuss the use of these methods in performing Bayesian
object detection in astronomical datasets.
Comment: 14 pages, 11 figures, submitted to MNRAS, some major additions to the
previous version in response to the referee's comment
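For readers unfamiliar with the basic method, the following is a minimal sketch of plain nested sampling in the style of Skilling (2004) on a one-dimensional toy problem. It is not the multimodal or clustered method described in the abstract. The toy likelihood (a Gaussian with mean 0.5 and width 0.1 under a uniform prior on [0, 1]), the function names, and the parameter values are all assumptions. The replacement step draws exactly from the constrained prior, which is possible only because this toy likelihood is symmetric and unimodal; drawing efficiently under that constraint for realistic posteriors is the hard part that the methods above address.

```python
import math
import random

MU, SIGMA = 0.5, 0.1  # assumed toy likelihood parameters


def log_likelihood(x):
    """Log of a normalized Gaussian likelihood centred on MU."""
    return -0.5 * ((x - MU) / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))


def nested_sampling(n_live=400, n_iter=4000, seed=0):
    """Estimate the evidence Z = integral of L(x) over a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]  # live points drawn from the prior
    evidence = 0.0
    x_prev = 1.0  # enclosed prior volume before the first iteration
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        j = min(range(n_live), key=lambda idx: log_likelihood(live[idx]))
        worst = live[j]
        # Deterministic approximation of the shrinking prior volume.
        x_i = math.exp(-i / n_live)
        evidence += math.exp(log_likelihood(worst)) * (x_prev - x_i)
        x_prev = x_i
        # Replace it with a prior draw subject to L > L_worst. For this toy
        # likelihood the constraint is simply |x - MU| < |worst - MU|.
        radius = abs(worst - MU)
        live[j] = rng.uniform(max(0.0, MU - radius), min(1.0, MU + radius))
    # Add the contribution of the remaining live points.
    evidence += x_prev * sum(math.exp(log_likelihood(x)) for x in live) / n_live
    return evidence


z_estimate = nested_sampling()
```

Since the Gaussian is normalized and lies almost entirely inside [0, 1], the true evidence is very close to 1, so `z_estimate` should land near that value up to the sampling noise expected for the chosen number of live points.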