The Binary Space Partitioning-Tree Process
The Mondrian process represents an elegant and powerful approach for space
partition modelling. However, as it restricts the partitions to be
axis-aligned, its modelling flexibility is limited. In this work, we propose a
self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the
Mondrian process. The BSP-Tree process is an almost surely right continuous
Markov jump process that allows uniformly distributed oblique cuts in a
two-dimensional convex polygon. The BSP-Tree process can also be extended using
a non-uniform probability measure to generate direction-differentiated cuts.
The process is also self-consistent, maintaining distributional invariance
under a restricted subdomain. We use Conditional-Sequential Monte Carlo for
inference using the tree structure as the high-dimensional variable. The
BSP-Tree process's performance on synthetic data partitioning and relational
modelling demonstrates clear inferential improvements over the standard
Mondrian process and other related methods.
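As a point of reference for the axis-aligned baseline that the BSP-Tree process generalizes, here is a minimal Python sketch of the standard Mondrian process generative procedure on a box. It is not the BSP-Tree process itself, since it makes no oblique cuts. The names `sample_mondrian`, `box`, and `budget` are illustrative assumptions, not taken from the paper.

```python
import random


def sample_mondrian(box, budget, rng):
    """Sample an axis-aligned Mondrian partition of `box`.

    box    -- list of (low, high) intervals, one per dimension (assumed format)
    budget -- remaining lifetime of the process; larger means more cuts
    rng    -- a random.Random instance
    Returns the list of leaf boxes.
    """
    lengths = [high - low for low, high in box]
    total = sum(lengths)
    if total <= 0.0:
        return [box]  # degenerate box, nothing left to cut

    # The waiting time until the next cut is exponential with rate equal to
    # the sum of the side lengths of the box.
    cost = rng.expovariate(total)
    if cost >= budget:
        return [box]  # budget exhausted: this box becomes a leaf

    # Pick the dimension with probability proportional to its side length,
    # then place the cut uniformly along that side.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    low, high = box[dim]
    cut = rng.uniform(low, high)

    lower_box = list(box)
    lower_box[dim] = (low, cut)
    upper_box = list(box)
    upper_box[dim] = (cut, high)

    remaining = budget - cost
    return (sample_mondrian(lower_box, remaining, rng)
            + sample_mondrian(upper_box, remaining, rng))


if __name__ == "__main__":
    leaves = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], 3.0, random.Random(0))
    print(len(leaves), "leaf boxes")
```

Every cut here is perpendicular to a coordinate axis, which is the restriction the abstract describes relaxing.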
Random Tessellation Forests
Space partitioning methods such as random forests and the Mondrian process
are powerful machine learning methods for multi-dimensional and relational
data, and are based on recursively cutting a domain. The flexibility of these
methods is often limited by the requirement that the cuts be axis-aligned. The
Ostomachion process and the self-consistent binary space partitioning-tree
process were recently introduced as generalizations of the Mondrian process for
space partitioning with non-axis-aligned cuts in the two-dimensional plane.
Motivated by the need for a multi-dimensional partitioning tree with
non-axis-aligned cuts, we propose the Random Tessellation Process (RTP), a framework
that includes the Mondrian process and the binary space partitioning-tree
process as special cases. We derive a sequential Monte Carlo algorithm for
inference, and provide random forest methods. Our process is self-consistent
and can relax axis-aligned constraints, allowing complex inter-dimensional
dependence to be captured. We present a simulation study, and analyse gene
expression data of brain tissue, showing improved accuracies over other
methods.
Comment: 11 pages, 4 figures
Isolation Mondrian Forest for Batch and Online Anomaly Detection
We propose a new method, named isolation Mondrian forest (iMondrian forest),
for batch and online anomaly detection. The proposed method is a novel hybrid
of isolation forest and Mondrian forest, which are existing methods for batch
anomaly detection and online random forest, respectively. iMondrian forest
takes the idea of isolation, using the depth of a node in a tree, and
implements it in the Mondrian forest structure. The result is a new data
structure which can accept streaming data in an online manner while being used
for anomaly detection. Our experiments show that iMondrian forest mostly
performs better than isolation forest in batch settings and has better or
comparable performance against other batch and online anomaly detection
methods.
Comment: Accepted for presentation at the IEEE International Conference on
Systems, Man, and Cybernetics (SMC) 2020. The first three authors contributed
equally to this work.
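To make the idea of isolation by node depth concrete, the following Python sketch computes the anomaly score used by the standard isolation forest, where a shallow average depth gives a score near 1. The abstract does not say whether iMondrian forest uses exactly this normalization, so treat this as an assumption. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """Expected depth of an unsuccessful BST search over n points.

    Used by the standard isolation forest to normalize depths.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_depth, n):
    """Map a point's mean depth across trees to a score in (0, 1].

    mean_depth -- average depth at which the point is isolated
    n          -- number of points used to build each tree (must be >= 2)
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return 2.0 ** (-mean_depth / average_path_length(n))


if __name__ == "__main__":
    # Points isolated near the root score higher than deeply nested points.
    print(anomaly_score(2.0, 256), anomaly_score(12.0, 256))
```

A point whose mean depth equals the normalizing constant scores exactly 0.5, and shallower points score higher.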
Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel
The Mixed-Membership Stochastic Blockmodel (MMSB) is one of the
state-of-the-art Bayesian relational methods suitable for learning the complex
hidden structure underlying network data. However, the current formulation
of MMSB suffers from two issues: (1) prior information (e.g.,
entities' community structure) cannot be well embedded in the
model; (2) community evolution is not well described in the
literature. Therefore, we propose a non-parametric fragmentation coagulation
based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs
entity-based clustering to capture the community information for entities and
linkage-based clustering to derive the group information for links
simultaneously. Besides, the proposed model infers the network structure and
models community evolution, manifested by appearances and disappearances of
communities, using the discrete fragmentation coagulation process (DFCP). By
integrating the community structure with the group compatibility matrix we
derive a generalized version of MMSB. An efficient Gibbs sampling scheme with
the Pólya-Gamma (PG) approach is implemented for posterior inference. We validate
our model on synthetic and real-world data.
Comment: AAAI 202
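For readers unfamiliar with the base model, here is a minimal Python sketch of the generative process of the standard MMSB. It does not include the fragmentation coagulation extension proposed above. The function name, argument names, and the fixed number of communities are assumptions made for illustration.

```python
import random


def sample_mmsb(num_nodes, alpha, block_matrix, rng):
    """Draw memberships and a directed adjacency matrix from a standard MMSB.

    alpha        -- Dirichlet concentration parameters, one per community
    block_matrix -- block_matrix[g][h] is the link probability from a sender
                    acting in community g to a receiver acting in community h
    rng          -- a random.Random instance
    """
    num_communities = len(alpha)

    # Each node gets a mixed-membership vector drawn from Dirichlet(alpha),
    # sampled here by normalizing independent gamma variates.
    memberships = []
    for _ in range(num_nodes):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        memberships.append([g / total for g in gammas])

    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i == j:
                continue  # no self-links in this sketch
            # For each ordered pair, sender and receiver each pick a role.
            sender_role = rng.choices(range(num_communities),
                                      weights=memberships[i])[0]
            receiver_role = rng.choices(range(num_communities),
                                        weights=memberships[j])[0]
            link_prob = block_matrix[sender_role][receiver_role]
            adjacency[i][j] = 1 if rng.random() < link_prob else 0
    return memberships, adjacency


if __name__ == "__main__":
    blocks = [[0.9, 0.05], [0.05, 0.9]]
    pi, links = sample_mmsb(6, [0.5, 0.5], blocks, random.Random(0))
    for row in links:
        print(row)
```

Because roles are drawn per pair, a node can behave as a member of different communities in different interactions, which is the "mixed membership" part of the model.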