Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence
In this paper we introduce evidence transfer for clustering, a deep learning
method that can incrementally manipulate the latent representations of an
autoencoder, according to external categorical evidence, in order to improve a
clustering outcome. We define evidence transfer as the process by which the
categorical outcome of an external, auxiliary task is exploited to improve a
primary task, in this case representation learning for clustering. Our proposed
method makes no assumptions regarding the categorical evidence presented, nor
the structure of the latent space. We compare our method against the baseline
solution by performing k-means clustering before and after its deployment.
Experiments with three different kinds of evidence show that our method
effectively manipulates the latent representations when given real
corresponding evidence, while remaining robust when presented with low-quality
evidence.
Auxiliary Deep Generative Models
Deep generative models parameterized by neural networks have recently
achieved state-of-the-art performance in unsupervised and semi-supervised
learning. We extend deep generative models with auxiliary variables, which
improves the variational approximation. The auxiliary variables leave the
generative model unchanged but make the variational distribution more
expressive. Inspired by the structure of the auxiliary variable we also propose
a model with two stochastic layers and skip connections. Our findings suggest
that more expressive and properly specified deep generative models converge
faster with better results. We show state-of-the-art performance within
semi-supervised learning on MNIST, SVHN and NORB datasets.
Comment: Proceedings of the 33rd International Conference on Machine Learning,
New York, NY, USA, 2016, JMLR: Workshop and Conference Proceedings volume 48
A nonparametric HMM for genetic imputation and coalescent inference
Genetic sequence data are well described by hidden Markov models (HMMs) in
which latent states correspond to clusters of similar mutation patterns. Theory
from statistical genetics suggests that these HMMs are nonhomogeneous (their
transition probabilities vary along the chromosome) and have large support for
self transitions. We develop a new nonparametric model of genetic sequence
data, based on the hierarchical Dirichlet process, which supports these self
transitions and nonhomogeneity. Our model provides a parameterization of the
genetic process that is more parsimonious than other, more general nonparametric
models that have previously been applied to population genetics. We provide
truncation-free MCMC inference for our model using a new auxiliary sampling
scheme for Bayesian nonparametric HMMs. In a series of experiments on male X
chromosome data from the Thousand Genomes Project and on data simulated
from a population bottleneck, we show the benefits of our model over the popular
finite model fastPHASE, which can itself be seen as a parametric truncation of
our model. We find that the number of HMM states found by our model is
correlated with the time to the most recent common ancestor in population
bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics
applied to large and complex genetic data.
A Novel Approach for Effective Multi-View Clustering with Information-Theoretic Perspective
Multi-view clustering (MVC) is a popular technique for improving clustering
performance using various data sources. However, existing methods primarily
focus on acquiring consistent information while often neglecting the issue of
redundancy across multiple views. This study presents a new approach called
Sufficient Multi-View Clustering (SUMVC) that examines the multi-view
clustering framework from an information-theoretic standpoint. Our proposed
method consists of two parts. First, we develop a simple and reliable
multi-view clustering method, SCMVC (simple consistent multi-view clustering),
that employs variational analysis to generate consistent information. Second,
we propose a sufficient representation lower bound to enhance consistent
information and minimise unnecessary information among views. The proposed
SUMVC method offers a promising solution to the problem of multi-view
clustering and provides a new perspective for analyzing multi-view data.
To verify the effectiveness of our model, we conduct a theoretical analysis
based on the Bayes error rate, and experiments on multiple multi-view datasets
demonstrate the superior performance of SUMVC.