Sliced Wasserstein Distance for Learning Gaussian Mixture Models
Gaussian mixture models (GMM) are powerful parametric tools with many
applications in machine learning and computer vision. Expectation maximization
(EM) is the most popular algorithm for estimating the GMM parameters. However,
EM guarantees only convergence to a stationary point of the log-likelihood
function, which could be arbitrarily worse than the optimal solution. Inspired
by the relationship between the negative log-likelihood function and the
Kullback-Leibler (KL) divergence, we propose an alternative formulation for
estimating the GMM parameters using the sliced Wasserstein distance, which
gives rise to a new algorithm. Specifically, we propose minimizing the
sliced-Wasserstein distance between the mixture model and the data distribution
with respect to the GMM parameters. In contrast to the KL-divergence, the
energy landscape for the sliced-Wasserstein distance is better behaved and
therefore more suitable for a stochastic gradient descent scheme to obtain the
optimal GMM parameters. We show that our formulation results in parameter
estimates that are more robust to random initializations and demonstrate that
it can estimate high-dimensional data distributions more faithfully than the EM
algorithm.
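As a rough illustration of the quantity being minimized, the sliced Wasserstein distance between two samples can be estimated by projecting both onto random directions and comparing the sorted one-dimensional projections. The sketch below covers only this distance estimate, not the authors' GMM fitting procedure; the function name, the number of projections, and the restriction to equal sample sizes are assumptions made for the example.

```python
import numpy as np

def sliced_wasserstein2(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    Assumes x and y are arrays of shape (n, d) with the same n.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions and normalise them onto the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project each sample onto every direction, then sort along the sample axis.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # In 1-D with equal sample sizes, optimal transport pairs sorted points.
    return np.sqrt(np.mean((px - py) ** 2))
```

In a fitting loop, y would be drawn from the current mixture model and the distance minimized with respect to the mixture parameters, for example by stochastic gradient descent in an automatic-differentiation framework.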
Propagation Kernels
We introduce propagation kernels, a general graph-kernel framework for
efficiently measuring the similarity of structured data. Propagation kernels
are based on monitoring how information spreads through a set of given graphs.
They leverage early-stage distributions from propagation schemes such as random
walks to capture structural information encoded in node labels, attributes, and
edge information. This has two benefits. First, off-the-shelf propagation
schemes can be used to naturally construct kernels for many graph types,
including labeled, partially labeled, unlabeled, directed, and attributed
graphs. Second, by leveraging existing efficient and informative propagation
schemes, propagation kernels can be considerably faster than state-of-the-art
approaches without sacrificing predictive performance. We also show that
if the graphs at hand have a regular structure, for instance when modeling
image or video data, one can exploit this regularity to scale the kernel
computation to large databases of graphs with thousands of nodes. We support
our contributions by exhaustive experiments on a number of real-world graphs
from a variety of application domains.
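The idea of comparing early-stage propagation distributions can be sketched for two labeled graphs as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: node label distributions are propagated with a row-normalised random-walk matrix, and nodes are binned by rounding their distributions to a fixed number of decimals, which is a stand-in chosen for this example. The function names and the parameters t_max and decimals are hypothetical.

```python
import numpy as np
from collections import Counter

def _bin_counts(P, decimals):
    # Count how many nodes share the same (rounded) label distribution.
    return Counter(tuple(row) for row in np.round(P, decimals).tolist())

def propagation_kernel(graph_a, graph_b, n_labels, t_max=3, decimals=1):
    """Each graph is a pair (adjacency matrix, list of integer node labels)."""
    states = []
    for adj, labels in (graph_a, graph_b):
        adj = np.asarray(adj, dtype=float)
        # Random-walk transition matrix; isolated nodes get an all-zero row.
        T = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
        # Initial distributions are one-hot encodings of the node labels.
        P = np.eye(n_labels)[np.asarray(labels)]
        states.append([T, P])
    k = 0.0
    for _ in range(t_max + 1):
        counts_a, counts_b = (_bin_counts(P, decimals) for _, P in states)
        # Linear kernel between the two bin-count vectors at this step.
        k += sum(n * counts_b[key] for key, n in counts_a.items())
        # One propagation step for each graph.
        for s in states:
            s[1] = s[0] @ s[1]
    return k
```

Calling propagation_kernel with two (adjacency, labels) pairs returns a single similarity value; summing over iterations means nodes are compared both by their own labels and by the labels in their growing neighbourhoods.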
Persistence Bag-of-Words for Topological Data Analysis
Persistent homology (PH) is a rigorous mathematical theory that provides a
robust descriptor of data in the form of persistence diagrams (PDs). PDs
exhibit, however, complex structure and are difficult to integrate in today's
machine learning workflows. This paper introduces persistence bag-of-words: a
novel and stable vectorized representation of PDs that enables the seamless
integration with machine learning. Comprehensive experiments show that the new
representation achieves state-of-the-art performance and beyond in much less
time than alternative approaches.
Comment: Accepted for the Twenty-Eighth International Joint Conference on
Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text
overlap with arXiv:1802.0485