Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss
efficient implementations that are available in R and other software
environments. We look at hierarchical self-organizing maps, and mixture models.
We review grid-based clustering, focusing on hierarchical density-based
approaches. Finally we describe a recently developed very efficient (linear
time) hierarchical clustering algorithm, which can also be viewed as a
hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references
Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
This paper introduces a novel speaker modeling technique for text-independent speaker identification using probabilistic self-organizing maps (PbSOMs). The motivation is to combine the self-organizing quality of self-organizing maps with the generative power of Gaussian mixture models (GMMs). Experimental results show that the proposed technique significantly outperforms the traditional approach based on classical GMMs and the EM algorithm or its deterministic variant. More precisely, it achieves a relative accuracy improvement of roughly 39% and is much less sensitive to the initialization of the model parameters.
Probabilistic Point Cloud Modeling via Self-Organizing Gaussian Mixture Models
This letter presents a continuous probabilistic modeling methodology for
spatial point cloud data using finite Gaussian Mixture Models (GMMs) where the
number of components is adapted based on the scene complexity. Few
hierarchical and adaptive methods have been proposed to address the challenge
of balancing model fidelity with size. Instead, state-of-the-art mapping
approaches require tuning parameters for specific use cases, but do not
generalize across diverse environments. To address this gap, we utilize a
self-organizing principle from information-theoretic learning to automatically
adapt the complexity of the GMM model based on the relevant information in the
sensor data. The approach is evaluated against existing point cloud modeling
techniques on real-world data with varying degrees of scene complexity. Comment: 8 pages, 6 figures, to appear in IEEE Robotics and Automation Letters
Incremental Multimodal Surface Mapping via Self-Organizing Gaussian Mixture Models
This letter describes an incremental multimodal surface mapping methodology,
which represents the environment as a continuous probabilistic model. This
model enables high-resolution reconstruction while simultaneously compressing
spatial and intensity point cloud data. The strategy employed in this work
utilizes Gaussian mixture models (GMMs) to represent the environment. While
prior GMM-based mapping works have developed methodologies to determine the
number of mixture components using information-theoretic techniques, these
approaches either operate on individual sensor observations, making them
unsuitable for incremental mapping, or are not real-time viable, especially for
applications where high-fidelity modeling is required. To bridge this gap, this
letter introduces a spatial hash map for rapid GMM submap extraction combined
with an approach to determine relevant and redundant data in a point cloud.
These contributions increase computational speed by an order of magnitude
compared to state-of-the-art incremental GMM-based mapping. In addition, the
proposed approach yields a superior tradeoff in map accuracy and size when
compared to state-of-the-art mapping methodologies (both GMM-based and
non-GMM-based). Evaluations are conducted using both simulated and real-world data.
The software is released open-source to benefit the robotics community. Comment: 7 pages, 7 figures, under review at IEEE Robotics and Automation
Letters
Algorithms for Hierarchical Clustering: An Overview, II
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. This review adds to the earlier version, Murtagh and Contreras (2012).
Description of Input Patterns by Linear Mixtures of SOM Models
This paper introduces a novel way of analyzing input patterns presented to the Self-Organizing Map (SOM). Instead of identifying only the "winner," i.e., the model that best matches the input, we determine the linear mixture of the models (reference vectors) of the SOM that best approximates the input vector. It will be shown that if only nonnegative weights are allowed in this linear mixture, the expansion of the input pattern in terms of the models is very meaningful, contains only a few terms, and provides better insight into the input state than the mere "winner" can give. If the models fall into classes that are known a priori, the sum of the weights over each class can be interpreted as expressing the affiliation of the input with that class.
Missing data imputation through generative topographic mapping as a mixture of t-distributions: Theoretical developments
The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired, Self-Organizing Map (SOM). The GTM can also be interpreted as a constrained mixture of distributions model. In recent years, much attention has been directed towards Student t-distributions as an alternative to Gaussians in mixture models due to their robustness towards outliers. In this report, the GTM is redefined as a constrained mixture of t-distributions: the t-GTM, and the Expectation-Maximization algorithm that is used to fit the model to the data is modified to provide missing data imputation. Postprint (published version)