10,449 research outputs found

Semi-supervised model-based clustering with controlled clusters leakage

Author: Struski Łukasz
Tabor Jacek
Śmieja Marek
Publication date: 01/01/2017

In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discover meaningful groups in partially classified data.

arXiv.org e-Print Archive

Jagiellonian University Repository

Nonlinear Fusion of Multi-Dimensional Densities in Joint State Space

Author: Hanebeck Uwe D.
Klumpp Vesa
Publication date: 03/08/2012

Nonlinear fusion of multi-dimensional densities is an important application in Bayesian state estimation. In the approach proposed here, a joint density over all considered densities is built, which is then approximated by means of a Dirac mixture density by partitioning the joint state space into regions that are represented by single Dirac components. This approximation procedure depends on the nonlinear fusion model, and only areas relevant to this model are considered. The processing in joint state space has advantages, especially when fusing Dirac mixture densities. Within this approach, degeneration can be avoided and even densities without mutual support can be combined. Thus, this approach gives an alternative to multiplication of Dirac mixtures with a likelihood, as used in the particle filter. Furthermore, a nonlinear Bayesian estimator with filter and prediction step can be formulated, which is able to cope with both discrete and continuous densities.
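For context, the baseline that the abstract contrasts against (multiplying a Dirac mixture with a likelihood, as in the particle filter measurement update) can be sketched as follows. This is a minimal illustration, not the joint-state-space method of the paper: the scalar state, the Gaussian likelihood, and the names `reweight` and `effective_sample_size` are assumptions made for the example.

```python
from math import exp, pi, sqrt


def gaussian_likelihood(z, x, sigma):
    """Likelihood of measurement z given state x (assumed Gaussian noise model)."""
    return exp(-0.5 * ((z - x) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def reweight(positions, weights, z, sigma):
    """Multiply each Dirac component's weight by the likelihood, then normalize.

    Positions stay fixed; only the weights change. If no component lies where
    the likelihood has mass, every product underflows to zero and the mixture
    degenerates, which is reported as an error here.
    """
    unnormalized = [w * gaussian_likelihood(z, x, sigma)
                    for x, w in zip(positions, weights)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("degenerate update: no mutual support")
    return [w / total for w in unnormalized]


def effective_sample_size(weights):
    """Common degeneracy diagnostic: 1 / sum of squared normalized weights."""
    return 1.0 / sum(w * w for w in weights)


if __name__ == "__main__":
    positions = [0.0, 1.0, 2.0, 3.0]
    weights = [0.25] * 4
    posterior = reweight(positions, weights, z=2.1, sigma=0.5)
    print(posterior, effective_sample_size(posterior))
```

Because the component positions never move in this update, a measurement far from all components leaves nothing to reweight; that is the failure case the abstract describes as densities without mutual support.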

KITopen

Porting concepts from DNNs back to GMMs

Author: Demuynck Kris
Triefenbach Fabian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

Deep neural networks (DNNs) have been shown to outperform Gaussian mixture models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from the DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models) and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as the first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show a sufficient amount of complementarity to allow effective system combination.

Crossref

Ghent University Academic Bibliography

Probabilistic Framework for Sensor Management

Author: Huber Marco
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2009

A probabilistic sensor management framework is introduced, which maximizes the utility of sensor systems with many different sensing modalities by dynamically configuring the sensor system in the most beneficial way. For this purpose, techniques from stochastic control and Bayesian estimation are combined such that long-term effects of possible sensor configurations and stochastic uncertainties resulting from noisy measurements can be incorporated into the sensor management decisions.

KITopen

Directory of Open Access Books (DOAB)

Semi-supervised cross-entropy clustering with information bottleneck constraint

Author: Geiger Bernhard C.
Śmieja Marek
Publication venue: Elsevier BV
Publication date: 01/01/2017

In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades off three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with side information. Experiments demonstrate that CEC-IB has a performance comparable to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied in discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.

arXiv.org e-Print Archive

Jagiellonian University Repository

Exact Non-Parametric Bayesian Inference on Infinite Trees

Author: Hutter Marcus
Publication date: 01/01/2009

Given i.i.d. data from an unknown distribution, we consider the problem of predicting future items. An adaptive way to estimate the probability density is to recursively subdivide the domain to an appropriate data-dependent granularity. A Bayesian would assign a data-independent prior probability to "subdivide", which leads to a prior over infinite(ly many) trees. We derive an exact, fast, and simple inference algorithm for such a prior, for the data evidence, the predictive distribution, the effective model dimension, moments, and other quantities. We prove asymptotic convergence and consistency results, and illustrate the behavior of our model on some prototypical functions. Comment: 32 LaTeX pages, 9 figures, 5 theorems, 1 algorithm
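The recursive-subdivision idea in this abstract can be sketched in a few lines. The following is an illustration under stated assumptions, not the paper's algorithm: it assumes data on the unit interval, halving splits, a uniform prior on the probability mass assigned to the left half, and a depth cap (`max_depth`) as a safeguard against duplicate points. The names `evidence`, `predictive_density`, and `split_prior` are hypothetical.

```python
from math import exp, lgamma, log


def evidence(points, lo=0.0, hi=1.0, split_prior=0.5, max_depth=40):
    """Evidence of the points on [lo, hi), relative to the uniform density.

    With probability 1 - split_prior the interval is modeled as uniform
    (relative evidence 1). Otherwise it is halved, the left half receives
    mass q with q ~ Uniform(0, 1), and both halves are modeled recursively.
    Integrating q out gives the factor 2^n * n0! * n1! / (n + 1)!.
    """
    n = len(points)
    # With 0 or 1 points the recursion has the fixed point 1, so stop early.
    # Hitting the depth cap is an approximation (assumed, for duplicates).
    if n <= 1 or max_depth == 0:
        return 1.0
    mid = 0.5 * (lo + hi)
    left = [x for x in points if x < mid]
    right = [x for x in points if x >= mid]
    n0, n1 = len(left), len(right)
    log_factor = n * log(2.0) + lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n + 2)
    split_term = (exp(log_factor)
                  * evidence(left, lo, mid, split_prior, max_depth - 1)
                  * evidence(right, mid, hi, split_prior, max_depth - 1))
    return (1.0 - split_prior) + split_prior * split_term


def predictive_density(x, points, **kwargs):
    """Predictive density of x given the points, as a ratio of evidences."""
    return evidence(list(points) + [x], **kwargs) / evidence(points, **kwargs)


if __name__ == "__main__":
    data = [0.1, 0.11]
    print(evidence(data))
    print(predictive_density(0.105, data), predictive_density(0.9, data))
```

Values above 1 indicate that the data are better explained by a subdivided model than by the uniform density; tightly clustered points push the evidence up, and the predictive density concentrates near them.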

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Fast Non-Parametric Bayesian Inference on Infinite Trees

Author: Hutter Marcus
Publication date: 23/11/2004

Given i.i.d. data from an unknown distribution, we consider the problem of predicting future items. An adaptive way to estimate the probability density is to recursively subdivide the domain to an appropriate data-dependent granularity. A Bayesian would assign a data-independent prior probability to "subdivide", which leads to a prior over infinite(ly many) trees. We derive an exact, fast, and simple inference algorithm for such a prior, for the data evidence, the predictive distribution, the effective model dimension, and other quantities. Comment: 8 twocolumn pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

The Australian National University