Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial
categorization and the resulting clustering. Our method can be implemented as a
module in practical expert systems to detect clusters that combine expert
knowledge with the true distribution of the data. Moreover, it can be used to
improve the results of less flexible clustering techniques, such as projection
pursuit clustering. The paper presents an extensive theoretical analysis of the
model and a fast algorithm for its optimization. Experimental results show that
C3L finds a high-quality clustering model, which can be applied to discover
meaningful groups in partially classified data.
Nonlinear Fusion of Multi-Dimensional Densities in Joint State Space
Nonlinear fusion of multi-dimensional densities is an important application in Bayesian state estimation. In the approach proposed here, a joint density over all considered densities is built, which is then approximated by a Dirac mixture density by partitioning the joint state space into regions that are each represented by a single Dirac component. This approximation procedure depends on the nonlinear fusion model, and only areas relevant to this model are considered. Processing in the joint state space has advantages, especially when fusing Dirac mixture densities. Within this approach, degeneration can be avoided and even densities without mutual support can be combined. Thus, this approach offers an alternative to the multiplication of Dirac mixtures with a likelihood, as used in the particle filter. Furthermore, a nonlinear Bayesian estimator with filter and prediction steps can be formulated, which is able to cope with both discrete and continuous densities.
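For orientation, the particle-filter step that the abstract contrasts itself with (multiplying a Dirac mixture by a likelihood) can be sketched as follows. This is not the joint-state-space method of the paper; it is a minimal illustration of the standard weight update, and the names `likelihood_update`, `effective_sample_size`, and `gaussian_likelihood`, as well as the measurement value and noise level, are assumptions chosen for the example.

```python
import numpy as np

def likelihood_update(weights, particles, likelihood):
    """Multiply a Dirac mixture by a likelihood and renormalize the weights."""
    new_weights = weights * likelihood(particles)
    total = new_weights.sum()
    if total == 0.0:
        # No particle lies where the likelihood is nonzero (no mutual support).
        raise ValueError("all weights vanished; the product is undefined")
    return new_weights / total

def effective_sample_size(weights):
    """Common degeneracy diagnostic: 1 / sum of squared normalized weights."""
    return 1.0 / np.sum(weights ** 2)

def gaussian_likelihood(x, z=3.0, sigma=0.2):
    """Hypothetical measurement model: z = x + Gaussian noise (unnormalized)."""
    return np.exp(-0.5 * ((z - x) / sigma) ** 2)

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=500)          # prior Dirac mixture positions
weights = np.full(particles.size, 1.0 / particles.size)

posterior = likelihood_update(weights, particles, gaussian_likelihood)
ess_before = effective_sample_size(weights)
ess_after = effective_sample_size(posterior)
```

Because the assumed measurement lies in the tail of the prior, only a few particles keep non-negligible weight and `ess_after` falls far below `ess_before`. This is the degeneration effect that the abstract says the joint-state-space approach can avoid.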
Porting concepts from DNNs back to GMMs
Deep neural networks (DNNs) have been shown to outperform Gaussian mixture models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models) and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as the first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show a sufficient amount of complementarity to allow effective system combination.
Probabilistic Framework for Sensor Management
A probabilistic sensor management framework is introduced, which maximizes the utility of sensor systems with many different sensing modalities by dynamically configuring the sensor system in the most beneficial way. For this purpose, techniques from stochastic control and Bayesian estimation are combined such that long-term effects of possible sensor configurations and stochastic uncertainties resulting from noisy measurements can be incorporated into the sensor management decisions.
Semi-supervised cross-entropy clustering with information bottleneck constraint
In this paper, we propose a semi-supervised clustering method, CEC-IB, that
models data with a set of Gaussian distributions and that retrieves clusters
based on a partial labeling provided by the user (partition-level side
information). By combining the ideas from cross-entropy clustering (CEC) with
those from the information bottleneck method (IB), our method trades off
three conflicting goals: the accuracy with which the data set is modeled, the
simplicity of the model, and the consistency of the clustering with side
information. Experiments demonstrate that CEC-IB has a performance comparable
to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
is faster, more robust to noisy labels, automatically determines the optimal
number of clusters, and performs well when not all classes are present in the
side information. Moreover, in contrast to other semi-supervised models, it can
be successfully applied in discovering natural subgroups if the partition-level
side information is derived from the top levels of a hierarchical clustering.
Exact Non-Parametric Bayesian Inference on Infinite Trees
Given i.i.d. data from an unknown distribution, we consider the problem of
predicting future items. An adaptive way to estimate the probability density is
to recursively subdivide the domain to an appropriate data-dependent
granularity. A Bayesian would assign a data-independent prior probability to
"subdivide", which leads to a prior over infinite(ly many) trees. We derive an
exact, fast, and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model dimension, moments,
and other quantities. We prove asymptotic convergence and consistency results,
and illustrate the behavior of our model on some prototypical functions.
Comment: 32 LaTeX pages, 9 figures, 5 theorems, 1 algorithm
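The recursive-subdivision idea in this abstract can be illustrated with a small evidence computation. The sketch below is not taken from the paper: it assumes data in [0, 1), splits every interval at its midpoint, and places a uniform prior on the probability mass assigned to the left half. The function name `log_evidence` and the parameters `split_prob` and `max_depth` are hypothetical. Under these assumptions, a node with n points, n0 of them in the left half and n1 in the right, has evidence (1 - s) + s * 2^n * n0! n1! / (n + 1)! * E_left * E_right, where s is the prior probability to subdivide.

```python
from math import exp, lgamma, log

def log_evidence(points, split_prob=0.5, max_depth=40):
    """Log evidence of points in [0, 1) under a recursive-halving tree prior.

    Assumptions of this sketch: midpoint splits, a uniform prior on the
    left-half mass, and a depth cap that treats deeper levels as uniform.
    """
    n = len(points)
    if n <= 1 or max_depth == 0:
        # With at most one point the recursion has fixed point 1 (log = 0);
        # at the depth cap we fall back to the uniform density as an approximation.
        return 0.0
    # Rescale each half back to [0, 1) so the same model applies recursively.
    left = [2.0 * x for x in points if x < 0.5]
    right = [2.0 * x - 1.0 for x in points if x >= 0.5]
    n0, n1 = len(left), len(right)
    log_split = (n * log(2.0)
                 + lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n + 2)
                 + log_evidence(left, split_prob, max_depth - 1)
                 + log_evidence(right, split_prob, max_depth - 1))
    # Mix "stay uniform" (evidence 1) with "subdivide" in a numerically stable way.
    a = log(1.0 - split_prob)
    b = log(split_prob) + log_split
    m = max(a, b)
    return m + log(exp(a - m) + exp(b - m))

clustered = [0.1, 0.2]    # both points fall in the same half repeatedly
spread = [0.25, 0.75]     # one point in each half
log_ev_clustered = log_evidence(clustered)
log_ev_spread = log_evidence(spread)
```

In this toy comparison the clustered pair receives higher evidence than the spread pair, since subdividing only pays off where the data concentrate. The depth cap exists solely to guarantee termination when points coincide.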
Fast Non-Parametric Bayesian Inference on Infinite Trees
Given i.i.d. data from an unknown distribution, we consider the problem of
predicting future items. An adaptive way to estimate the probability density is
to recursively subdivide the domain to an appropriate data-dependent
granularity. A Bayesian would assign a data-independent prior probability to
"subdivide", which leads to a prior over infinite(ly many) trees. We derive an
exact, fast, and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model dimension, and other
quantities.
Comment: 8 twocolumn pages, 3 figures