29,051 research outputs found
A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables
The use of mutual information as a similarity measure in agglomerative
hierarchical clustering (AHC) raises an important issue: some correction needs
to be applied for the dimensionality of variables. In this work, we formulate
the decision of merging dependent multivariate normal variables in an AHC
procedure as a Bayesian model comparison. We found that the Bayesian
formulation naturally shrinks the empirical covariance matrix towards a matrix
set a priori (e.g., the identity), provides an automated stopping rule, and
corrects for dimensionality using a term that scales up the measure as a
function of the dimensionality of the variables. Also, the resulting log Bayes
factor is asymptotically proportional to the plug-in estimate of mutual
information, with an additive correction for dimensionality in agreement with
the Bayesian information criterion. We investigated the behavior of these
Bayesian alternatives (in exact and asymptotic forms) to mutual information on
simulated and real data. An encouraging result was first derived on
simulations: the hierarchical clustering based on the log Bayes factor
outperformed off-the-shelf clustering techniques as well as raw and normalized
mutual information in terms of classification accuracy. On a toy example, we
found that the Bayesian approaches led to results that were similar to those of
mutual information clustering techniques, with the advantage of an automated
thresholding. On real functional magnetic resonance imaging (fMRI) datasets
measuring brain activity, it identified clusters consistent with the
established outcome of standard procedures. On this application, normalized
mutual information had a highly atypical behavior, in the sense that it
systematically favored very large clusters. These initial experiments suggest
that the proposed Bayesian alternatives to mutual information are a useful new
tool for hierarchical clustering
Speaker segmentation and clustering
This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved
Automatic post-processing for tolerance inspection of digitized parts made by injection moulding
This paper presents the advancements of an automatic segmentation procedure based on the concept of Hierarchical Space Partitioning. It is aimed at tolerance inspection of electromechanical parts produced by injection moulding and acquired by laser scanning. After a general overview of the procedure, its application for recognising cylindrical surfaces is presented and discussed through a specific industrial test case
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the one-century old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono-or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We insist on the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies.Comment: Open Access paper.
http://www.frontiersin.org/milky\_way\_and\_galaxies/10.3389/fspas.2015.00003/abstract\>.
\<10.3389/fspas.2015.00003 \&g
On clustering procedures and nonparametric mixture estimation
This paper deals with nonparametric estimation of conditional den-sities in
mixture models in the case when additional covariates are available. The
proposed approach consists of performing a prelim-inary clustering algorithm on
the additional covariates to guess the mixture component of each observation.
Conditional densities of the mixture model are then estimated using kernel
density estimates ap-plied separately to each cluster. We investigate the
expected L 1 -error of the resulting estimates and derive optimal rates of
convergence over classical nonparametric density classes provided the
clustering method is accurate. Performances of clustering algorithms are
measured by the maximal misclassification error. We obtain upper bounds of this
quantity for a single linkage hierarchical clustering algorithm. Lastly,
applications of the proposed method to mixture models involving elec-tricity
distribution data and simulated data are presented
Automated construction of a hierarchy of self-organized neural network classifiers
This paper documents an effort to design and implement a neural network-based, automatic classification system which dynamically constructs and trains a decision tree. The system is a combination of neural network and decision tree technology. The decision tree is constructed to partition a large classification problem into smaller problems. The neural network modules then solve these smaller problems. We used a variant of the Fuzzy ARTMAP neural network which can be trained much more quickly than traditional neural networks. The research extends the concept of self-organization from within the neural network to the overall structure of the dynamically constructed decision hierarchy. The primary advantage is avoidance of manual tedium and subjective bias in constructing decision hierarchies. Additionally, removing the need for manual construction of the hierarchy opens up a large class of potential classification applications. When tested on data from real-world images, the automatically generated hierarchies performed slightly better than an intuitive (handbuilt) hierarchy. Because the neural networks at the nodes of the decision hierarchy are solving smaller problems, generalization performance can really be improved if the number of features used to solve these problems is reduced. Algorithms for automatically selecting which features to use for each individual classification module were also implemented. We were able to achieve the same level of performance as in previous manual efforts, but in an efficient, automatic manner. The technology developed has great potential in a number of commercial areas, including data mining, pattern recognition, and intelligent interfaces for personal computer applications. Sample applications include: fraud detection, bankruptcy prediction, data mining agent, scalable object recognition system, email agent, resource librarian agent, and a decision aid agent
Hierarchical structure-and-motion recovery from uncalibrated images
This paper addresses the structure-and-motion problem, that requires to find
camera motion and 3D struc- ture from point matches. A new pipeline, dubbed
Samantha, is presented, that departs from the prevailing sequential paradigm
and embraces instead a hierarchical approach. This method has several
advantages, like a provably lower computational complexity, which is necessary
to achieve true scalability, and better error containment, leading to more
stability and less drift. Moreover, a practical autocalibration procedure
allows to process images without ancillary information. Experiments with real
data assess the accuracy and the computational efficiency of the method.Comment: Accepted for publication in CVI
- …