20,721 research outputs found
Robust constrained fuzzy clustering
It is well-known that outliers and noisy data can be very harmful when applying
clustering methods. Several fuzzy clustering methods which are able
to handle the presence of noise have been proposed. In this work, we propose
a robust clustering approach called F-TCLUST based on an “impartial”
(i.e., self-determined by data) trimming. The proposed approach considers
an eigenvalue ratio constraint that makes it a mathematically well-defined
problem and serves to control the allowed differences among cluster scatters.
A computationally feasible algorithm is proposed for its practical implementation.
Some guidelines about how to choose the parameters controlling the
performance of the fuzzy clustering procedure are also given.
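The eigenvalue ratio constraint mentioned in this abstract can be illustrated with a minimal sketch. It assumes the constraint bounds the ratio of the largest to the smallest eigenvalue of the cluster scatter matrices, pooled over all clusters, by a constant `c`. The function names, the caller-supplied lower threshold `m`, and the simple clipping rule are hypothetical simplifications for illustration, not the F-TCLUST algorithm itself.

```python
def eigenvalue_ratio(eigs_per_cluster):
    """Largest eigenvalue divided by the smallest, pooled over all clusters."""
    flat = [e for eigs in eigs_per_cluster for e in eigs]
    return max(flat) / min(flat)


def clip_eigenvalues(eigs_per_cluster, m, c):
    """Clip every eigenvalue into [m, c * m] so the pooled ratio is at most c.

    m is a caller-supplied lower threshold (an assumption of this sketch);
    the abstract does not say how such a threshold would be chosen.
    """
    if m <= 0 or c < 1:
        raise ValueError("need m > 0 and c >= 1")
    return [[min(max(e, m), c * m) for e in eigs] for eigs in eigs_per_cluster]


# Eigenvalues of two hypothetical 2-D cluster scatter matrices.
scatters = [[4.0, 1.0], [0.25, 0.5]]
clipped = clip_eigenvalues(scatters, m=0.5, c=4.0)

print(eigenvalue_ratio(scatters))  # ratio before clipping
print(clipped)                     # eigenvalues forced into [0.5, 2.0]
print(eigenvalue_ratio(clipped))   # ratio after clipping, at most c
```

A small `c` forces clusters toward similar scatters, while a large `c` allows very different cluster shapes and sizes; with no bound at all, a cluster could degenerate onto a single point.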
Fuzzy Clustering Through Robust Factor Analyzers
In fuzzy clustering, data elements can belong to more than one cluster, and membership levels are associated with each element to indicate the strength of the association between that data element and a particular cluster. Unfortunately, fuzzy clustering is not robust, while in real applications the data are contaminated by outliers and noise, and the assumed underlying Gaussian distributions could be unrealistic. Here we propose a robust fuzzy estimator for clustering through Factor Analyzers, by introducing the joint usage of trimming and of constrained estimation of noise matrices in the classic Maximum Likelihood approach.
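The notion of membership levels described in this abstract can be made concrete with a minimal sketch. It uses the standard fuzzy c-means membership formula on scalar data, which is an assumption chosen for illustration and not the robust factor-analyzer estimator the abstract proposes. The function name `fcm_memberships` and the fuzzifier default `m=2.0` are hypothetical choices.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means memberships of a scalar x to each center.

    Uses u_k = d_k^(-2/(m-1)) / sum_j d_j^(-2/(m-1)), where d_k = |x - center_k|.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    d2 = [(x - c) ** 2 for c in centers]
    # A point sitting exactly on a center belongs fully to that center
    # (split evenly if several centers coincide with it).
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


# A point at 1.0 is much closer to the center at 0.0 than to the one at 4.0,
# so most of its membership goes to the first cluster.
u = fcm_memberships(1.0, [0.0, 4.0])
print(u)
```

The memberships of each point sum to one, which is why a distant outlier still receives substantial membership in some cluster. Trimming, as proposed in the abstract, is one way to avoid that effect.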
Semi-supervised cross-entropy clustering with information bottleneck constraint
In this paper, we propose a semi-supervised clustering method, CEC-IB, that
models data with a set of Gaussian distributions and that retrieves clusters
based on a partial labeling provided by the user (partition-level side
information). By combining the ideas from cross-entropy clustering (CEC) with
those from the information bottleneck method (IB), our method trades between
three conflicting goals: the accuracy with which the data set is modeled, the
simplicity of the model, and the consistency of the clustering with side
information. Experiments demonstrate that CEC-IB has a performance comparable
to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but
is faster, more robust to noisy labels, automatically determines the optimal
number of clusters, and performs well when not all classes are present in the
side information. Moreover, in contrast to other semi-supervised models, it can
be successfully applied in discovering natural subgroups if the partition-level
side information is derived from the top levels of a hierarchical clustering.
Multimodal decision-level fusion for person authentication
In this paper, the use of clustering algorithms for decision-level data fusion is proposed. Person authentication results coming from several modalities (e.g., still image, speech) are combined by using fuzzy k-means (FKM) and fuzzy vector quantization (FVQ) algorithms, and a median radial basis function (MRBF) network. The quality measure of the data from each modality is used for fuzzification. Two modifications of the FKM and FVQ algorithms, based on a novel fuzzy vector distance definition, are proposed to handle the fuzzy data and utilize the quality measure. Simulations show that fuzzy clustering algorithms perform better than the classical clustering algorithms and other known fusion algorithms. MRBF performs better especially when two modalities are combined. Moreover, using the quality measure via the proposed modified algorithms increases the performance of the fusion system.
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed. Comment: 13 figures, 35 references