Relational visual cluster validity
The assessment of cluster validity plays a very important role in cluster analysis. The most commonly used cluster validity methods are based on statistical hypothesis testing or on finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display the validity of clusters directly by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied to object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists – relational data. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presented.
LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
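The conventional co-association approach mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the LinkCluE package (which is written in MATLAB); the function name `co_association` and the example labelings are hypothetical. The sketch assumes the usual definition, in which entry (i, j) is the fraction of base clusterings that place points i and j in the same cluster.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the matrix whose (i, j) entry is the fraction of base
    clusterings that put points i and j in the same cluster."""
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same points")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of four points (illustrative only).
# Cluster label values need not agree across clusterings.
base = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = co_association(base)
```

A consensus partition is then typically obtained by clustering this matrix as a similarity matrix; link-based methods, as described in the abstract, refine the pairwise similarity rather than relying on these raw co-occurrence fractions.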
Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
A key issue in cluster analysis is the choice of an appropriate clustering
method and the determination of the best number of clusters. Different
clusterings are optimal on the same data set according to different criteria,
and the choice of such criteria depends on the context and aim of clustering.
Therefore, researchers need to consider what data analytic characteristics the
clusters they are aiming at are supposed to have, among others within-cluster
homogeneity, between-clusters separation, and stability. Here, a set of
internal clustering validity indexes measuring different aspects of clustering
quality is proposed, including some indexes from the literature. Users can
choose the indexes that are relevant in the application at hand. In order to
measure the overall quality of a clustering (for comparing clusterings from
different methods and/or different numbers of clusters), the index values are
calibrated for aggregation. Calibration is relative to a set of random
clusterings on the same data. Two specific aggregated indexes are proposed and
compared with existing indexes on simulated and real data. Comment: 42 pages, 11 figures
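The idea of calibrating index values against random clusterings before aggregating them can be sketched as follows. This is a hedged illustration, not the paper's procedure: the abstract does not state the calibration formula, so the sketch assumes z-score standardization against the random clusterings and a plain mean as the aggregate. The two toy indexes, the function names, and the one-dimensional data are all hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence

Index = Callable[[Sequence[float], Sequence[int]], float]


def within_homogeneity(x: Sequence[float], labels: Sequence[int]) -> float:
    """Negative mean distance between same-cluster pairs (higher is better)."""
    d = [abs(x[i] - x[j]) for i in range(len(x)) for j in range(i + 1, len(x))
         if labels[i] == labels[j]]
    return -statistics.fmean(d) if d else 0.0


def between_separation(x: Sequence[float], labels: Sequence[int]) -> float:
    """Smallest distance between points in different clusters (higher is better)."""
    d = [abs(x[i] - x[j]) for i in range(len(x)) for j in range(i + 1, len(x))
         if labels[i] != labels[j]]
    return min(d) if d else 0.0


def calibrate(index: Index, x: Sequence[float], labels: Sequence[int],
              reference: Sequence[Sequence[int]]) -> float:
    """Assumed calibration: z-score of the index value relative to the
    values the same index takes on the random reference clusterings."""
    ref = [index(x, r) for r in reference]
    mu, sd = statistics.fmean(ref), statistics.pstdev(ref)
    return (index(x, labels) - mu) / sd if sd > 0 else 0.0


def aggregated_index(x: Sequence[float], labels: Sequence[int],
                     indexes: Sequence[Index], n_random: int = 200,
                     seed: int = 0) -> float:
    """Mean of the calibrated indexes; random clusterings use the same k."""
    rng = random.Random(seed)
    k = len(set(labels))
    reference: List[List[int]] = [[rng.randrange(k) for _ in x]
                                  for _ in range(n_random)]
    return statistics.fmean(calibrate(ix, x, labels, reference) for ix in indexes)


x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]          # two well-separated groups
indexes = [within_homogeneity, between_separation]
score_good = aggregated_index(x, [0, 0, 0, 1, 1, 1], indexes)
score_bad = aggregated_index(x, [0, 1, 0, 1, 0, 1], indexes)
```

Because both candidate clusterings are calibrated against the same random reference set, indexes measured on different scales become comparable and can be averaged; here the clustering that respects the two groups receives the higher aggregate score.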
Neuroengineering of Clustering Algorithms
Cluster analysis can be broadly divided into multivariate data visualization, clustering algorithms, and cluster validation. This dissertation contributes neural network-based techniques to perform all three unsupervised learning tasks. Particularly, the first paper provides a comprehensive review on adaptive resonance theory (ART) models for engineering applications and provides context for the four subsequent papers. These papers are devoted to enhancements of ART-based clustering algorithms from (a) a practical perspective by exploiting the visual assessment of cluster tendency (VAT) sorting algorithm as a preprocessor for ART offline training, thus mitigating ordering effects; and (b) an engineering perspective by designing a family of multi-criteria ART models: dual vigilance fuzzy ART and distributed dual vigilance fuzzy ART (both of which are capable of detecting complex cluster structures), merge ART (aggregates partitions and lessens ordering effects in online learning), and cluster validity index vigilance in fuzzy ART (features a robust vigilance parameter selection and alleviates ordering effects in offline learning). The sixth paper consists of enhancements to data visualization using self-organizing maps (SOMs) by depicting in the reduced dimension and topology-preserving SOM grid information-theoretic similarity measures between neighboring neurons. This visualization's parameters are estimated using samples selected via a single-linkage procedure, thereby generating heatmaps that portray more homogeneous within-cluster similarities and crisper between-cluster boundaries. The seventh paper presents incremental cluster validity indices (iCVIs) realized by (a) incorporating existing formulations of online computations for clusters' descriptors, or (b) modifying an existing ART-based model and incrementally updating local density counts between prototypes.
Moreover, this last paper provides the first comprehensive comparison of iCVIs in the computational intelligence literature. (Abstract, page iv)
The Consensus Clustering as a Contribution to Parental Recognition Problem Based on Hand Biometrics
Cluster analysis has attracted researchers from several areas, such as health (medical diagnosis, clustering of proteins and genes), marketing (market analysis and image segmentation), and information management (clustering of web pages). Clustering algorithms are usually applied in data mining, allowing the identification of natural groups in a given data set. Different clustering methods applied to the same data set can produce different groups, so several studies have sought to validate the resulting clusters. There has been increasing interest in how to determine a consensus clustering that combines the different individual clusterings, reflecting the main cluster structure inherent to each of them, as a way to obtain a higher-quality clustering. As several consensus clustering techniques have been researched, the present work focuses on the problem of finding the best partition in consensus clustering. We analyze the techniques most often cited in the literature, which use different mechanisms to achieve consensus, namely voting mechanisms, the co-association matrix, mutual information and hypergraphs, and a multi-objective consensus clustering from the literature. In this paper we discuss these approaches and present a comparative study based on a set of experiments using two-dimensional synthetic data sets with different characteristics, such as the number of clusters and their cardinality, shape, homogeneity and separability, and a real-world data set based on hand biometric shape, in the context of parental recognition. With these data we investigate the ability of consensus clustering algorithms to correctly cluster a child with her/his parents. This has enormous business potential and great economic value, since with this technology a website could match data, such as photographs of hands, and say whether A and B are related in some way.
We conclude that, in some cases, the multi-objective technique outperformed the other techniques and, unlike them, is little influenced by poor base clusterings, even in situations such as added noise and clusters that overlap or differ in homogeneity. Furthermore, the results show that it can capture the performance of the best base clustering and still outperform it. Regarding the real data, no technique was capable of identifying a person's mother or father. However, studying the distances between the hands of a person and those of his/her father, mother and siblings can yield the probability that a given person is a relative. This does not enable the identification of relatives but instead reduces the size of the database to be searched for matches.
A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications
This survey samples from the ever-growing family of adaptive resonance theory
(ART) neural network models used to perform the three primary machine learning
modalities, namely, unsupervised, supervised and reinforcement learning. It
comprises a representative list from classic to modern ART models, thereby
painting a general picture of the architectures developed by researchers over
the past 30 years. The learning dynamics of these ART models are briefly
described, and their distinctive characteristics such as code representation,
long-term memory and corresponding geometric interpretation are discussed.
Useful engineering properties of ART (speed, configurability, explainability,
parallelization and hardware implementation) are examined along with current
challenges. Finally, a compilation of online software libraries is provided. It
is expected that this overview will be helpful to new and seasoned ART
researchers.