57,724 research outputs found
Comparison of different strategies of utilizing fuzzy clustering in structure identification
Fuzzy systems approximate highly nonlinear systems by means of fuzzy "if-then" rules. In the literature, various algorithms are proposed for mining. These algorithms commonly utilize fuzzy clustering in structure identification. Basically, there are three different approaches in which one can utilize fuzzy clustering; the first one is based on input space clustering, the second one considers clustering realized in the output space, while the third one is concerned with clustering realized in the combined input-output space. In this study, we analyze these three approaches. We discuss each of the algorithms in great detail and offer a thorough comparative analysis. Finally, we compare the performances of these algorithms in a medical diagnosis classification problem, namely Aachen Aphasia Test. The experiment and the results provide a valuable insight about the merits and the shortcomings of these three clustering approaches
Comparison of Similarity Measures for Trajectory Clustering - Aviation Use Case
Various distance-based clustering algorithms have been reported, but the core component of all of them is a similarity or distance measure for classification of data. Rather than prioritizing a comparison of the performance of different clustering algorithms, it may be worthwhile to analyze the influence of different similarity measures on the results of clustering algorithms. The main contribution of this work is a comparative study of the impact of 9 similarity measures on similarity-based trajectory clustering using the DBSCAN algorithm on a commercial flight dataset. The novelty in this comparison is exploring the robustness of the clustering algorithm with respect to the algorithm parameter. We evaluate the accuracy of clustering, accuracy of anomaly detection, and algorithmic efficiency, and we determine the behavior profile for each measure. We show that DTW and Frechet distance lead to the best clustering results, while LCSS and Hausdorff Cosine should be avoided for this task
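As a rough illustration of the kind of similarity measure compared in the abstract above, the following minimal sketch computes the dynamic time warping (DTW) distance between two trajectories and builds a pairwise distance matrix. It is not the authors' implementation: the function names `dtw_distance` and `distance_matrix`, the Euclidean point-to-point cost, and the representation of a trajectory as a list of coordinate tuples are all assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two trajectories given as lists of coordinate tuples.

    Assumption: the local cost is the Euclidean distance between points.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(trajectories):
    """Symmetric matrix of pairwise DTW distances (hypothetical helper)."""
    k = len(trajectories)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = dtw_distance(trajectories[i], trajectories[j])
    return matrix
```

A matrix of this form could then be handed to a density-based clusterer that accepts precomputed distances, for example scikit-learn's `DBSCAN` with `metric="precomputed"`; the `eps` and `min_samples` values would have to be tuned for the data at hand.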
Comparison of Various Improved-Partition Fuzzy c-Means Clustering Algorithms in Fast Color Reduction
This paper provides a comparative study of several enhanced versions of the fuzzy c-means clustering algorithm in an application of histogram-based image color reduction. A common preprocessing is performed before clustering, consisting of a preliminary color quantization, histogram extraction and selection of frequently occurring colors of the image. These selected colors will be clustered by tested c-means algorithms. Clustering is followed by another common step, which creates the output image. Besides conventional hard (HCM) and fuzzy c-means (FCM) clustering, the so-called generalized improved partition FCM algorithm, and several versions of the suppressed FCM (s-FCM) in its conventional and generalized form, are included in this study. Accuracy is measured as the average color difference between pixels of the input and output image, while efficiency is mostly characterized by the total runtime of the performed color reduction. Numerical evaluation found all enhanced FCM algorithms more accurate, and four out of seven enhanced algorithms faster than FCM. All tested algorithms can create reduced color images of acceptable quality
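For orientation, the conventional FCM baseline mentioned in the abstract above can be sketched as follows. This is plain fuzzy c-means only, not the improved-partition or suppressed variants the paper compares. The function name `fcm`, the optional `weights` argument (standing in for histogram counts of each selected color), and the fixed iteration count are assumptions made for the example.

```python
import math


def fcm(points, centers, m=2.0, iters=50, weights=None):
    """Plain fuzzy c-means on tuples of equal length.

    Assumptions: `centers` holds the initial prototypes, `m` > 1 is the
    fuzzifier, and `weights` optionally gives a count per point
    (for example, how often a color occurs in the histogram).
    """
    if weights is None:
        weights = [1.0] * len(points)
    centers = [list(c) for c in centers]
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u[k][i] is the degree of point k in cluster i.
        u = []
        for x in points:
            d = [math.dist(x, c) for c in centers]
            if 0.0 in d:
                # The point coincides with a prototype: use crisp membership
                # to avoid dividing by zero.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
            u.append(row)
        # Prototype update: mean of the points weighted by w_k * u_ki^m.
        for i in range(len(centers)):
            coeff = [w * (row[i] ** m) for w, row in zip(weights, u)]
            total = sum(coeff)
            if total > 0.0:
                centers[i] = [
                    sum(cf * x[j] for cf, x in zip(coeff, points)) / total
                    for j in range(dim)
                ]
    # `u` holds the memberships computed in the last iteration,
    # that is, just before the final prototype update.
    return centers, u
```

In a color-reduction setting, each pixel would then be mapped to the prototype of the cluster in which its quantized color has the highest membership; that mapping step is not shown here.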
A systematic comparison of genome-scale clustering algorithms
Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.
Conclusions: Validation of clusters against known gene classifications demonstrates that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted
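The scoring procedure described in the Methods of the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the size thresholds separating small, medium, and large clusters (`small_max`, `medium_max`) are placeholders, since the abstract does not give them. The sketch covers a single parameter setting; selecting the best setting per method is left out.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of gene IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_annotation_score(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max(jaccard(cluster, annotation) for annotation in annotation_sets)


def size_bin(cluster, small_max=10, medium_max=100):
    """Assign a cluster to a size bin; the thresholds are hypothetical."""
    if len(cluster) <= small_max:
        return "small"
    if len(cluster) <= medium_max:
        return "medium"
    return "large"


def average_top5_by_bin(clusters, annotation_sets):
    """Average the five best cluster scores within each size bin."""
    scores = {}
    for cluster in clusters:
        score = best_annotation_score(cluster, annotation_sets)
        scores.setdefault(size_bin(cluster), []).append(score)
    return {
        name: sum(sorted(values, reverse=True)[:5]) / min(len(values), 5)
        for name, values in scores.items()
    }
```

Bins with fewer than five clusters are averaged over the clusters that exist; whether the original study handled that case the same way is not stated in the abstract.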
Towards Clustering of Mobile and Smartwatch Accelerometer Data for Physical Activity Recognition
Mobile and wearable devices now have a greater capability of sensing human activity ubiquitously and unobtrusively through advancements in miniaturization and sensing abilities. However, outstanding issues remain around the energy restrictions of these devices when processing large sets of data. This paper presents our approach that uses feature selection to refine the clustering of accelerometer data to detect physical activity. This also has a positive effect on the computational burden that is associated with processing large sets of data, as energy consumption and resource use are decreased because less data is processed by the clustering algorithms. Raw accelerometer data, obtained from smartphones and smartwatches, have been preprocessed to extract both time and frequency domain features. Principal component analysis feature selection (PCAFS) and correlation feature selection (CFS) have been used to remove redundant features. The reduced feature sets have then been evaluated against three widely used clustering algorithms: hierarchical clustering analysis (HCA), k-means, and density-based spatial clustering of applications with noise (DBSCAN). Using the reduced feature sets resulted in improved separability, reduced uncertainty, and improved efficiency compared with the baseline, which utilized all features. Overall, the CFS approach in conjunction with HCA produced higher Dunn Index results of 9.7001 for the phone and 5.1438 for the watch features, which is an improvement over the baseline. This comparative study of feature selection and clustering, with the specific algorithms used, has not been performed previously and provides an optimistic and usable approach to recognize activities using either a smartphone or smartwatch
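The Dunn Index reported in the abstract above rewards compact, well-separated clusters. The sketch below uses one common formulation (smallest distance between points of different clusters divided by the largest within-cluster diameter); the abstract does not say which variant the authors used, so this choice, the Euclidean metric, and the function name `dunn_index` are assumptions.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of feature tuples.

    Assumes at least two non-empty clusters and Euclidean distance.
    """
    # Smallest distance between any two points that lie in different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two points that share a cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # All-singleton clusterings have zero diameter; report infinity in that case.
    return min_separation / max_diameter if max_diameter > 0 else math.inf
```

Higher values indicate better separation relative to cluster spread, which is how the phone and watch results in the abstract are compared against the baseline.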
Mining Extremes through Fuzzy Clustering
Archetypes are extreme points that synthesize data representing "pure" individual types.
Archetypes are assigned by the most discriminating features of data points, and are almost
always useful in applications where one is interested in extremes and not in commonalities.
Recent applications include talent analysis in sports and science, fraud detection,
profiling of users and products in recommendation systems, climate extremes, as well as
other machine learning applications.
The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the
Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose
distinct models to find clusters with extreme prototypes. Even though the FCPM model
does not impose its prototypes to lie in the convex hull of data, it belongs to the framework
of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised
cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes
whereas its smooth version, FCPM-2, provides extreme prototypes like the AA archetypes.
The comparative study between FS-AA and FCPM algorithms conducted in this dissertation
covers the following aspects. First, the analysis of FS-AA on data recovery from
clustering using a collection of 100 data sets of diverse dimensionalities, generated with
a proper data generator (FCPM-DG), as well as 14 real-world data sets. Second, testing the
robustness of the clustering algorithms in the presence of outliers, with the peculiar behaviour
of FCPM-0 on removing the proper number of prototypes from data. Third, a
collection of five popular fuzzy validation indices is explored for assessing the quality
of clustering results. Fourth, the algorithms undergo a study to evaluate how different
initializations affect their convergence as well as the quality of the clustering partitions.
The Iterative Anomalous Pattern (IAP) algorithm makes it possible to improve the convergence of
the FCPM algorithm as well as to fine-tune the level of resolution at which clustering results are viewed,
which is an advantage over FS-AA. Proper visualization functionalities for FS-AA and
FCPM support the easy interpretation of the clustering results
- …