Noise resistant generalized parametric validity index of clustering for gene expression data
This article has been made available through the Brunel Open Access Publishing Fund. Validity indices have been investigated for decades. However, since the literature contains no study of the noise-resistance performance of these indices, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects considered when calculating the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels, and compare its ability to determine the number of clusters with that of eight existing indices. We also test the GPV index on three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.
Observer-biased bearing condition monitoring: from fault detection to multi-fault classification
Bearings are simultaneously a fundamental component and one of the principal causes of failure in rotary machinery. This work focuses on the use of fuzzy clustering for bearing condition monitoring, i.e., fault detection and classification. The output of a clustering algorithm is a data partition (a set of clusters), which is merely a hypothesis about the structure of the data. This hypothesis requires validation by domain experts. In general, clustering algorithms allow only limited use of domain knowledge in the cluster formation process. In this study, a novel method allowing for interactive clustering in bearing fault diagnosis is proposed. The method resorts to shrinkage to generalize an otherwise unbiased clustering algorithm into a biased one. In this way, the method provides a natural and intuitive way to control the cluster formation process, allowing domain knowledge to guide it. The domain expert can select a desirable level of granularity, ranging from fault detection to classification of a variable number of faults, and can select a specific region of the feature space for detailed analysis. Moreover, experimental results under realistic conditions show that the adopted algorithm outperforms the corresponding unbiased algorithm (fuzzy c-means), which is widely used in this type of problem. (C) 2016 Elsevier Ltd. All rights reserved. Grant number: 145602
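For orientation, the unbiased baseline named in this abstract, fuzzy c-means, can be sketched in a few lines. This is a minimal sketch of the textbook algorithm only, not the shrinkage-based method of the paper, whose details are not given here. The function names, the fuzzifier default `m=2.0` and the fixed iteration count are assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Textbook FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if any(x == 0.0 for x in d):
            # The point coincides with a centre: share full membership
            # among the coinciding centres to avoid division by zero.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]
        )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Textbook FCM centre update: mean of the points weighted by u_ik^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        w = [row[k] ** m for row in memberships]
        total = sum(w)  # assumed non-zero in this sketch
        centers.append(
            tuple(sum(wi * p[j] for wi, p in zip(w, points)) / total
                  for j in range(dim))
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)
```

Each row of the returned membership matrix sums to one, so a point between two clusters is shared rather than forced into one of them. A biased variant in the spirit of the abstract would alter how centres or memberships are pulled toward an expert-chosen region, but that step is not specified in the text above.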
An integrative clustering approach combining particle swarm optimization and formal concept analysis
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology to identify
clusters based on their associations with subspaces in high dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but
gaining more attention in recent years due to better adaptability. In this
paper, a comprehensive survey of existing SSC algorithms and recent
developments is presented. The SSC algorithms are classified systematically
into three main categories, namely, conventional SSC (CSSC), independent SSC
(ISSC) and extended SSC (XSSC). The characteristics of these algorithms are
highlighted and the potential future development of SSC is also discussed.
Comment: This paper has been published in Information Sciences Journal in 201
Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm
The identification of different homogeneous groups of observations and their
appropriate analysis in PLS-SEM has become a critical issue in many
application fields. Usually, both SEM and PLS-SEM assume the homogeneity of all
units on which the model is estimated, and the segmentation approaches present
in the literature consist of estimating separate models for each segment of
statistical units, where the segments have been obtained by assigning the
units to segments defined a priori. However, these approaches are not fully
acceptable because no causal structure among the variables is postulated. In
other words, a modeling approach should be used in which the obtained clusters
are homogeneous with respect to the structural causal relationships. In this
paper, a new methodology for simultaneous non-hierarchical clustering and
PLS-SEM is proposed. This methodology is motivated by the fact that the
sequential approach of first applying SEM or PLS-SEM and then applying a
clustering algorithm such as K-means to the latent scores of the SEM/PLS-SEM
may fail to find the correct clustering structure existing in the data. A
simulation study and an application to real data are included to evaluate the
performance of the proposed methodology.
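The second step of the sequential approach criticised in this abstract, K-means applied to latent scores that have already been estimated, can be sketched as follows. This is a minimal sketch of plain Lloyd-style K-means only. It does not implement PLS-SEM or the simultaneous method proposed in the paper, and the function names, the tuple representation of scores and the caller-supplied initial centroids are assumptions made for illustration.

```python
import math


def assign(scores, centroids):
    """Label each score vector with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(s, centroids[k]))
        for s in scores
    ]


def k_means(scores, centroids, iters=100):
    """Plain K-means on precomputed latent scores (tuples of floats)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = assign(scores, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [s for s, lab in zip(scores, labels) if lab == k]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    # Recompute labels so they always match the returned centroids.
    return assign(scores, centroids), centroids
```

Because the scores are fixed before clustering starts, the grouping cannot feed back into the estimation of the structural model, which is the limitation the abstract points to.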
Relational visual cluster validity
The assessment of cluster validity plays a very important role in cluster analysis. The most commonly used cluster validity methods are based on statistical hypothesis testing or on finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display the validity of clusters directly by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied to object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists (relational data). To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presented.
A similarity-based community detection method with multiple prototype representation
Communities are of great importance for understanding graph structures in
social networks. Some existing community detection algorithms use a single
prototype to represent each group. In real applications, this may not
adequately model the different types of communities and hence limits the
clustering performance on social networks. To address this problem, a
Similarity-based Multi-Prototype (SMP) community detection approach is proposed
in this paper. In SMP, vertices in each community carry various weights to
describe their degree of representativeness. This mechanism enables each
community to be represented by more than one node. The centrality of nodes is
used to calculate prototype weights, while similarity is used to guide the
partitioning of the graph. Experimental results on computer-generated and
real-world networks clearly show that SMP performs well for detecting
communities. Moreover, with the help of prototype weights, the method can
provide richer information about the inner structure of the detected
communities than existing community detection models.
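The idea of describing each community by several weighted prototypes can be illustrated with a small sketch. The abstract does not say which centrality measure or normalisation SMP uses, so the choice below (within-community degree, normalised to sum to one per community) is an assumption made purely for illustration, as are the function name and the dictionary-of-sets graph representation.

```python
def prototype_weights(adjacency, communities):
    """Weight each vertex by its share of within-community degree.

    adjacency:   dict mapping each vertex to the set of its neighbours
    communities: list of vertex lists, one list per community
    Degree centrality is an assumed stand-in for the paper's measure.
    """
    weights = {}
    for members in communities:
        inside = set(members)
        # Count only edges that stay inside the community.
        degree = {v: len(adjacency[v] & inside) for v in members}
        total = sum(degree.values())
        for v in members:
            if total:
                weights[v] = degree[v] / total
            else:
                # No internal edges: fall back to uniform weights.
                weights[v] = 1.0 / len(members)
    return weights


# Usage: a star centred on "a", plus a separate two-vertex community.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
}
weights = prototype_weights(graph, [["a", "b", "c", "d"], ["e", "f"]])
```

In the star, the hub receives the largest weight while the leaves still carry non-zero weight, so the community is represented by more than one vertex, as the abstract describes.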