19,689 research outputs found
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered
A similarity-based community detection method with multiple prototype representation
Communities are of great importance for understanding graph structures in
social networks. Some existing community detection algorithms use a single
prototype to represent each group. In real applications, this may not
adequately model the different types of communities and hence limits the
clustering performance on social networks. To address this problem, a
Similarity-based Multi-Prototype (SMP) community detection approach is proposed
in this paper. In SMP, vertices in each community carry various weights to
describe their degree of representativeness. This mechanism enables each
community to be represented by more than one node. The centrality of nodes is
used to calculate prototype weights, while similarity is utilized to guide us
to partitioning the graph. Experimental results on computer generated and
real-world networks clearly show that SMP performs well for detecting
communities. Moreover, the method could provide richer information for the
inner structure of the detected communities with the help of prototype weights
compared with the existing community detection models
A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels
In this paper, a systematic data-driven fuzzy modelling methodology is proposed, which allows to construct Mamdani fuzzy models considering both accuracy (precision) and transparency (interpretability) of fuzzy systems. The new methodology employs a fast hierarchical clustering algorithm to generate an initial fuzzy model efficiently; a training data selection mechanism is developed to identify appropriate and efficient data as learning samples; a high-performance Particle Swarm Optimisation (PSO) based multi-objective optimisation mechanism is developed to further improve the fuzzy model in terms of both the structure and the parameters; and a new tolerance analysis method is proposed to derive the confidence bands relating to the final elicited models. This proposed modelling approach is evaluated using two benchmark problems and is shown to outperform other modelling approaches. Furthermore, the proposed approach is successfully applied to complex high-dimensional modelling problems for manufacturing of alloy steels, using ârealâ industrial data. These problems concern the prediction of the mechanical properties of alloy steels by correlating them with the heat treatment process conditions as well as the weight percentages of the chemical compositions
Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering
We propose two probability-like measures of individual cluster-membership
certainty which can be applied to a hard partition of the sample such as that
obtained from the Partitioning Around Medoids (PAM) algorithm, hierarchical
clustering or k-means clustering. One measure extends the individual silhouette
widths and the other is obtained directly from the pairwise dissimilarities in
the sample. Unlike the classic silhouette, however, the measures behave like
probabilities and can be used to investigate an individual's tendency to belong
to a cluster. We also suggest two possible ways to evaluate the hard partition.
We evaluate the performance of both measures in individuals with ambiguous
cluster membership, using simulated binary datasets that have been partitioned
by the PAM algorithm or continuous datasets that have been partitioned by
hierarchical clustering and k-means clustering. For comparison, we also present
results from soft clustering algorithms such as soft analysis clustering
(FANNY) and two model-based clustering methods. Our proposed measures perform
comparably to the posterior-probability estimators from either FANNY or the
model-based clustering methods. We also illustrate the proposed measures by
applying them to Fisher's classic iris data set
A Short Survey on Data Clustering Algorithms
With rapidly increasing data, clustering algorithms are important tools for
data analytics in modern research. They have been successfully applied to a
wide range of domains; for instance, bioinformatics, speech recognition, and
financial analysis. Formally speaking, given a set of data instances, a
clustering algorithm is expected to divide the set of data instances into the
subsets which maximize the intra-subset similarity and inter-subset
dissimilarity, where a similarity measure is defined beforehand. In this work,
the state-of-the-arts clustering algorithms are reviewed from design concept to
methodology; Different clustering paradigms are discussed. Advanced clustering
algorithms are also discussed. After that, the existing clustering evaluation
metrics are reviewed. A summary with future insights is provided at the end
A new fuzzy set merging technique using inclusion-based fuzzy clustering
This paper proposes a new method of merging parameterized fuzzy sets based on clustering in the parameters space, taking into account the degree of inclusion of each fuzzy set in the cluster prototypes. The merger method is applied to fuzzy rule base simplification by automatically replacing the fuzzy sets corresponding to a given cluster with that pertaining to cluster prototype. The feasibility and the performance of the proposed method are studied using an application in mobile robot navigation. The results indicate that the proposed merging and rule base simplification approach leads to good navigation performance in the application considered and to fuzzy models that are interpretable by experts. In this paper, we concentrate mainly on fuzzy systems with Gaussian membership functions, but the general approach can also be applied to other parameterized fuzzy sets
- âŠ