Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.
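To make "evaluation of clustering" concrete, here is a minimal sketch of one widely used internal validity index, the mean silhouette width, computed with 1 minus Pearson correlation as the dissimilarity between expression profiles. Both the choice of index and the choice of distance are illustrative assumptions, not the review's own framework; `pearson_distance` and `silhouette` are hypothetical helper names.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles (assumes nonconstant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def silhouette(profiles, labels, dist=pearson_distance):
    """Mean silhouette width; requires at least two distinct cluster labels."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(profiles):
        own = [dist(p, q) for j, q in enumerate(profiles) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is conventionally 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the profile's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(profiles) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 mean each gene sits far closer to its own cluster than to any other; values near or below 0 flag a poor partition. For two up-regulated and two down-regulated toy profiles, grouping them by direction scores close to 1, while mixing them scores about -0.5.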
A CLUE for CLUster Ensembles
Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and "secondary" clusterings.
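clue itself is an R package; the following is only a language-neutral illustration, written in Python, of one generic way to obtain a consensus from an ensemble of partitions (a co-association matrix followed by thresholded linking). It is not clue's API or its consensus method, and `co_association`, `consensus`, and the 0.5 threshold are assumptions made for the sketch.

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]

def consensus(partitions, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of partitions; return component labels."""
    m = co_association(partitions)
    n = len(m)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    # relabel components as 0, 1, 2, ... in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Note that the partitions may use arbitrary, mutually inconsistent label names, since only "same cluster or not" is compared. Linking by connected components can chain clusters together, so a hierarchical cut on the co-association matrix is a common alternative.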
Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naive estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.
Comment: To appear in Data Mining and Knowledge Discovery; MATLAB toolbox available at http://tbayes.eecs.umich.edu/xukevin/affec
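The proximity-tracking idea can be sketched as exponential smoothing of each time step's proximity matrix, psi_t = alpha * psi_(t-1) + (1 - alpha) * W_t, after which any static algorithm clusters psi_t. In this sketch alpha is a fixed, user-chosen constant; the paper instead estimates the smoothing parameter adaptively by shrinkage estimation, which is not reproduced here. `smooth_proximities` is a hypothetical name.

```python
def smooth_proximities(snapshots, alpha=0.5):
    """Exponentially smooth a sequence of n x n proximity matrices.

    psi_1 = W_1 and psi_t = alpha * psi_{t-1} + (1 - alpha) * W_t.
    A fixed alpha is a simplifying assumption for illustration.
    """
    psi = [row[:] for row in snapshots[0]]
    history = [psi]
    for w in snapshots[1:]:
        psi = [
            [alpha * p + (1 - alpha) * x for p, x in zip(prow, wrow)]
            for prow, wrow in zip(psi, w)
        ]
        history.append(psi)
    return history
```

A one-step dip in the proximity of two objects (0.9 falling to 0.1) is damped to 0.5 in the smoothed matrix, so a static algorithm run on psi_t is less likely to split them because of short-term noise.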
A between-cluster approach for clustering skew-symmetric data
In order to investigate exchanges between objects, a clustering model for skew-symmetric data is proposed, which relies on the between-cluster effects of the skew-symmetries that represent the imbalances of the observed exchanges between pairs of objects. The aim is to detect clusters of objects that share the same exchange behaviour, so that origin and destination clusters are identified. The proposed model is based on the decomposition of the skew-symmetric matrix of imbalances between clusters into a sum of off-diagonal block matrices. Each matrix can be approximated by a skew-symmetric matrix using a truncated Singular Value Decomposition (SVD) that exploits the properties of skew-symmetric matrices. The model is fitted in a least-squares framework, and an efficient Alternating Least Squares algorithm is provided. Finally, to show the potential of the model and the features of the resulting clusters, an extensive simulation study and an illustrative application to real data are presented.
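The quantities the model works with can be illustrated with the standard split of an exchange matrix A into a symmetric part and a skew-symmetric part K = (A - A^T) / 2, whose entries are the pairwise imbalances. The sketch below then sums K over blocks for a given cluster assignment; it does not implement the paper's truncated-SVD / Alternating Least Squares fit. It assumes A[i][j] is the flow from object i to object j, and `skew_part` and `between_cluster_imbalance` are hypothetical names.

```python
def skew_part(a):
    """Skew-symmetric component K = (A - A^T) / 2 of a square exchange matrix A."""
    n = len(a)
    return [[(a[i][j] - a[j][i]) / 2 for j in range(n)] for i in range(n)]

def between_cluster_imbalance(a, labels, k):
    """Sum K over cluster blocks, giving a k x k skew-symmetric matrix of net imbalances."""
    kmat = skew_part(a)
    out = [[0.0] * k for _ in range(k)]
    for i, gi in enumerate(labels):
        for j, gj in enumerate(labels):
            out[gi][gj] += kmat[i][j]
    return out
```

Within-cluster imbalances cancel on the diagonal because K is skew-symmetric, so only between-cluster effects remain. A positive entry (g, h) marks cluster g as a net origin relative to destination cluster h under the flow convention assumed above.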
Recurrence-based time series analysis by means of complex network methods
Complex networks are an important paradigm of modern complex systems science, allowing quantitative assessment of the structural properties of systems composed of different interacting entities. In recent years, intensive efforts have been made to apply network-based concepts to the analysis of dynamically relevant higher-order statistical properties of time series. Notably, many of these approaches are closely related to the concept of recurrence in phase space. In this paper, we review recent methodological advances in time series analysis based on complex networks, with a special emphasis on methods founded on recurrence plots. The potentials and limitations of the individual methods are discussed and illustrated for paradigmatic examples of dynamical systems as well as for real-world time series. Complex network measures are shown to provide information about structural features of dynamical systems that are complementary to those characterized by other methods of time series analysis and, hence, substantially enrich the knowledge gathered from other existing (linear as well as nonlinear) approaches.
Comment: To be published in International Journal of Bifurcation and Chaos (2011)
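A recurrence plot can be read as a network by treating time points as nodes and recurrences as edges, A_ij = Theta(eps - |x_i - x_j|) - delta_ij, where Theta is the Heaviside step and delta_ij removes self-loops. The sketch below assumes a scalar series with no time-delay embedding (real applications usually embed the series in phase space first) and treats distance exactly equal to eps as a recurrence. `recurrence_network` and `edge_density` are hypothetical names.

```python
def recurrence_network(series, eps):
    """Adjacency A_ij = Theta(eps - |x_i - x_j|) - delta_ij for a scalar series."""
    n = len(series)
    return [
        [1 if i != j and abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]

def edge_density(adj):
    """Fraction of possible undirected edges present (equals the recurrence rate without the diagonal)."""
    n = len(adj)
    return sum(map(sum, adj)) / (n * (n - 1))
```

Network measures such as degree, clustering coefficient, or path length are then computed on this adjacency matrix; edge density is simply the simplest of them.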
Clustering and its Application in Requirements Engineering
Large-scale software systems challenge almost every activity in the software development life-cycle, including tasks related to eliciting, analyzing, and specifying requirements. Fortunately, many of these complexities can be addressed by clustering the requirements in order to create abstractions that are meaningful to human stakeholders. For example, the requirements elicitation process can be supported by dynamically clustering incoming stakeholders’ requests into themes. Cross-cutting concerns, which have a significant impact on the architectural design, can be identified through the use of fuzzy clustering techniques and metrics designed to detect when a theme cross-cuts the dominant decomposition of the system. Finally, traceability techniques, required in critical software projects by many regulatory bodies, can be automated and enhanced by the use of cluster-based information retrieval methods. Unfortunately, despite a significant body of work describing document clustering techniques, there is almost no prior work that directly addresses the challenges, constraints, and nuances of requirements clustering. As a result, the effectiveness of software engineering tools and processes that depend on requirements clustering is severely limited. This report directly addresses the problem of clustering requirements by surveying standard clustering techniques and discussing their application to the requirements clustering process.
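Most document clustering techniques for requirements start from a vector representation and a similarity measure. As an illustration only, the sketch below uses TF-IDF weights with cosine similarity, assuming naive whitespace tokenization with no stemming or stop-word removal; the report may favour other representations, and `tfidf_vectors` and `cosine` are hypothetical names.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) using whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For the requirements "user shall log in with password", "user shall reset forgotten password", and "system shall export monthly report", the two password requirements get a positive similarity, while the first and third score 0.0 because their only shared word, "shall", appears in every document and so receives zero IDF weight. Any clustering algorithm that accepts a similarity matrix can take it from here.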
A Survey on Clustering Algorithm for Microarray Gene Expression Data
DNA microarray data are huge and multidimensional: microarray chip technology records the expression of many genes simultaneously, which makes the data cumbersome to handle. The microarray technique is used to measure the expression levels of tens of thousands of genes under different conditions, such as a time series during a biological process. Clustering is an unsupervised learning process that partitions a given dataset into groups, so that objects within a group are similar and objects in different groups are dissimilar. The aim of this paper is to analyze the accuracy of different clustering algorithms on microarray data and to identify a suitable algorithm for further research.
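As a reference point for the algorithms such a comparison typically includes, here is a minimal sketch of Lloyd's k-means on expression profiles (one row per gene, one column per condition). Using Euclidean distance and the first k rows as initial centroids are simplifying assumptions for a deterministic example; practical runs use random or k-means++ initialization with several restarts. `kmeans` is a hypothetical name.

```python
def kmeans(points, k, iters=100):
    """Lloyd's k-means with the first k points as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids
```

On four toy profiles [0, 0], [0, 1], [10, 10], [10, 11] with k = 2, the first assignment is lopsided, but after two updates the method settles on the two obvious groups with centroids [0.0, 0.5] and [10.0, 10.5].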
Robust techniques and applications in fuzzy clustering
This dissertation addresses issues central to fuzzy classification. The sensitivity to noise and outliers of clustering techniques based on least-squares minimization, such as Fuzzy c-Means (FCM) and its variants, is addressed. In this work, two novel and robust clustering schemes are presented and analyzed in detail. They approach the problem of robustness from different perspectives. The first scheme scales down the FCM memberships of data points based on the distance of the points from the cluster centers. Scaling applied to outliers reduces their membership in true clusters. This scheme, known as Mega-clustering, defines a conceptual mega-cluster, which is a collective cluster of all data points but views outliers and good points differently (as opposed to the concept of Dave's Noise cluster). The scheme is presented and validated with experiments, and similarities with Noise Clustering (NC) are also presented. The other scheme is based on the feasible solution algorithm that implements the Least Trimmed Squares (LTS) estimator. The LTS estimator is known to be resistant to noise and has a high breakdown point. The feasible solution approach also guarantees convergence of the solution set to a global optimum. Experiments show the practicability of the proposed schemes in terms of computational requirements and the attractiveness of their simple frameworks.
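For context on the outlier sensitivity discussed above, here is a minimal sketch of standard (non-robust) Fuzzy c-Means, alternating the usual membership update u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) with weighted-mean center updates. It is the baseline FCM, not the Mega-clustering or LTS schemes of the dissertation; the fuzzifier m = 2, the fixed iteration count, and the name `fcm` are assumptions.

```python
import math

def fcm(points, centers, m=2.0, iters=50):
    """Standard Fuzzy c-Means; returns (memberships, centers)."""
    c = len(centers)
    centers = [list(v) for v in centers]
    for _ in range(iters):
        u = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:  # point coincides with a center: crisp membership there
                k = d.index(0.0)
                u.append([1.0 if i == k else 0.0 for i in range(c)])
            else:
                u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                          for i in range(c)])
        # center update: mean of the points weighted by membership ** m
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centers[i] = [sum(wk * x[dim] for wk, x in zip(w, points)) / total
                          for dim in range(len(points[0]))]
    return u, centers
```

Because each point's memberships are constrained to sum to one, a distant outlier still receives substantial membership in some cluster and pulls that center toward it, which is the weakness that membership scaling and trimmed estimators aim to remove.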
The issue of validation of clustering results has often received less attention than clustering itself. Fuzzy and non-fuzzy cluster validation schemes are reviewed and a novel methodology for cluster validity using a test for random position hypothesis is developed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by the partitioning algorithm. The Hopkins statistic is used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case. The Hopkins statistic is known to be a fair estimator of randomness in a data set. The concept is borrowed from the clustering tendency domain and its applicability to validating clusters is shown here.
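The Hopkins statistic compares nearest-neighbour distances from uniformly placed random points (u_i) with those from sampled data points (w_i): H = sum(u) / (sum(u) + sum(w)). Under this convention H is near 0.5 for randomly positioned data and near 1 for clustered data; some texts define the reciprocal form, so the direction is an assumption here. The sketch applies it to a whole dataset, whereas the dissertation tests each cluster; the sample size m (10% of n by default) and the name `hopkins` are assumptions, and no significance test is included.

```python
import math
import random

def hopkins(points, m=None, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)) over m sampled probes."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    m = m or max(1, n // 10)
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    # u: distances from uniform random points in the bounding box to the nearest data point
    u = []
    for _ in range(m):
        q = [rng.uniform(lo[d], hi[d]) for d in range(dim)]
        u.append(min(math.dist(q, p) for p in points))
    # w: distances from sampled data points to their nearest other data point
    w = []
    for i in rng.sample(range(n), m):
        w.append(min(math.dist(points[i], p) for j, p in enumerate(points) if j != i))
    return sum(u) / (sum(u) + sum(w))
```

For two tight, well-separated blobs, data points have very close neighbours while random probes land far from any data, so H comes out close to 1, supporting rejection of the random position hypothesis.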
A unique feature selection procedure for use with large, high-dimensional molecular conformational datasets is also developed. The intelligent feature extraction scheme not only helps reduce the dimensionality of the feature space but also helps eliminate contentious issues such as those associated with labeling symmetric atoms in the molecule. The feature vector is converted to a proximity matrix and used as input to the relational fuzzy clustering (FRC) algorithm, with very promising results. Results are also validated using several cluster validity measures from the literature. Another application of fuzzy clustering considered here is image segmentation. Image analysis of extremely noisy images is carried out as a precursor to the development of an automated real-time condition state monitoring system for underground pipelines. A two-stage FCM with intelligent feature selection is implemented as the segmentation procedure, and results on a test image are presented. A conceptual framework for automated condition state assessment is also developed.
- …