Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.
Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.
Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
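The idea of tracking time-varying proximities and then applying a static clustering method can be illustrated with a minimal sketch. It assumes a fixed smoothing weight `alpha`, whereas the abstract describes estimating that parameter adaptively by shrinkage estimation, which is not reproduced here. The function names `smooth_proximities` and `threshold_clusters` are hypothetical, and the thresholded connected-components step is only a stand-in for whichever static clustering algorithm is being extended.

```python
def smooth_proximities(snapshots, alpha):
    """Exponentially smooth a sequence of proximity (similarity) matrices.

    alpha is the weight on the previous smoothed estimate (assumed fixed
    here; the paper estimates it adaptively, which this sketch does not).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = []
    prev = None
    for current in snapshots:
        if prev is None:
            # First time step: nothing to smooth against yet.
            estimate = [row[:] for row in current]
        else:
            estimate = [
                [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_row, cur_row)]
                for prev_row, cur_row in zip(prev, current)
            ]
        smoothed.append(estimate)
        prev = estimate
    return smoothed


def threshold_clusters(similarity, threshold):
    """Static clustering stand-in: connected components of the graph that
    links objects whose similarity is at least `threshold`."""
    n = len(similarity)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and similarity[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Two snapshots of three objects; the similarity of objects 0 and 1
# dips briefly at the second time step.
t0 = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
t1 = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.1], [0.1, 0.1, 1.0]]
smoothed = smooth_proximities([t0, t1], alpha=0.5)
static_labels = threshold_clusters(t1, 0.5)
evolutionary_labels = threshold_clusters(smoothed[1], 0.5)
```

In this toy input, clustering the second snapshot on its own separates objects 0 and 1, while clustering the smoothed proximities keeps them together, which is the kind of robustness to short-term variation the abstract refers to.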
On Randomly Projected Hierarchical Clustering with Guarantees
Hierarchical clustering (HC) algorithms are generally limited to small data
instances due to their runtime costs. Here we mitigate this shortcoming and
explore fast HC algorithms based on random projections for single (SLC) and
average (ALC) linkage clustering as well as for the minimum spanning tree
problem (MST). We present a thorough adaptive analysis of our algorithms that
improve prior work from by up to a factor of for a
dataset of points in Euclidean space. The algorithms maintain, with
arbitrarily high probability, the outcome of hierarchical clustering as well as
the worst-case running-time guarantees. We also present parameter-free
instances of our algorithms.
Comment: This version contains the conference paper "On Randomly Projected
Hierarchical Clustering with Guarantees", SIAM International Conference on
Data Mining (SDM), 2014 and, additionally, proofs omitted in the conference
version
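For orientation, the outcome that a fast algorithm is meant to preserve can be computed exactly with a simple quadratic baseline. The sketch below is not the paper's random-projection algorithm. It relies only on the standard fact that the merges of single linkage clustering correspond to the edges of a minimum spanning tree taken in increasing order of length, and it finds them with Kruskal's algorithm over all pairwise Euclidean distances. The function name `single_linkage_merges` is hypothetical.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Return the single linkage merges as (distance, i, j) tuples.

    The merges are the edges of a Euclidean minimum spanning tree in
    increasing order of length (Kruskal's algorithm). This exact baseline
    examines all pairs of points.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    merges = []
    for distance, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            merges.append((distance, i, j))
            if len(merges) == n - 1:
                break
    return merges


merges = single_linkage_merges([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

Because this baseline enumerates every pair of points, its cost grows quadratically with the dataset size, which is the kind of runtime cost the abstract says limits hierarchical clustering to small instances.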
Spatial analysis for the distribution of cells in tissue sections
Spatial analysis plays an essential role in data mining and is applied in a considerable number of fields. Because of this broad applicability, interdisciplinary problems are becoming more common. Spatial analysis aims to explore the underlying patterns in data. In this project, we apply the methods used for spatial problems to investigate how cells are distributed in infected tissue sections and whether the cells form clusters. The cells neighboring the viruses are of particular interest. The data were provided by the Medetect Company in the form of 2-dimensional point data. We first adopted two common kinds of spatial analysis methods, clustering methods and proximity methods. In addition, a method for constructing a 2-dimensional hull was developed in order to delineate the compartments in tissue sections. A binomial test was conducted to evaluate the results. We detected that clusters do exist among the cells, and that the immune cells accumulate around the viruses. We also found different patterns near to and far from the viruses. This study implies that the cells interact with each other and thus exhibit spatial patterns. However, our analyses are restricted to the plane rather than 3-dimensional space. In further studies, the spatial analysis could be carried out in three dimensions.
It has become popular to use heuristic or existing methods to make new findings and explain puzzling phenomena in other subjects. It is also known that everything in nature is related. In this sense, we can assume that the overall distribution of objects depends on the locations of the individuals. The idea of my work is to explore this spatial relationship among cells. In my project, the relationships between individual cells or groups of cells are of interest. Our data take the form of a point cloud.
The questions are whether there are any groups among these points and whether the viruses have neighbors. The methods fall into three parts. The first method gathers similar objects into groups; here, similar objects can be those that are close to each other. The second method analyzes the degree of closeness between objects and looks for the neighbors of viruses. The last method draws the border of a point cloud, much like constructing the boundary of a district. Within each method, we carried out corresponding case studies. Since similar objects can be grouped together, it is interesting to look into the details of each group: we can see which objects in the same group are similar, and the different types of cells in the same group can be checked and studied. In the closeness analysis, we found that some cells are indeed closer to each other. The constructed border helps us see the shape of the point clouds. It can be concluded that a spatial relationship does exist among the cells. Groups of cells can be identified to a large extent, and at a local level one type of cell can be more attracted to certain other cells. However, this study was carried out in 2D space, neglecting the real shape of the cells, which have height. This could be a more interesting topic in the future.
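Drawing the border of a 2-dimensional point cloud can be illustrated with the simplest kind of hull, the convex hull. The abstract does not say which hull construction the author developed, so the sketch below swaps in a standard technique (Andrew's monotone chain) purely as an illustration. The function names and the sample coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b.
    Positive when o, a, b make a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2D points by Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order, starting from
    the lexicographically smallest point. Collinear border points are
    dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical cell positions: four corner cells and one interior cell.
cells = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
border = convex_hull(cells)
```

A convex hull cannot follow indentations in a tissue compartment, so a concave construction may be closer to what delineating compartments requires; that choice is left open here.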