
Similarity and cluster analysis algorithms for Microarrays using R*-Trees

By Jiaxiong Pi, Yong Shi and Zhengxin Chen

Abstract

Similarity and cluster analysis are important aspects of analyzing microarray data. Based on our perspective of viewing microarrays as time series data, both similarity analysis and cluster analysis are carried out through indexing on time series data using R*-Trees. We have developed algorithms for similarity and cluster analysis on microarray data, and conducted experimental and comparative studies. First, our study shows that principal component analysis (PCA) is superior to several other methods (such as DFT and PAA) as far as distance conservation is concerned. A similarity analysis tool based on PCA has been developed, which explores fewer R*-Tree nodes before finding the similar counterparts of a query and returns fewer false positives than other methods. In addition, we extend the R*-Tree’s application to cluster analysis. With the aid of R*-Tree indexing, two clustering algorithms, KMeans-R and Hierarchy-R, are proposed as improved versions of K-Means and hierarchical clustering, respectively. Experiments for similarity search and cluster analysis based on the proposed algorithms have been carried out and have shown favorable results. Experiments on the yeast cell cycle dataset are reported in this paper.
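
The distance-conservation comparison mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the synthetic sinusoidal profiles, the function names (`paa`, `pca`, `distance_conservation`), the reduced dimension of 4, and the choice of metric (ratio of summed squared pairwise Euclidean distances after and before reduction) are all assumptions made for illustration. DFT and the R*-Tree index itself are left out.

```python
import numpy as np

def paa(X, k):
    """Piecewise aggregate approximation: mean of k equal-width segments,
    scaled by sqrt(segment length) so reduced distances never exceed the originals."""
    n, d = X.shape
    assert d % k == 0, "this sketch assumes the series length is divisible by k"
    seg = d // k
    return X.reshape(n, k, seg).mean(axis=2) * np.sqrt(seg)

def pca(X, k):
    """Project mean-centered rows onto the top-k principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def sq_pairwise_dists(X):
    """Squared Euclidean distances for all unordered pairs of rows."""
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=2)
    return D2[np.triu_indices(len(X), k=1)]

def distance_conservation(X, Z):
    """Assumed metric: share of total squared pairwise distance kept by Z (1.0 = all)."""
    return sq_pairwise_dists(Z).sum() / sq_pairwise_dists(X).sum()

# Hypothetical stand-in for expression profiles: noisy sinusoids with random phase.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 16)
phases = rng.uniform(0.0, 2.0 * np.pi, size=(200, 1))
X = np.sin(t + phases) + 0.1 * rng.standard_normal((200, 16))

r_pca = distance_conservation(X, pca(X, 4))
r_paa = distance_conservation(X, paa(X, 4))
print(f"PCA keeps {r_pca:.3f}, PAA keeps {r_paa:.3f} of total squared distance")
```

Both reductions here are orthonormal projections, so neither can increase a pairwise distance, which is the property an index needs to avoid false dismissals. Under this particular squared-distance metric, PCA retains at least as much as any other orthonormal projection to the same dimension. The paper's own measure of distance conservation may differ.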

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.192.2433
Provided by: CiteSeerX