3 research outputs found

    E-CAST: A Data Mining Algorithm for Gene Expression Data

    No full text
    Data clustering methods have proven to be a successful data mining technique in the analysis of gene expression data. The Cluster Affinity Search Technique (CAST) developed by Ben-Dor et al. (1999), which has been shown to cluster gene expression data well, has two drawbacks. First, the algorithm uses a fixed initial threshold value to start the clustering. As stated in the original paper, this parameter directly affects the size and number of clusters produced. Second, the algorithm requires a final cleaning step, which takes O(n^2) time, to relocate n data points among the existing clusters. In this paper, we have developed an enhanced CAST algorithm, called E-CAST, that uses a dynamic threshold. The threshold value is computed at the beginning of each new cluster. We have implemented both the CAST and E-CAST algorithms and tested their performance using three different data sets. The data sets are real gene expression data from melanoma, pheochromocytoma, and brain cell tissue samples generated using microarray technology. The results of both implementations were compared to the output of the hierarchical clustering program written by Michael Eisen, with closely comparable results. Not only did the final results compare favorably with the hierarchical approach, but they also indicate that the cleaning step of the original CAST algorithm may be unnecessary. Keywords: Clustering, Data mining, Bio-informatics, Gene expression, Graph theory, Micro-array.
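
The idea of recomputing the threshold at the start of each new cluster can be illustrated with a minimal sketch. The abstract does not state how E-CAST computes its threshold, so the rule used here (the mean pairwise similarity among the points that are still unassigned) is an assumption made purely for illustration, and the names `dynamic_threshold`, `cast_dynamic`, and `max_steps` are hypothetical. The add/remove loop follows the general CAST pattern of growing a cluster by affinity; it is not the authors' implementation.

```python
import numpy as np


def dynamic_threshold(sim, unassigned):
    """Assumed rule (not from the paper): mean pairwise similarity
    among the points that are still unassigned."""
    idx = sorted(unassigned)
    if len(idx) < 2:
        return 0.0
    sub = sim[np.ix_(idx, idx)]
    upper = sub[np.triu_indices(len(idx), k=1)]  # each pair counted once
    return float(upper.mean())


def cast_dynamic(sim, max_steps=10_000):
    """CAST-style clustering on a symmetric similarity matrix, with the
    affinity threshold recomputed whenever a new cluster is opened."""
    n = sim.shape[0]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        t = dynamic_threshold(sim, unassigned)  # new threshold per cluster
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster = {seed}
        for _ in range(max_steps):  # guard against add/remove oscillation
            changed = False
            members = sorted(cluster)
            # ADD: pull in the unassigned point with the highest affinity
            # if its affinity reaches t * |cluster|.
            if unassigned:
                cand = max(unassigned, key=lambda x: sim[x, members].sum())
                if sim[cand, members].sum() >= t * len(cluster):
                    cluster.add(cand)
                    unassigned.remove(cand)
                    changed = True
            # REMOVE: drop the member with the lowest affinity
            # if its affinity falls below t * |cluster|.
            if not changed and len(cluster) > 1:
                worst = min(cluster, key=lambda x: sim[x, members].sum())
                if sim[worst, members].sum() < t * len(cluster):
                    cluster.remove(worst)
                    unassigned.add(worst)
                    changed = True
            if not changed:
                break
        clusters.append(sorted(cluster))
    return clusters
```

Calling `cast_dynamic(sim)` on an n-by-n similarity matrix with ones on the diagonal returns a list of clusters, each a sorted list of row indices. Because the threshold is derived from the remaining points, later clusters are formed against a different cutoff than earlier ones, which is the behavior the abstract contrasts with the fixed threshold of CAST.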

    Natural Language Processing for Social Media, Second Edition

    No full text