Sequential Extraction of Several Gene-sets with Proper Groups of Individuals for Gene Expression Data Analysis

Abstract

One of the ultimate goals of microarray gene expression data analysis in bioinformatics is to identify individual genes or gene-sets that influence gene expression patterns. In several research areas of bioinformatics, data analysis poses a challenging statistical problem because of high dimensionality combined with small sample sizes. Clustering is one of the most popular statistical techniques for addressing these challenges. Nowak and Tibshirani (2008) proposed complementary hierarchical clustering (CHC) for the sequential extraction of several gene-sets whose expression is relatively low compared with that of highly expressed genes. However, CHC produces misleading clustering results when the gene expression data contain contamination (outliers), which is an important issue in gene expression data analysis. Therefore, in this paper we propose a robust statistical clustering technique governed by a tuning parameter β, which we call β-CHC, for the sequential extraction of biologically important gene-sets that have similar expression patterns, together with proper groups of individuals. The proposed robust method reduces to the traditional method as β→0. Clustering results on simulated gene expression data show that the proposed method performs better than the traditional method when the data are contaminated; otherwise, the two methods perform almost equally.

International Conference on Statistical Data Mining for Bioinformatics Health Agriculture and Environment, 21-24 December, 2012, Rajshahi University, Bangladesh
