
    A novel principal component analysis method for identifying differentially expressed gene signatures

    Microarray data sets contain a wealth of information on the expression levels of thousands of genes for a small number of different conditions, called assays. However, this information is obscured by high noise levels and low signal levels. Data mining techniques are used to extract information about genes as they relate to the assays. This work proposes a powerful principal component analysis (PCA) based method that extends the PCA approach of Rollins et al. (2006). The proposed method generates gene signatures that are expressed most differently between two assay groups in a microarray data set. Two new PCA-based test statistics were developed and found to be effective when evaluated in different case studies, including real and simulated data. The proposed method was compared with the current method and was favored in terms of high statistical power and low false discovery rate. Therefore, the PCA-based approach is highly recommended for use in gene expression data analysis.

    An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies

    Background: Microarray data sets provide relative expression levels for thousands of genes for a comparatively small number of different experimental conditions, called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to rank the genes of microarray data sets that are expressed most differently between two biologically different groupings of assays. The method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), which is the ability to correctly identify important genes. Results: This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant, and for one of the test statistics when the gene variance is non-constant. Conclusions: PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications. This article is from BioData Mining 3 (2010): article no. 11, doi: 10.1186/1756-0381-3-11.