
    Data Mining Based on Principal Component Analysis: Application to the Nitric Oxide Response in Escherichia coli

    This work evaluates a recently developed multivariate statistical method based on the creation of pseudo or latent variables using principal component analysis (PCA). The application is the data mining of gene expression data to find a small subset of the most important genes in a set of thousands or tens of thousands of genes from a relatively small number of experimental runs. The method was previously developed and evaluated on artificially generated data and real data sets. Those evaluations assessed its ability to rank the genes against known truth in simulated data studies and to identify known important genes in real data studies. The purpose of the work described here is to identify a ranked set of genes in an experimental study and then, for a few of the most highly ranked unverified genes, experimentally verify their importance.

    This method was evaluated using the transcriptional response of Escherichia coli to treatment with four distinct inhibitory compounds: nitric oxide (NO), S-nitrosoglutathione (GSNO), serine hydroxamate and potassium cyanide. Our analysis identified genes previously recognized in the response to these compounds and also identified new genes.

    Three of these new genes, ycbR, yfhA and yahN, were found to significantly (p-values < 0.002) affect the sensitivity of E. coli to nitric oxide-mediated growth inhibition. Given that the three genes were not highly ranked in the selected ranked set (RS), these results indicate that the method is highly sensitive in identifying genes related to challenge by NO and GSNO. This ability to identify genes related to the response to an inhibitory compound is important for engineering tolerance to inhibitory metabolic products, such as biofuels, and utilization of cheap sugar streams, such as biomass-derived sugars or hydrolysate.

    This article is published as Teh, AiLing, D. S. Layton, Daniel R. Hyduke, Laura R. Jarboe, and D. K. Rollins. "Data Mining Based on Principal Component Analysis: Application to the Nitric Oxide Response in Escherichia coli." Journal of Statistical Science and Application 2 (2014): 1-18. DOI: 10.17265/2328-224X/2014.01.001. Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Copyright 2014 David Publishing Company. Posted with permission.
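
The abstract does not spell out how the latent variables are turned into a gene ranking, so the following is only a minimal sketch of one generic way PCA can rank genes when there are few runs and many genes. It is not the authors' published procedure. The function name `rank_genes_by_pca_loadings`, the scoring rule (absolute loadings weighted by explained variance), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rank_genes_by_pca_loadings(expression, n_components=2):
    """Rank genes by variance-weighted absolute PCA loadings.

    expression: array of shape (n_runs, n_genes), one row per experimental run.
    Returns (order, scores); order lists gene indices, highest score first.
    Hypothetical scoring rule for illustration, not the published method.
    """
    X = np.asarray(expression, dtype=float)
    # Center each gene (column) so the SVD yields principal components.
    centered = X - X.mean(axis=0)
    # Rows of vt are the component loadings over genes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained_ratio = variance / variance.sum()
    k = min(n_components, vt.shape[0])
    # Score each gene: sum of |loading| weighted by explained variance.
    scores = explained_ratio[:k] @ np.abs(vt[:k])
    order = np.argsort(scores)[::-1]
    return order, scores


if __name__ == "__main__":
    # Synthetic example: 8 runs, 50 genes, two genes driven by one latent factor.
    rng = np.random.default_rng(0)
    n_runs, n_genes = 8, 50
    latent = rng.normal(size=n_runs)
    data = 0.1 * rng.normal(size=(n_runs, n_genes))
    data[:, 0] += 5.0 * latent
    data[:, 1] -= 4.0 * latent
    order, scores = rank_genes_by_pca_loadings(data)
    print("Top-ranked gene indices:", order[:5])
```

In this toy setup the two genes tied to the latent factor dominate the first component, so they should appear at the top of the ranking. Real expression data would normally need normalization and a principled choice of `n_components` before any such ranking is meaningful.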