
    Inferring a Transcriptional Regulatory Network from Gene Expression Data Using Nonlinear Manifold Embedding

    Transcriptional networks consist of multiple regulatory layers corresponding to the activity of global regulators, specialized repressors and activators of transcription, as well as proteins and enzymes shaping the DNA template. Such intrinsic multi-dimensionality makes uncovering connectivity patterns difficult and unreliable, and it calls for methodologies commensurate with the underlying organization of the data source. Here we present a new computational method that predicts interactions between transcription factors and target genes using a compendium of microarray gene expression data and known interactions between genes and transcription factors. The proposed method, called Kernel Embedding of REgulatory Networks (KEREN), is based on the concept of gene-regulon association and captures hidden geometric patterns of the network via manifold embedding. We applied KEREN to reconstruct gene regulatory interactions in the model bacterium E. coli on a genome-wide scale. Our method not only yields accurate predictions of verifiable interactions, outperforming comparable methodologies on certain metrics, but also demonstrates the utility of a geometric approach to the analysis of high-dimensional biological data. We also describe the general application of kernel embedding techniques to some other function and network discovery algorithms.

    S-MART, A Software Toolbox to Aid RNA-seq Data Analysis

    High-throughput sequencing is now routinely performed in many experiments, but the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses of their RNA-Seq data on their own computer, from the mapped data to the discovery of important loci.

    Reconstruction of Escherichia coli transcriptional regulatory networks via regulon-based associations

    BACKGROUND: Network reconstruction methods that rely on covariance of expression of transcription regulators and their targets ignore the fact that transcription of regulators and their targets can be controlled differently and/or independently. Such oversight would result in many erroneous predictions. However, accurate prediction of gene regulatory interactions can be made possible through modeling and estimation of transcriptional activity of groups of co-regulated genes. RESULTS: Incomplete regulatory connectivity and expression data are used here to construct a consensus network of transcriptional regulation in Escherichia coli (E. coli). The network is updated via a covariance model describing the activity of gene sets controlled by common regulators. The proposed model-selection algorithm was used to annotate the likeliest regulatory interactions in E. coli on the basis of two independent sets of expression data, each containing many microarray experiments under a variety of conditions. The key regulatory predictions have been verified by an experiment and literature survey. In addition, the estimated activity profiles of transcription factors were used to describe their responses to environmental and genetic perturbations as well as drug treatments. CONCLUSION: Information about transcriptional activity of documented co-regulated genes (a core regulon) should be sufficient for discovering new target genes, whose transcriptional activities significantly co-vary with the activity of the core regulon members. Our ability to derive a highly significant consensus network by applying the regulon-based approach to two very different data sets demonstrated the efficiency of this strategy. We believe that this approach can be used to reconstruct gene regulatory networks of other organisms for which partial sets of known interactions are available.

    Operon information improves gene expression estimation for cDNA microarrays

    BACKGROUND: In prokaryotic genomes, genes are organized in operons, and the genes within an operon tend to have similar levels of expression. Because of co-transcription of genes within an operon, borrowing information from other genes within the same operon can improve the estimation of relative transcript levels; the estimation of relative levels of transcript abundances is one of the most challenging tasks in experimental genomics due to the high noise level in microarray data. Therefore, techniques that can improve such estimations, and moreover are based on sound biological premises, are expected to benefit the field of microarray data analysis. RESULTS: In this paper, we propose a hierarchical Bayesian model, which relies on borrowing information from other genes within the same operon, to improve the estimation of gene expression levels and, hence, the detection of differentially expressed genes. The simulation studies and the analysis of experimental data demonstrated that the proposed method outperformed other techniques that are routinely used to estimate transcript levels and detect differentially expressed genes, including the sample mean and SAM t statistics. The improvement became more significant as the noise level in microarray data increased. CONCLUSION: By borrowing information about transcriptional activity of genes within classified operons, we improved the estimation of gene expression levels and the detection of differentially expressed genes.

    A Case Study on Choosing Normalization Methods and Test Statistics for Two-Channel Microarray Data

    DNA microarray analysis is a biological technology which permits the whole genome to be monitored simultaneously on a single slide. Microarray technology not only opens an exciting research area for biologists, but also provides significant new challenges to statisticians. Two very common questions arise in the analysis of microarray data. First, should we normalize arrays to remove potential systematic biases, and if so, what normalization method should we use? Second, how should we then implement tests of statistical significance? Straightforward and uniform answers to these questions remain elusive. In this paper, we use a real data example to illustrate a practical approach to addressing these questions. Our data is taken from a DNA–protein binding microarray experiment aimed at furthering our understanding of transcription regulation mechanisms, one of the most important issues in biology. For the purpose of preprocessing data, we suggest looking at descriptive plots first to decide whether we need preliminary normalization and, if so, how this should be accomplished. For subsequent comparative inference, we recommend use of an empirical Bayes method (the B statistic), since it performs much better than traditional methods, such as the sample mean (M statistic) and Student's t statistic, and it is also relatively easy to compute and explain compared to the others. The false discovery rate (FDR) is used to evaluate the different methods, and our comparative results lend support to the suggestions above.

    Persisters: a distinct physiological state of E. coli

    BACKGROUND: Bacterial populations contain persisters, phenotypic variants that constitute approximately 1% of cells in stationary phase and biofilm cultures. Multidrug tolerance of persisters is largely responsible for the inability of antibiotics to completely eradicate infections. Recent progress in understanding persisters is encouraging, but the main obstacle to understanding their nature has been our inability, since their discovery in 1944, to isolate these elusive cells from a wild-type population. RESULTS: We hypothesized that persisters are dormant cells with a low level of translation, and used this to physically sort dim E. coli cells which do not contain sufficient amounts of unstable GFP expressed from a promoter whose activity depends on the growth rate. The dim cells were tolerant to antibiotics and exhibited a gene expression profile distinctly different from those observed for cells in exponential or stationary phases. Genes coding for toxin-antitoxin module proteins were expressed in persisters and are likely contributors to this condition. CONCLUSION: We report a method for persister isolation and conclude that these cells represent a distinct state of bacterial physiology.

    Mariprofundus ferrooxydans PV-1 the First Genome of a Marine Fe(II) Oxidizing Zetaproteobacterium

    © The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS One 6 (2011): e25386, doi:10.1371/journal.pone.0025386. Mariprofundus ferrooxydans PV-1 has provided the first genome of the recently discovered Zetaproteobacteria subdivision. Genome analysis reveals a complete TCA cycle, the ability to fix CO2, carbon-storage proteins and a sugar phosphotransferase system (PTS). The latter could facilitate the transport of carbohydrates across the cell membrane and possibly aid in stalk formation, a matrix composed of exopolymers and/or exopolysaccharides, which is used to store oxidized iron minerals outside the cell. Two-component signal transduction system genes, including histidine kinases, GGDEF domain genes, and response regulators containing CheY-like receivers, are abundant and widely distributed across the genome. Most of these are located in close proximity to genes required for cell division, phosphate uptake and transport, exopolymer and heavy metal secretion, flagellar biosynthesis and pilus assembly, suggesting that these functions are highly regulated. Similar to many other motile, microaerophilic bacteria, genes encoding aerotaxis as well as antioxidant functionality (e.g., superoxide dismutases and peroxidases) are predicted to sense and respond to oxygen gradients, as would be required to maintain cellular redox balance in the specialized habitat where M. ferrooxydans resides. Comparative genomics with other Fe(II) oxidizing bacteria residing in freshwater and marine environments revealed similar content, synteny, and amino acid similarity of coding sequences potentially involved in Fe(II) oxidation, signal transduction and response regulation, oxygen sensation and detoxification, and heavy metal resistance. This study has provided novel insights into the molecular nature of Zetaproteobacteria. Funding has been provided by the NSF Microbial Observatories Program (KJE, DE), NSF’s Science and Technology Program, by the Gordon and Betty Moore Foundation (KJE), the College of Letters, Arts, and Sciences at the University of Southern California (KJE), and by the NASA Astrobiology Institute (KJE, DE). Advanced Light Source analyses at the Lawrence Berkeley National Lab are supported by the Office of Science, Basic Energy Sciences, Division of Materials Science of the United States Department of Energy (DE-AC02-05CH11231).

    Cross-Platform Comparison of Microarray-Based Multiple-Class Prediction

    High-throughput microarray technology has been widely applied in biological and medical decision-making research during the past decade. However, the diversity of platforms has made it a challenge to re-use and/or integrate datasets generated in different experiments or labs for constructing array-based diagnostic models. Using large toxicogenomics datasets generated on both Affymetrix and Agilent microarray platforms, we carried out a benchmark evaluation of cross-platform consistency in multiple-class prediction using three widely-used machine learning algorithms. After an initial assessment of model performance on different platforms, we evaluated whether predictive signature features selected in one platform could be directly used to train a model in the other platform and whether predictive models trained using data from one platform could predict datasets profiled using the other platform with comparable performance. Our results established that it is possible to successfully apply multiple-class prediction models across different commercial microarray platforms, offering a number of important benefits such as accelerating the possible translation of biomarkers identified with microarrays to clinically-validated assays. However, this investigation focuses on a technical platform comparison and is only the beginning of exploring cross-platform consistency. Further studies are needed to confirm the feasibility of microarray-based cross-platform prediction, especially using independent datasets.

    The Characterisation of Three Types of Genes that Overlie Copy Number Variable Regions

    Background: Due to the increased accuracy of Copy Number Variable region (CNV) break point mapping, it is now possible to say with a reasonable degree of confidence whether a gene (i) falls entirely within a CNV; (ii) overlaps the CNV; or (iii) actually contains the CNV. We classify these as type I, II and III CNV genes respectively. Principal Findings: Here we show that although type I genes vary in copy number along with the CNV, most of these type I genes have the same expression levels as wild type copy numbers of the gene. These genes must, therefore, be under homeostatic dosage compensation control. Looking into possible mechanisms for the regulation of gene expression, we found that type I genes have a significant paucity of genes regulated by miRNAs and are not significantly enriched for monoallelically expressed genes. Type III genes, on the other hand, have a significant excess of genes regulated by miRNAs and are enriched for genes that are monoallelically expressed. Significance: Many diseases and genomic disorders are associated with CNVs, so a better understanding of the different ways genes are associated with normal CNVs will help focus on candidate genes in genome-wide association studies.
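
    The three gene types described in this abstract can be read as a comparison of two intervals. The following is a minimal sketch of that reading, not the authors' implementation: the function name `classify_cnv_gene`, the use of closed integer coordinates on a single chromosome, and the choice to label a gene whose boundaries coincide exactly with the CNV as type I are all assumptions made for illustration.

```python
from typing import Optional


def classify_cnv_gene(gene_start: int, gene_end: int,
                      cnv_start: int, cnv_end: int) -> Optional[str]:
    """Return "I", "II", "III", or None for a gene relative to one CNV.

    Coordinates are assumed to be closed intervals on the same chromosome.
    """
    if gene_start > gene_end or cnv_start > cnv_end:
        raise ValueError("start must not exceed end")

    # No shared position: the gene is not a CNV gene for this region.
    if gene_end < cnv_start or cnv_end < gene_start:
        return None

    # Type I: the gene falls entirely within the CNV.
    # (Assumption: identical boundaries are also treated as type I.)
    if cnv_start <= gene_start and gene_end <= cnv_end:
        return "I"

    # Type III: the gene contains the whole CNV.
    if gene_start <= cnv_start and cnv_end <= gene_end:
        return "III"

    # Type II: partial overlap, with one CNV break point inside the gene.
    return "II"
```

    Under these assumptions, a gene spanning 150 to 180 against a CNV spanning 100 to 200 is type I, a gene spanning 50 to 150 is type II, and a gene spanning 50 to 250 is type III.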