    Mariprofundus ferrooxydans PV-1 the First Genome of a Marine Fe(II) Oxidizing Zetaproteobacterium

    © The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS One 6 (2011): e25386, doi:10.1371/journal.pone.0025386. Mariprofundus ferrooxydans PV-1 has provided the first genome of the recently discovered Zetaproteobacteria subdivision. Genome analysis reveals a complete TCA cycle, the ability to fix CO2, carbon-storage proteins and a sugar phosphotransferase system (PTS). The latter could facilitate the transport of carbohydrates across the cell membrane and possibly aid in stalk formation, a matrix composed of exopolymers and/or exopolysaccharides, which is used to store oxidized iron minerals outside the cell. Two-component signal transduction system genes, including histidine kinases, GGDEF domain genes, and response regulators containing CheY-like receivers, are abundant and widely distributed across the genome. Most of these are located in close proximity to genes required for cell division, phosphate uptake and transport, exopolymer and heavy metal secretion, flagellar biosynthesis and pilus assembly, suggesting that these functions are highly regulated. Similar to many other motile, microaerophilic bacteria, genes encoding aerotaxis as well as antioxidant functionality (e.g., superoxide dismutases and peroxidases) are predicted to sense and respond to oxygen gradients, as would be required to maintain cellular redox balance in the specialized habitat where M. ferrooxydans resides. Comparative genomics with other Fe(II) oxidizing bacteria residing in freshwater and marine environments revealed similar content, synteny, and amino acid similarity of coding sequences potentially involved in Fe(II) oxidation, signal transduction and response regulation, oxygen sensation and detoxification, and heavy metal resistance.
This study has provided novel insights into the molecular nature of Zetaproteobacteria. Funding has been provided by the NSF Microbial Observatories Program (KJE, DE), NSF’s Science and Technology Program, by the Gordon and Betty Moore Foundation (KJE), the College of Letters, Arts, and Sciences at the University of Southern California (KJE), and by the NASA Astrobiology Institute (KJE, DE). Advanced Light Source analyses at the Lawrence Berkeley National Lab are supported by the Office of Science, Basic Energy Sciences, Division of Materials Science of the United States Department of Energy (DE-AC02-05CH11231).

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Because of resampling protocols, or simply within a meta-analysis comparison, the result is often not one list but a set of alternative feature lists (possibly of different lengths). Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
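
To make the idea of comparing ranked lists of unequal length concrete, here is a minimal Python sketch. It is a simpler stand-in and not the symmetric-group method of the abstract: it uses a Spearman footrule distance and assumes that a feature missing from a list sits just past its end, at rank `len(list) + 1`. The function names and that placement rule are illustrative assumptions.

```python
from itertools import combinations


def footrule_distance(list_a, list_b):
    """Spearman footrule between two ranked partial lists without duplicates.

    Assumption (not from the abstract): a feature absent from a list is
    placed at rank len(list) + 1, i.e. just below its last entry.
    """
    rank_a = {feature: i for i, feature in enumerate(list_a, start=1)}
    rank_b = {feature: i for i, feature in enumerate(list_b, start=1)}
    features = set(rank_a) | set(rank_b)
    return sum(
        abs(rank_a.get(f, len(list_a) + 1) - rank_b.get(f, len(list_b) + 1))
        for f in features
    )


def mean_pairwise_distance(lists):
    """Average footrule distance over all pairs; lower means more stable."""
    pairs = list(combinations(lists, 2))
    if not pairs:
        raise ValueError("need at least two lists")
    return sum(footrule_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical gene lists, e.g. from three resampling runs.
    runs = [["g1", "g2", "g3"], ["g2", "g1"], ["g1", "g2", "g3"]]
    print(mean_pairwise_distance(runs))
```

Restricting the comparison to features that occur in the partial lists, as done here, corresponds loosely to the second of the two settings the abstract mentions; embedding the lists in the full feature set would need a different treatment of the unranked features.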

    Interplay of DNA supercoiling and catenation during the segregation of sister duplexes

    The discrete regulation of supercoiling, catenation and knotting by DNA topoisomerases is well documented both in vivo and in vitro, but the interplay between them is still poorly understood. Here we studied DNA catenanes of bacterial plasmids arising as a result of DNA replication in Escherichia coli cells whose topoisomerase IV activity was inhibited. We combined high-resolution two-dimensional agarose gel electrophoresis with numerical simulations in order to better understand the relationship between the negative supercoiling of DNA generated by DNA gyrase and the DNA interlinking resulting from replication of circular DNA molecules. We showed that in those replication intermediates formed in vivo, catenation and negative supercoiling compete with each other. In interlinked molecules with high catenation numbers, negative supercoiling is greatly limited. However, when interlinking decreases, as required for the segregation of newly replicated sister duplexes, their negative supercoiling increases. This observation indicates that negative supercoiling plays an active role during progressive decatenation of newly replicated DNA molecules in vivo.

    The Characterisation of Three Types of Genes that Overlie Copy Number Variable Regions

    Background: Due to the increased accuracy of Copy Number Variable region (CNV) breakpoint mapping, it is now possible to say with a reasonable degree of confidence whether a gene (i) falls entirely within a CNV; (ii) overlaps the CNV or (iii) actually contains the CNV. We classify these as type I, II and III CNV genes respectively. Principal Findings: Here we show that although type I genes vary in copy number along with the CNV, most of these type I genes have the same expression levels as wild type copy numbers of the gene. These genes must, therefore, be under homeostatic dosage compensation control. Looking into possible mechanisms for the regulation of gene expression, we found that type I genes have a significant paucity of genes regulated by miRNAs and are not significantly enriched for monoallelically expressed genes. Type III genes, on the other hand, have a significant excess of genes regulated by miRNAs and are enriched for genes that are monoallelically expressed. Significance: Many diseases and genomic disorders are associated with CNVs, so a better understanding of the different ways genes are associated with normal CNVs will help focus on candidate genes in genome-wide association studies.
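
The three gene types defined in this abstract amount to an interval comparison, which the following Python sketch spells out. It assumes closed integer coordinates on the same chromosome with start <= end, and it treats a gene whose coordinates equal the CNV's as type I; both choices are assumptions made for illustration, and the function name is hypothetical.

```python
def classify_cnv_gene(gene_start, gene_end, cnv_start, cnv_end):
    """Return "I", "II", "III", or None for one gene/CNV pair.

    Type I:   gene falls entirely within the CNV.
    Type II:  gene partially overlaps the CNV.
    Type III: gene contains the CNV.
    None:     no overlap at all.
    Assumes closed intervals on one chromosome with start <= end.
    """
    if gene_end < cnv_start or cnv_end < gene_start:
        return None
    if cnv_start <= gene_start and gene_end <= cnv_end:
        return "I"  # identical coordinates also land here (an assumption)
    if gene_start <= cnv_start and cnv_end <= gene_end:
        return "III"
    return "II"


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    print(classify_cnv_gene(100, 200, 50, 300))
    print(classify_cnv_gene(100, 200, 150, 300))
    print(classify_cnv_gene(100, 400, 150, 300))
```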

    CAGO: A Software Tool for Dynamic Visual Comparison and Correlation Measurement of Genome Organization

    CAGO (Comparative Analysis of Genome Organization) was developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns from noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG) format and allows users to interact with it using an SVG viewer through JavaScript. Signal analysis functions are implemented using R statistical software and a discrete wavelet transformation package, waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago

    Simplivariate Models: Uncovering the Underlying Biology in Functional Genomics Data

    One of the first steps in analyzing high-dimensional functional genomics data is an exploratory analysis of such data. Cluster Analysis and Principal Component Analysis are then usually the methods of choice. Despite their versatility, they also have a severe drawback: they do not always generate simple and interpretable solutions. On the basis of the observation that functional genomics data often contain both informative and non-informative variation, we propose a method that finds sets of variables containing informative variation. This informative variation is subsequently expressed in easily interpretable simplivariate components.

    Exploiting Nucleotide Composition to Engineer Promoters

    The choice of promoter is a critical step in optimizing the efficiency and stability of recombinant protein production in mammalian cell lines. Artificial promoters that provide stable expression across cell lines and can be designed to the desired strength constitute an alternative to the use of viral promoters. Here, we show how the nucleotide characteristics of highly active human promoters can be modelled via the genome-wide frequency distribution of short motifs: by overlapping motifs that occur infrequently in the genome, we constructed a contiguous sequence that is rich in GC and CpGs, both features of known promoters, but lacking homology to real promoters. We show that snippets from this sequence, at 100 base pairs or longer, drive gene expression in vitro in a number of mammalian cells, and are thus candidates for use in protein production. We further show that expression is driven by the general transcription factors TFIIB and TFIID, both being ubiquitously present across cell types, which results in less tissue- and species-specific regulation compared to the viral promoter SV40. Lastly, we found that the strength of a promoter can be tuned up and down by modulating the counts of GC and CpGs in localized regions. These results constitute a “proof-of-concept” for custom-designing promoters that are suitable for biotechnological and medical applications.
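
The two sequence features this abstract relies on, GC content and CpG count, are easy to measure per region. The Python sketch below shows one way to do so; it is not the authors' motif-overlap design procedure, and the function names, window size and step are hypothetical choices for illustration.

```python
def gc_and_cpg(sequence):
    """Return (GC fraction, CpG dinucleotide count) for a DNA string."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return gc_fraction, cpg_count


def windowed_profile(sequence, window, step):
    """GC fraction and CpG count for each full window along the sequence.

    CpGs that straddle a window boundary are not counted in either window.
    """
    return [
        gc_and_cpg(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence; a real promoter candidate would be 100 bp or longer.
    print(gc_and_cpg("ACGCGT"))
    print(windowed_profile("ACGCGTAT", window=4, step=4))
```

A windowed profile of this kind is one way to inspect "localized regions" of a candidate sequence before and after editing its GC or CpG content.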

    Technical Variability Is Greater than Biological Variability in a Microarray Experiment but Both Are Outweighed by Changes Induced by Stimulation

    INTRODUCTION: A central issue in the design of microarray-based analysis of global gene expression is that variability resulting from experimental processes may obscure changes resulting from the effect being investigated. This study quantified the variability in gene expression at each level of a typical in vitro stimulation experiment using human peripheral blood mononuclear cells (PBMC). The primary objective was to determine the magnitude of biological and technical variability relative to the effect being investigated, namely gene expression changes resulting from stimulation with lipopolysaccharide (LPS). METHODS AND RESULTS: Human PBMC were stimulated in vitro with LPS, with replication at 5 levels: 5 subjects each on 2 separate days with technical replication of LPS stimulation, amplification and hybridisation. RNA from samples stimulated with LPS and from unstimulated samples was hybridised against common reference RNA on oligonucleotide microarrays. There was a closer correlation in gene expression between replicate hybridisations (0.86-0.93) than between different subjects (0.66-0.78). Deconstruction of the variability at each level of the experimental process showed that technical variability (standard deviation (SD) 0.16) was greater than biological variability (SD 0.06), although both were low (SD<0.1 for all individual components). There was variability in gene expression both at baseline and after stimulation with LPS, and the proportion of cell subsets in PBMC was likely partly responsible for this. However, gene expression changes after stimulation with LPS were much greater than the variability from any source, either individually or combined. CONCLUSIONS: Variability in gene expression was very low and likely to improve further as technical advances are made. The finding that stimulation with LPS has a markedly greater effect on gene expression than the degree of variability provides confidence that microarray-based studies can be used to detect changes in gene expression of biological interest in infectious diseases.
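
As a rough illustration of how biological and technical variability can be separated, the Python sketch below applies a balanced one-way random-effects decomposition (method of moments) to replicate measurements per subject. The study itself used five nested levels, so this two-level version is a simplification; the function name and the toy numbers are assumptions and do not reproduce the reported SDs.

```python
from statistics import mean


def variance_components(replicates_by_subject):
    """Estimate (biological, technical) variance from a balanced design.

    Input maps each subject to its list of technical replicate values.
    Technical variance is the within-subject mean square; biological
    variance is (MS_between - MS_within) / k, truncated at zero.
    """
    groups = list(replicates_by_subject.values())
    n = len(groups)
    k = len(groups[0]) if groups else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of replicates")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    biological = max(0.0, (ms_between - ms_within) / k)
    return biological, ms_within


if __name__ == "__main__":
    # Hypothetical log-ratios for one gene: two subjects, two replicates each.
    bio_var, tech_var = variance_components({"s1": [1.0, 3.0], "s2": [5.0, 7.0]})
    print(bio_var ** 0.5, tech_var ** 0.5)  # standard deviations
```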

    pcaGoPromoter - An R Package for Biological and Regulatory Interpretation of Principal Components in Genome-Wide Gene Expression Data

    Analyzing data obtained from genome-wide gene expression experiments is challenging due to the quantity of variables, the need for multivariate analyses, and the demands of managing large amounts of data. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and overcomes the aforementioned problems. In the first step, principal component analysis (PCA) is applied to survey any differences between experiments and possible groupings. The next step is the interpretation of the principal components with respect to both biological function and regulation by predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation, and illustrative plots of PCA scores and gene ontology terms are available. pcaGoPromoter works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. To illustrate the features of the pcaGoPromoter package a serum stimulation experiment was performed and the genome-wide gene expression in the resulting samples was profiled using the Affymetrix Human Genome U133 Plus 2.0 chip. Array data were analyzed using pcaGoPromoter package tools, resulting in a clear separation of the experiments into three groups: controls, serum only and serum with inhibitor. Functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, e.g., cell cycle progression and the predicted involvement of expected transcription factors, including E2F. In addition, unexpected results, e.g., cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor treated cells, were noted. In summary, the pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. 
These tools give an overview of the input data via PCA, functional interpretation by gene ontology terms (biological processes), and an indication of the involvement of possible transcription factors.