40 research outputs found

    How many signal peptides are there in bacteria?

    No full text
    Over the last 5 years, proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparison of the experimentally determined N-termini with those predicted by sequence analysis tools allows identification of the signal peptides and therefore conclusions on the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare its results with the available experimental data and predictions by such software tools as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in the known exported proteins, which had been previously predicted but not validated. Filtering the putative signal peptides by peptide length, the presence of an eight-residue hydrophobic patch, and a typical signal peptidase cleavage site proved sufficient to eliminate the false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of the SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.

    Hydrogenobyrinic acid a,c-diamide synthase (glutamine-hydrolysing)

    No full text

    Comparative genomics of cyclic di-GMP metabolism and chemosensory pathways in Shewanella algae strains: novel bacterial sensory domains and functional insights into lifestyle regulation

    No full text
    Shewanella spp. play important ecological and biogeochemical roles, due in part to their versatile metabolism and swift integration of stimuli. While Shewanella spp. are primarily considered environmental microbes, Shewanella algae is increasingly recognized as an occasional human pathogen. S. algae shares the broad metabolic and respiratory repertoire of Shewanella spp. and thrives in similar ecological niches. In S. algae, nitrate and dimethyl sulfoxide (DMSO) respiration promote biofilm formation strain specifically, with potential implication of taxis and cyclic diguanosine monophosphate (c-di-GMP) signaling. Signal transduction systems in S. algae have not been investigated. To fill these knowledge gaps, we provide here an inventory of the c-di-GMP turnover proteome and chemosensory networks of the type strain S. algae CECT 5071 and compare them with those of 41 whole-genome-sequenced clinical and environmental S. algae isolates. Besides comparative analysis of genetic content and identification of laterally transferred genes, the occurrence and topology of c-di-GMP turnover proteins and chemoreceptors were analyzed. We found S. algae strains to encode 61 to 67 c-di-GMP turnover proteins and 28 to 31 chemoreceptors, placing S. algae near the top in terms of these signaling capacities per Mbp of genome. Most c-di-GMP turnover proteins were predicted to be catalytically active; we describe in them six novel N-terminal sensory domains that appear to control their catalytic activity. Overall, our work defines the c-di-GMP and chemosensory signal transduction pathways in S. algae, contributing to a better understanding of its ecophysiology and establishing S. algae as an auspicious model for the analysis of metabolic and signaling pathways within the genus Shewanella.
    Funding for this study was provided by grants from Stiftelsen Lars Hiertas Minne (grant FO2019‐0293), Stiftelsen Längmanska Kulturfonden (grant BA20‐0736), the Karolinska Institute Research Foundation (grant 2020‐01556), Stiftelsen Anna och Gunnar Vidfelts fond för biologisk forskning (grant 2019-051-Vidfelts fond/SOJOH) and the Hans Dahlbergs Stiftelse för miljö och hälsa to A.J.M.-R. M.Y.G. was supported by the Intramural Research Program of the National Library of Medicine, U.S. National Institutes of Health. T.K. was supported by grants from the Junta de Andalucía (P18-FR-1621) and the Spanish Ministry of Economy and Competitiveness (PID2020-112612GB-I00). U.R. was supported by the Swedish Research Council for Natural Sciences and Engineering (2017-04465) and the Karolinska Institute.

    An Identity Crisis in the Life Sciences

    No full text
    Abstract. myGrid is an e-Science project assisting life scientists to build workflows that gather and co-ordinate data from distributed, autonomous, replicated and heterogeneous resources. The provenance logs of workflow executions are recorded as RDF graphs. The log of one workflow run is used to trace the history of its execution process; however, by aggregating provenance logs of workflow reruns, or runs of different workflows, we can gather the provenance of a common data product shared in multiple derivation paths. This aggregation relies on accurate and universal identification of each data product. The nature of bioinformatics data and services, however, makes this difficult. We describe the identity problem in bioinformatics data, and present a protocol for managing identity coreferences and allocating identity to collected and computed data products. The ability to overcome this problem means that the provenance of workflows in bioinformatics and other domains can itself be exploited to enhance the practice of e-Science.

    An Unsupervised Approach for Acquiring Ontologies and RDF Data from Online Life Science Databases

    No full text
    Abstract. In the Linked Open Data cloud, one of the largest data sets, comprising 2.5 billion triples, is derived from the Life Science domain. Yet this represents a small fraction of the total number of publicly available data sources on the Web. We briefly describe past attempts to transform specific Life Science sources from a plethora of open as well as proprietary formats into RDF data. In particular, we identify and tackle two bottlenecks in current practice: acquiring ontologies to formally describe these data and creating “RDFizer” programs to convert data from legacy formats into RDF. We propose an unsupervised method, based on transformation rules, for performing these two key tasks, which makes use of our previous work on unsupervised wrapper induction for extracting labelled data from complete Life Science Web sites. We apply our approach to 13 real-world online Life Science databases. The learned ontologies are evaluated by domain experts as well as against gold standard ontologies. Furthermore, we compare the learned ontologies against ontologies that are “lifted” directly from the underlying relational schema using an existing unsupervised approach. Finally, we apply our approach to three online databases to extract RDF data. Our results indicate that this approach can be used to bootstrap and speed up the migration of life science data into the Linked Open Data cloud.