133 research outputs found

    eGenomics: Cataloguing our complete genome collection III

    Get PDF
    This meeting report summarizes the proceedings of the “eGenomics: Cataloguing our Complete Genome Collection III” workshop held September 11–13, 2006, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This 3rd workshop of the Genomic Standards Consortium was divided into two parts. The first half of the three-day workshop was dedicated to reviewing the genomic diversity of our current and future genome and metagenome collection, and exploring linkages to a series of existing projects through formal presentations. The second half was dedicated to strategic discussions. Outcomes of the workshop include a revised “Minimum Information about a Genome Sequence” (MIGS) specification (v1.1), consensus on a variety of features to be added to the Genome Catalogue (GCat), agreement by several researchers to adopt MIGS for imminent genome publications, and an agreement by the EBI and NCBI to input their genome collections into GCat for the purpose of quantifying the amount of optional data already available (e.g., for geographic location coordinates) and working towards a single, global list of all public genomes and metagenomes

    GlyGly-CTERM and Rhombosortase: A C-Terminal Protein Processing Signal in a Many-to-One Pairing with a Rhomboid Family Intramembrane Serine Protease

    Get PDF
    The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies

    Bioinformatic evidence for a widely distributed, ribosomally produced electron carrier precursor, its maturation proteins, and its nicotinoprotein redox partners

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Enzymes in the radical SAM (rSAM) domain family serve in a wide variety of biological processes, including RNA modification, enzyme activation, bacteriocin core peptide maturation, and cofactor biosynthesis. Evolutionary pressures and relationships to other cellular constituents impose recognizable grammars on each class of rSAM-containing system, shaping patterns in results obtained through various comparative genomics analyses.</p> <p>Results</p> <p>An uncharacterized gene cluster found in many Actinobacteria and sporadically in Firmicutes, Chloroflexi, Deltaproteobacteria, and one Archaeal plasmid contains a PqqE-like rSAM protein family that includes Rv0693 from <it>Mycobacterium tuberculosis</it>. Members occur clustered with a strikingly well-conserved small polypeptide we designate "mycofactocin," similar in size to bacteriocins and PqqA, precursor of pyrroloquinoline quinone (PQQ). Partial Phylogenetic Profiling (PPP) based on the distribution of these markers identifies the mycofactocin cluster, but also a second tier of high-scoring proteins. This tier, strikingly, is filled with up to thirty-one members per genome from three variant subfamilies that occur, one each, in three unrelated classes of nicotinoproteins. The pattern suggests these variant enzymes require not only NAD(P), but also the novel gene cluster. Further study was conducted using SIMBAL, a PPP-like tool, to search these nicotinoproteins for subsequences best correlated across multiple genomes to the presence of mycofactocin. For both the short chain dehydrogenase/reductase (SDR) and iron-containing dehydrogenase families, aligning SIMBAL's top-scoring sequences to homologous solved crystal structures shows signals centered over NAD(P)-binding sites rather than over substrate-binding or active site residues. Previous studies on some of these proteins have revealed a non-exchangeable NAD cofactor, such that enzymatic activity <it>in vitro </it>requires an artificial electron acceptor such as N,N-dimethyl-4-nitrosoaniline (NDMA) for the enzyme to cycle.</p> <p>Conclusions</p> <p>Taken together, these findings suggest that the mycofactocin precursor is modified by the Rv0693 family rSAM protein and other enzymes in its cluster. It becomes an electron carrier molecule that serves <it>in vivo </it>as NDMA and other artificial electron acceptors do <it>in vitro</it>. Subclasses from three different nicotinoprotein families show "only-if" relationships to mycofactocin because they require its presence. This framework suggests a segregated redox pool in which mycofactocin mediates communication among enzymes with non-exchangeable cofactors.</p

    The B6 database: a tool for the description and classification of vitamin B6-dependent enzymatic activities and of the corresponding protein families

    Get PDF
    BACKGROUND: Enzymes that depend on vitamin B6 (and in particular on its metabolically active form, pyridoxal 5'-phosphate, PLP) are of great relevance to biology and medicine, as they catalyze a wide variety of biochemical reactions mainly involving amino acid substrates. Although PLP-dependent enzymes belong to a small number of independent evolutionary lineages, they encompass more than 160 distinct catalytic functions, thus representing a striking example of divergent evolution. The importance and remarkable versatility of these enzymes, as well as the difficulties in their functional classification, create a need for an integrated source of information about them. DESCRIPTION: The B6 database http://bioinformatics.unipr.it/B6db contains documented B6-dependent activities and the relevant protein families, defined as monophyletic groups of sequences possessing the same enzymatic function. One or more families were associated to each of 121 PLP-dependent activities with known sequences. Hidden Markov models (HMMs) were built from family alignments and incorporated in the database. These HMMs can be used for the functional classification of PLP-dependent enzymes in genomic sets of predicted protein sequences. An example of such analyses (a census of human genes coding for PLP-dependent enzymes) is provided here, whereas many more are accessible through the database itself. CONCLUSION: The B6 database is a curated repository of biochemical and molecular information about an important group of enzymes. This information is logically organized and available for computational analyses, providing a key resource for the identification, classification and comparative analysis of B6-dependent enzymes

    InterPro, progress and status in 2005

    Get PDF
    InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro)

    CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    Get PDF
    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.https://doi.org/10.1186/1471-2105-12-35

    InterPro: the integrative protein signature database

    Get PDF
    The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or ‘signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/

    Comparative Genomics of Emerging Human Ehrlichiosis Agents

    Get PDF
    Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens
    corecore