133 research outputs found

ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process

Author: CJ Stubben
D Barker
D Haft
D Szklarczyk
DA Rodionov
Daniel H Haft
DH Haft
EM Marcotte
F Eckstein
F Enault
GV Glazko
H-Y Ou
J Sun
J Wu
J-P Vert
JAG Ranea
JD Selengut
Jeremy D Selengut
L Ferrer
M Csurös
M Huynen
M Pellegrini
MA Huynen
Malay K Basu
MS Gelfand
P Pagel
PM Bowers
PR Kensche
PS Dehal
R Jothi
RL Tatusov
S Briesemeister
S Freilich
SR Eddy
SV Date
T Blum
T Gaasterland
T Xu
T Yamada
X Brazzolotto
Y Hong
Y Liu
Y Zhou
Z Jiang
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

eGenomics: Cataloguing our complete genome collection III

Author: Cochrane G.
Field D.
Garrity G.
Glöckner F.
Gray T.
Kottmann R.
Lister A.
Selengut J.
Sterk P.
Tateno Y.
Tatusova T.
Thomson N.
Vaughan R.
Publication date: 30/04/2007

This meeting report summarizes the proceedings of the “eGenomics: Cataloguing our Complete Genome Collection III” workshop held September 11–13, 2006, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This 3rd workshop of the Genomic Standards Consortium was divided into two parts. The first half of the three-day workshop was dedicated to reviewing the genomic diversity of our current and future genome and metagenome collection, and exploring linkages to a series of existing projects through formal presentations. The second half was dedicated to strategic discussions. Outcomes of the workshop include a revised “Minimum Information about a Genome Sequence” (MIGS) specification (v1.1), consensus on a variety of features to be added to the Genome Catalogue (GCat), agreement by several researchers to adopt MIGS for imminent genome publications, and an agreement by the EBI and NCBI to input their genome collections into GCat for the purpose of quantifying the amount of optional data already available (e.g., for geographic location coordinates) and working towards a single, global list of all public genomes and metagenomes.

MPG.PuRe

GlyGly-CTERM and Rhombosortase: A C-Terminal Protein Processing Signal in a Many-to-One Pairing with a Rhomboid Family Intramembrane Serine Protease

Author: A Krogh
AH Gaspar
C Meissner
Daniel H. Haft
DH Haft
F Brossier
GE Crooks
JD Bendtsen
JD Selengut
JD Thompson
K Hofmann
K Strisovsky
LG Stevenson
M Freeman
M Shoji
M Zettl
Maureen J. Donlin
MJ Pallen
Neha Varghese
O Schneewind
RC Edgar
RD Finn
S Urban
SH Payne
SJ Callister
SR Eddy
Y Sugano
Z Wu
Publication venue: Public Library of Science
Publication date: 14/12/2011

The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Bioinformatic evidence for a widely distributed, ribosomally produced electron carrier precursor, its maturation proteins, and its nicotinoprotein redox partners

Author: A Benjdia
A Bernal
A Norin
Daniel H Haft
DH Haft
HJ Sofia
J Dischinger
JD Selengut
JJ Meulenberg
JK Yang
JM Kuchenreuther
K Mavromatis
KE Kawulka
M Ibrahim
M Lotierzo
M Perzl
MC Taylor
MJ van der Werf
PR Kensche
PW Van Ophem
R Overbeek
RC Edgar
SF Altschul
SR Piersma
SR Wecksler
SW Lee
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Enzymes in the radical SAM (rSAM) domain family serve in a wide variety of biological processes, including RNA modification, enzyme activation, bacteriocin core peptide maturation, and cofactor biosynthesis. Evolutionary pressures and relationships to other cellular constituents impose recognizable grammars on each class of rSAM-containing system, shaping patterns in results obtained through various comparative genomics analyses. RESULTS: An uncharacterized gene cluster found in many Actinobacteria and sporadically in Firmicutes, Chloroflexi, Deltaproteobacteria, and one Archaeal plasmid contains a PqqE-like rSAM protein family that includes Rv0693 from Mycobacterium tuberculosis. Members occur clustered with a strikingly well-conserved small polypeptide we designate "mycofactocin," similar in size to bacteriocins and PqqA, precursor of pyrroloquinoline quinone (PQQ). Partial Phylogenetic Profiling (PPP) based on the distribution of these markers identifies the mycofactocin cluster, but also a second tier of high-scoring proteins. This tier, strikingly, is filled with up to thirty-one members per genome from three variant subfamilies that occur, one each, in three unrelated classes of nicotinoproteins. The pattern suggests these variant enzymes require not only NAD(P), but also the novel gene cluster. Further study was conducted using SIMBAL, a PPP-like tool, to search these nicotinoproteins for subsequences best correlated across multiple genomes to the presence of mycofactocin. For both the short chain dehydrogenase/reductase (SDR) and iron-containing dehydrogenase families, aligning SIMBAL's top-scoring sequences to homologous solved crystal structures shows signals centered over NAD(P)-binding sites rather than over substrate-binding or active site residues. Previous studies on some of these proteins have revealed a non-exchangeable NAD cofactor, such that enzymatic activity in vitro requires an artificial electron acceptor such as N,N-dimethyl-4-nitrosoaniline (NDMA) for the enzyme to cycle. CONCLUSIONS: Taken together, these findings suggest that the mycofactocin precursor is modified by the Rv0693 family rSAM protein and other enzymes in its cluster. It becomes an electron carrier molecule that serves in vivo as NDMA and other artificial electron acceptors do in vitro. Subclasses from three different nicotinoprotein families show "only-if" relationships to mycofactocin because they require its presence. This framework suggests a segregated redox pool in which mycofactocin mediates communication among enzymes with non-exchangeable cofactors.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The B6 database: a tool for the description and classification of vitamin B6-dependent enzymatic activities and of the corresponding protein families

Author: Alessio Peracchi
BW Lepore
ES Lander
F Berkovitch
F Corpet
G Schneider
I Schomburg
J Soding
JD Selengut
JD Thompson
JN Jansonius
K Hanada
K Kanerva
K Katoh
K Koguchi
KA Koch
MD Toney
NV Grishin
P Christen
P Di Giovine
P Shannon
PK Mehta
R Percudani
RA John
Riccardo Percudani
S Donini
SF Altschul
SJ Sammut
SR Eddy
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Enzymes that depend on vitamin B6 (and in particular on its metabolically active form, pyridoxal 5'-phosphate, PLP) are of great relevance to biology and medicine, as they catalyze a wide variety of biochemical reactions mainly involving amino acid substrates. Although PLP-dependent enzymes belong to a small number of independent evolutionary lineages, they encompass more than 160 distinct catalytic functions, thus representing a striking example of divergent evolution. The importance and remarkable versatility of these enzymes, as well as the difficulties in their functional classification, create a need for an integrated source of information about them. DESCRIPTION: The B6 database (http://bioinformatics.unipr.it/B6db) contains documented B6-dependent activities and the relevant protein families, defined as monophyletic groups of sequences possessing the same enzymatic function. One or more families were associated with each of 121 PLP-dependent activities with known sequences. Hidden Markov models (HMMs) were built from family alignments and incorporated in the database. These HMMs can be used for the functional classification of PLP-dependent enzymes in genomic sets of predicted protein sequences. An example of such analyses (a census of human genes coding for PLP-dependent enzymes) is provided here, whereas many more are accessible through the database itself. CONCLUSION: The B6 database is a curated repository of biochemical and molecular information about an important group of enzymes. This information is logically organized and available for computational analyses, providing a key resource for the identification, classification and comparative analysis of B6-dependent enzymes.

Crossref

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

InterPro, progress and status in 2005

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).

The University of Manchester - Institutional Repository

ProdInra

Hal-Diderot

Archive ouverte UNIGE

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Open Research Exeter

Oxford University Research Archive

MDC Repository

Explore Bristol Research

CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

Author: A Bateman
A Tridgell
Aaron Gussman
AC Stewart
AL Delcher
B Langmead
BE Suzek
C Hemmerich
C Rapier
Cesar Arze
D Field
D Hull
David R Riley
DL Wheeler
DR Zerbino
E Afgan
EE Schadt
F Meyer
J Dean
J Goecks
J Orvis
J White
James R White
JD Selengut
JG Caporaso
JP Mesirov
JR Cole
JR Miller
JR White
JT Dudley
K Galens
K Keahey
K Lagesen
Kevin Galens
LD Stein
M Reich
Mahesh Vangala
Malcolm Matalka
MC Schatz
O Trelles
Owen White
PD Schloss
RC Edgar
RK Aziz
RL Tatusov
S Angiuoli
Samuel V Angiuoli
SD Kahn
SF Altschul
SR Eddy
TM Lowe
W Florian Fricke
Publication venue: BioMed Central
Publication date: 01/01/2011

Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. https://doi.org/10.1186/1471-2105-12-35

Crossref

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

InterPro: the integrative protein signature database

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).

RERO DOC Digital Library

Comparative Genomics of Emerging Human Ehrlichiosis Agents

Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California