
    CDD: specific functional annotation with the Conserved Domain Database

    NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models that represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml and is also part of NCBI's Entrez query and retrieval system, where it is cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at the superfamily level, and domain annotation on proteins is flagged as either ‘specific’ (identifying molecular function with high confidence) or ‘non-specific’ (identifying superfamily membership only).
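To make the ‘specific’ versus ‘non-specific’ distinction concrete, here is a minimal Python sketch. It assumes that every hit has already passed the search E-value cutoff and that a hit is reported as specific only when its bit score reaches a per-model threshold; otherwise only the superfamily is reported. The `DomainHit` fields, the threshold table and all model names are hypothetical and do not reflect CDD's actual data format or scoring rules.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    model: str        # hypothetical domain model identifier
    superfamily: str  # superfamily cluster that contains the model
    bit_score: float  # score of the query aligned to the model


def label_hits(hits, specific_thresholds):
    """Label each hit 'specific' (model level) or 'non-specific' (superfamily).

    Assumption: hits already passed the E-value cutoff, and a hit is specific
    only if its bit score reaches the model's threshold. Models without a
    threshold (e.g. hypothetical imported models) never yield specific hits.
    """
    labels = []
    for hit in hits:
        threshold = specific_thresholds.get(hit.model)
        if threshold is not None and hit.bit_score >= threshold:
            labels.append(("specific", hit.model))
        else:
            labels.append(("non-specific", hit.superfamily))
    return labels


hits = [
    DomainHit("modelA", "superfamily1", 120.0),
    DomainHit("modelB", "superfamily1", 40.0),
    DomainHit("imported1", "superfamily2", 300.0),
]
print(label_hits(hits, {"modelA": 100.0, "modelB": 60.0}))
# → [('specific', 'modelA'), ('non-specific', 'superfamily1'), ('non-specific', 'superfamily2')]
```

Falling back to the superfamily rather than dropping a weak hit keeps the footprint visible while signalling that only superfamily membership is supported.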

    Inferred Biomolecular Interaction Server—a web server to analyze and predict protein interacting partners and binding sites

    IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (proteins, chemicals, nucleic acids and peptides) and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein and, at the same time, infers binding sites and interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms verify them in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, and evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.
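The clustering step can be illustrated with a deliberately simplified Python sketch. It assumes each inferred binding site has already been mapped onto the query as a set of residue positions, and it groups sites greedily by positional overlap (Jaccard index). This overlap criterion and the `min_overlap` cutoff are hypothetical stand-ins for the sequence and structure conservation measures the abstract mentions, not IBIS's actual algorithm.

```python
def jaccard(a, b):
    """Overlap of two residue-position sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_binding_sites(sites, min_overlap=0.5):
    """Greedily group binding sites whose residue positions overlap.

    sites: sets of query residue positions, each inferred from one
    homologous complex. min_overlap is a hypothetical Jaccard cutoff.
    A site joins the first cluster it overlaps enough; otherwise it
    starts a new cluster.
    """
    clusters = []
    for site in sites:
        for cluster in clusters:
            if jaccard(site, cluster["positions"]) >= min_overlap:
                cluster["members"].append(site)
                cluster["positions"] |= site  # grow the cluster footprint
                break
        else:
            clusters.append({"members": [site], "positions": set(site)})
    return clusters


sites = [{10, 11, 12, 13}, {11, 12, 13, 14}, {50, 51, 52}]
for c in cluster_binding_sites(sites):
    print(sorted(c["positions"]), "members:", len(c["members"]))
```

Clusters backed by several homologous complexes can then be treated as better supported than clusters with a single member.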

    CDD: a Conserved Domain Database for the functional annotation of proteins

    NCBI’s Conserved Domain Database (CDD) is a resource for annotating protein sequences with the locations of conserved domain footprints and with functional sites inferred from these footprints. CDD includes manually curated domain models that use protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. Because CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
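Grouping redundant and homologous models into superfamilies is, at its core, a connected-components problem. The Python sketch below uses a union-find structure and assumes the pairwise "redundant or homologous" links are already known; the link list and model names are hypothetical, and this single-linkage grouping is an illustration rather than CDD's documented procedure.

```python
def cluster_superfamilies(models, related_pairs):
    """Group domain models into superfamily-like clusters with union-find.

    related_pairs: hypothetical (model_a, model_b) links meaning the two
    models are redundant or describe homologous families. Any chain of
    links places models in the same cluster (single linkage).
    """
    parent = {m: m for m in models}

    def find(m):
        # Path halving keeps the trees shallow.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for m in models:
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(c) for c in clusters.values())


models = ["cd1", "cd2", "pfam1", "smart1", "cd3"]
links = [("cd1", "pfam1"), ("pfam1", "smart1")]
print(cluster_superfamilies(models, links))
# → [['cd1', 'pfam1', 'smart1'], ['cd2'], ['cd3']]
```

Because linkage is transitive, `cd1` and `smart1` end up together even though no direct link between them was given.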

    Genomic and transcriptional analysis of protein heterogeneity of the honeybee venom allergen Api m 6

    Several components of honeybee venom are known to cause allergic responses in humans and other vertebrates. One such component, the minor allergen Api m 6, is known to show amino acid variation, but the genetic mechanism for this variation is unknown. Here we show that Api m 6 is derived from a single locus and that substantial protein-level variation has a simple genome-level cause, without the need to invoke multiple loci or alternatively spliced exons. Api m 6 sits near a misassembled section of the honeybee genome sequence, and we propose that a substantial number of indels at and near Api m 6 might be the root cause of this misassembly. We suggest that genes such as Api m 6 with indels in their coding or untranslated regions might have had a strong effect on the assembly of this draft of the honeybee genome.

    Full-length genome sequence of Ntaya virus

    We present pyrosequencing data and a phylogenetic analysis for the full genome of Ntaya virus, the type virus of the Ntaya virus group of the family Flaviviridae, isolated in Cameroon in 1966.

    Nucleotide sequence and genome organization of Dweet mottle virus and its relationship to members of the family Betaflexiviridae

    The nucleotide sequence of Dweet mottle virus (DMV) was determined and compared to sequences of members of the families Alphaflexiviridae and Betaflexiviridae. The DMV genome has 8,747 nucleotides (nt), excluding the 3′ poly(A) tail. DMV genomic RNA contains three putative open reading frames (ORFs) and untranslated regions of 73 nt and 541 nt at the 5′ and 3′ termini, respectively. ORF1 potentially encodes a 227.48-kDa polyprotein with methyltransferase, oxygenase, endopeptidase, helicase, and RNA-dependent RNA polymerase (RdRP) domains. ORF2 encodes a movement protein of 40.25 kDa, and ORF3 encodes a coat protein of 40.69 kDa. Protein database searches showed 98–99% matches between DMV ORFs and citrus leaf blotch virus (CLBV) sequences. Phylogenetic analysis based on the RdRP core domain revealed that DMV is closely related to CLBV and belongs to the genus Citrivirus. DMV did not satisfy the molecular criteria for demarcation of an independent species within the genus Citrivirus, family Betaflexiviridae; hence, DMV can be considered a CLBV isolate.
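Identifying putative ORFs in a genome sequence typically starts with a codon scan. The Python sketch below is a generic, simplified ORF finder, not the method used in this study: it scans only the forward strand (as suits a positive-sense RNA genome written as cDNA), uses the standard stop codons, and applies a hypothetical minimum-length filter `min_codons`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG-initiated ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive, and the ORF includes its stop
    codon. U is converted to T so RNA input works. min_codons is a
    hypothetical length filter (counted including the stop codon). ORFs
    that run off the end without a stop codon are not reported.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCAUGAAAUUUUAGCC", min_codons=2))
# → [(2, 14, 2)]
```

Real annotation would also weigh alternative start codons and compare candidate ORFs with known homologs, as the database searches described above do.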

    ComSin: database of protein structures in bound (complex) and unbound (single) states in relation to their intrinsic disorder

    Most proteins in a cell assemble into complexes to carry out their function. In this work, we have created a new database (named ComSin) of protein structures in bound (complex) and unbound (single) states, providing researchers with exhaustive information on structures of the same or homologous proteins in bound and unbound states. From the complete Protein Data Bank (PDB), we selected 24 910 pairs of protein structures in bound and unbound states and identified regions of intrinsic disorder. For 2448 pairs, the proteins in bound and unbound states are identical, while 7129 pairs have a sequence identity of 90% or higher. The server allows users to search for proteins in bound and unbound states with several options, including sequence similarity between the corresponding proteins in bound and unbound states and validation of interaction interfaces of protein complexes. In addition, the web server provides the information needed to study disorder-to-order and order-to-disorder transitions upon complex formation and to analyze structural differences between proteins in bound and unbound states. The database is available at http://antares.protres.ru/comsin/
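The 90% identity cutoff can be made concrete with a short Python sketch. It assumes the bound and unbound sequences have already been aligned and computes identity over columns where neither row has a gap; this is only one of several common definitions of percent identity, and ComSin's exact definition is not stated in the abstract. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_bin(seq_bound, seq_unbound, identity):
    """Bin a bound/unbound pair using the 90% cutoff quoted in the abstract."""
    if seq_bound == seq_unbound:
        return "identical"
    if identity >= 90.0:
        return ">=90%"
    return "<90%"


pid = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(pid, identity_bin("ACDEFGHIKL", "ACDEFGHIKV", pid))
# → 90.0 >=90%
```

Under this definition a gapped alignment can score 100% without the sequences being identical, which is why the sketch checks the raw sequences for the "identical" bin.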

    MMDB: annotating protein sequences with Entrez's 3D-structure database

    Three-dimensional (3D) structure is now known for a large fraction of all protein families. Thus, when a sequence database is searched with an arbitrary query sequence, it has become quite likely that a homolog with known 3D structure will be found. Depending on the extent of similarity, such neighbor relationships may allow one to infer biological function and to identify functional sites such as binding motifs or catalytic centers. Entrez's 3D-structure database, the Molecular Modeling Database (MMDB), provides easy access to the richness of 3D structure data and its large potential for functional annotation. Entrez's search engine offers several tools to assist biologists: (i) links between databases, such as between protein sequences and structures, (ii) pre-computed sequence and structure neighbors, and (iii) visualization of structures and sequence/structure alignments. Here, we describe Entrez's ‘Related Structure’ links, an annotation service that automatically combines some of these tools. For all proteins in Entrez, similar sequences with known 3D structure are detected by BLAST, and the alignments are recorded. The ‘Related Structure’ service summarizes this information and presents 3D views mapping sequence residues onto all 3D structures available in MMDB.
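Mapping sequence residues onto a structure reduces to walking a pairwise alignment column by column. The Python sketch below assumes the two gapped rows of a single alignment (for example, one BLAST hit) are given as strings and that residues in each row are numbered consecutively from a known start; real PDB residue numbering can have gaps and insertion codes, which this simplification ignores. The function name is hypothetical.

```python
def map_residues(aligned_query, aligned_struct, query_start=1, struct_start=1):
    """Map query residue numbers to structure residue numbers via an alignment.

    Both strings are gapped rows of one pairwise alignment; *_start give the
    residue number of each row's first residue. Assumes consecutive
    numbering. Columns with a gap in either row are not mapped.
    """
    mapping = {}
    q, s = query_start, struct_start
    for qc, sc in zip(aligned_query, aligned_struct):
        if qc != "-" and sc != "-":
            mapping[q] = s
        if qc != "-":
            q += 1  # advance query numbering only on a residue
        if sc != "-":
            s += 1  # advance structure numbering only on a residue
    return mapping


print(map_residues("MK-LV", "MKALV", query_start=1, struct_start=10))
# → {1: 10, 2: 11, 3: 13, 4: 14}
```

With such a mapping, a site highlighted on the query sequence can be located on every aligned structure.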

    Automatic annotation of experimentally derived, evolutionarily conserved post-translational modifications onto multiple genomes

    Next-generation sequencing technologies have greatly increased the number of complete genomes. Functional characterization of these genomes, for example by high-throughput proteomics, is an important but challenging task because existing experimental techniques are difficult to scale up. With comparative genomics techniques, experimental results can be transferred from one genome to another, while errors are minimized by requiring that a result be observed in multiple genomes. In this study, protein phosphorylation, an essential component of many cellular processes, is studied using data from large-scale proteomics analyses of the phosphoproteome. Phosphorylation sites from Homo sapiens, Mus musculus and Drosophila melanogaster phosphopeptide data sets were mapped onto conserved domains in NCBI’s manually curated portion of the Conserved Domain Database (CDD). In this subset, 25 phosphorylation sites are found to be evolutionarily conserved among the three species studied. Transfer of phosphorylation annotation from these conserved sites onto sequences sharing the same conserved domains yields 3253 phosphosite annotations for proteins from Coelomata, the taxonomic division that spans H. sapiens, M. musculus and D. melanogaster. The method scales automatically, so as the amount of experimental phosphoproteomics data increases, more conserved phosphorylation sites may be revealed.
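The conservation-then-transfer logic can be sketched in a few lines of Python. The sketch assumes phosphosites have already been mapped onto columns of a domain model for each species, and that every target protein's alignment to the same model is available as a column-to-residue table. Keeping only columns observed in all species follows the abstract; restricting transfer to S, T or Y residues is an added assumption, and all names and data are hypothetical.

```python
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def conserved_columns(sites_by_species):
    """Domain-model columns that carry an observed phosphosite in every species.

    sites_by_species: {species: set of model columns}, i.e. phosphosites
    already mapped onto the domain model through each protein's alignment.
    """
    return set.intersection(*sites_by_species.values())


def transfer_sites(conserved, target_alignments):
    """Annotate target proteins at conserved model columns.

    target_alignments: {protein: {model_column: (residue_number, residue)}}.
    Assumption: a site is transferred only if the aligned residue is S, T or Y.
    """
    annotations = []
    for protein, columns in target_alignments.items():
        for col in sorted(conserved):
            if col in columns and columns[col][1] in PHOSPHO_ACCEPTORS:
                annotations.append((protein, columns[col][0]))
    return annotations


sites = {"human": {12, 40, 77}, "mouse": {12, 77}, "fly": {12, 55, 77}}
targets = {
    "protX": {12: (101, "S"), 77: (166, "A")},
    "protY": {12: (88, "T"), 77: (150, "Y")},
}
print(transfer_sites(conserved_columns(sites), targets))
# → [('protX', 101), ('protY', 88), ('protY', 150)]
```

Requiring agreement across species trades some coverage for fewer false transfers, which is the error-minimizing idea stated in the abstract.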

    Homology Inference of Protein-Protein Interactions via Conserved Binding Sites

    The coverage and reliability of protein-protein interactions determined by high-throughput experiments still need to be improved, especially for higher organisms. The question therefore persists of how interactions can be verified and predicted by computational approaches using available data on protein structural complexes. We recently developed an approach called IBIS (Inferred Biomolecular Interaction Server) to predict and annotate protein-protein binding sites and interaction partners, which is based on the assumption that the structural location and sequence patterns of protein-protein binding sites are conserved between close homologs. In this study, we first confirmed the high accuracy of our method and found that its accuracy depends critically on using all available data on structures of homologous complexes, as opposed to approaches that employ only a non-redundant set of complexes. Second, we showed that there is a trade-off between specificity and sensitivity depending on whether the prediction uses only evolutionarily conserved binding-site clusters or also includes clusters supported by only one observation (singletons). Finally, we addressed the question of identifying biologically relevant interactions using the homology inference approach and demonstrated that a large majority of crystal packing interactions can be correctly identified and filtered by our algorithm. At the same time, about half of the biological interfaces that are not present in the protein crystallographic asymmetric unit can be reconstructed by IBIS from homologous complexes without prior knowledge of the crystal parameters of the query protein.
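The sensitivity/specificity trade-off around singletons can be illustrated with a toy Python calculation. Each cluster is assumed to be a list of predicted sites (sets of residue positions), and the "true" binding residues and protein length are hypothetical toy values chosen only to show the effect; they are not data or results from this study.

```python
def predict_residues(clusters, include_singletons):
    """Union of residue positions from clusters; singleton clusters are optional."""
    predicted = set()
    for members in clusters:
        if len(members) > 1 or include_singletons:
            for site in members:
                predicted |= site
    return predicted


def sensitivity_specificity(predicted, true_sites, n_residues):
    """Per-residue sensitivity and specificity for a protein of n_residues."""
    tp = len(predicted & true_sites)
    fp = len(predicted - true_sites)
    fn = len(true_sites - predicted)
    tn = n_residues - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: one conserved cluster and one singleton.
clusters = [[{1, 2, 3}, {2, 3, 4}], [{20, 21}]]
true_sites = {1, 2, 3, 4, 20}
for include in (False, True):
    pred = predict_residues(clusters, include)
    print(include, sensitivity_specificity(pred, true_sites, n_residues=30))
```

In this toy case, adding the singleton recovers residue 20 (sensitivity rises from 0.8 to 1.0) but also brings in the false positive residue 21 (specificity drops from 1.0 to 0.96).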