575 research outputs found

    MODBASE, a database of annotated comparative protein structure models and associated resources.

    MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER (http://salilab.org/modeller) for fold assignment, sequence-structure alignment, model building and model assessment. MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

    Comparative functional genomics approach for the annotation of proteins in Unclassified Halophilic archaeon DL31

    Structure, function and subcellular location were predicted for the unknown proteins of Unclassified Halophilic archaeon DL31 in order to assign the proteins to their respective families. The 991 genes for hypothetical proteins on the Halophilic archaeon DL31 chromosome were predicted using computational methods and bioinformatics web tools. Structure prediction was possible for 206 unknown proteins, whereas functions were predicted for 825 protein sequences. Functions were predicted with bioinformatics web tools such as CDD-BLAST, INTERPROSCAN and PFAM, which search protein databases for conserved domains. Subcellular locations were predicted for all the unknown proteins using the CELLO v 2.5 server, while tertiary structures were constructed using the PS2 protein structure prediction server. This study revealed the structure, function and subcellular localization of unknown proteins on the Unclassified Halophilic archaeon DL31 chromosome.

    Interactive visualisation and exploration of biological data

    No abstract available.

    The Gene3D Web Services: a platform for identifying, annotating and comparing structural domains in protein sequences

    The Gene3D structural domain database provides domain annotations for 7 million proteins, based on the manually curated structural domain superfamilies in CATH. These annotations are integrated with functional, genomic and molecular information from external resources, such as GO, EC, UniProt and the NCBI Taxonomy database. We have constructed a set of web services that provide programmatic access to this integrated database, as well as to the Gene3D domain recognition tool (Gene3DScan) and the protein sequence annotation pipeline for analysing novel protein sequences. Example queries include retrieving all curated GO terms for a domain superfamily or all the multi-domain architectures for the human genome. The services can be accessed using simple HTTP calls and return results in a range of formats suited to quick downloading, easy parsing, graphical rendering and data storage. Hence, they provide a simple but flexible means of integrating domain annotations and associated data sets into locally run pipelines and analysis software. The services can be found at http://gene3d.biochem.ucl.ac.uk/WebServices/.

    ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree

    ProtoNet 6.0 (http://www.protonet.cs.huji.ac.il) is a data structure of protein families that cover the protein sequence space. These families are generated through an unsupervised bottom-up clustering algorithm. This algorithm organizes large sets of proteins in a hierarchical tree that yields high-quality protein families. The 2012 ProtoNet (Version 6.0) tree includes over 9 million proteins, of which 5.5% come from UniProtKB/SwissProt and the rest from UniProtKB/TrEMBL. The hierarchical tree structure is based on an all-against-all comparison of 2.5 million representatives of UniRef50. Rigorous annotation-based quality tests prune the tree to the most informative 162 088 clusters. Every high-quality cluster is assigned a ProtoName that reflects the most significant annotations of its proteins. These annotations are dominated by GO terms, UniProt/Swiss-Prot keywords and InterPro. ProtoNet 6.0 operates in a default mode. When used in the advanced mode, this data structure offers the user a view of the family tree at any desired level of resolution. Systematic comparisons with previous versions of ProtoNet are carried out. They show how our view of protein families evolves as larger parts of the sequence space become known. ProtoNet 6.0 provides numerous tools to navigate the hierarchy of clusters.
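
The idea of bottom-up (agglomerative) clustering mentioned in this abstract can be illustrated with a minimal sketch. This is not ProtoNet's actual algorithm or code: the average-linkage merge rule, the function names and the toy distance matrix are all assumptions chosen for illustration. The sketch starts from singleton clusters and repeatedly merges the two closest ones, recording each merge as an internal node of the tree.

```python
from itertools import combinations


def average_linkage(a, b, dist):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def bottom_up_tree(dist):
    """Merge the two closest clusters repeatedly and return the merge history.

    dist is a symmetric matrix of pairwise distances between items.
    Each history entry is (left_members, right_members, linkage_distance).
    """
    clusters = [(i,) for i in range(len(dist))]  # leaves: one cluster per item
    history = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], dist),
        )
        history.append((a, b, average_linkage(a, b, dist)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return history


# Toy distances for four hypothetical sequences (assumed values):
# items 0 and 1 are close to each other, as are items 2 and 3.
toy = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
merges = bottom_up_tree(toy)
for left, right, d in merges:
    print(left, right, d)
```

Reading the merge history from first to last entry walks the tree from the leaves towards the root; cutting it at a chosen linkage distance gives a view of the families at that level of resolution. This naive version recomputes all pairwise linkages at every step, so it is only suitable for small examples.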

    A biophysical approach to large-scale protein-DNA binding data

    Cutting-edge genome analysis methods from leading bioinformaticians. Research from the BioSapiens Network of Excellence presents an accurate description of current scientific developments in bioinformatics and its computational implementation. Bioinformatics is essential for annotating the structure and function of genes and proteins, for analysing complete genomes, and for molecular biology and biochemistry. Included are an overview of bioinformatics and the full spectrum of genome annotation approaches: genome analysis and gene prediction; gene regulation analysis and expression; genome variation and QTL analysis; large-scale protein annotation of function and structure; annotation and prediction of protein interactions; and the organization and annotation of molecular networks and biochemical pathways. Also covered are a technical framework for organizing and representing genome data using the DAS technology, and work on the annotation of two large genomic sets: HIV/HCV viral genomes and the splicing alternatives potentially encoded in 1% of the human genome.

    Protein subfamily assignment using the Conserved Domain Database

    © 2008 Fong et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License