77 research outputs found

    The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

    The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43 229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616 470 domain sequences classified into 23 876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous Superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.

    Gene3D: modelling protein structure, function and evolution

    The Gene3D release 4 database and web portal provide a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families so as to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains. Amongst the changes from the last publication are: the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein–protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying and the data returned is presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations at their own computers.

    The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

    We report the latest release (version 3.0) of the CATH protein domain database. There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps that have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of Homologous Structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt.

    MoKCa database - mutations of kinases in cancer

    Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, as well as on a 3D image of the protein structure, and in a molecular graphics package, integrated for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.

    ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree

    ProtoNet 6.0 (http://www.protonet.cs.huji.ac.il) is a data structure of protein families that cover the protein sequence space. These families are generated through an unsupervised bottom–up clustering algorithm. This algorithm organizes large sets of proteins in a hierarchical tree that yields high-quality protein families. The 2012 ProtoNet (Version 6.0) tree includes over 9 million proteins, of which 5.5% come from UniProtKB/SwissProt and the rest from UniProtKB/TrEMBL. The hierarchical tree structure is based on an all-against-all comparison of 2.5 million representatives of UniRef50. Rigorous annotation-based quality tests prune the tree to the 162 088 most informative clusters. Every high-quality cluster is assigned a ProtoName that reflects the most significant annotations of its proteins. These annotations are dominated by GO terms, UniProt/Swiss-Prot keywords and InterPro. ProtoNet 6.0 operates in a default mode. When used in the advanced mode, this data structure offers the user a view of the family tree at any desired level of resolution. Systematic comparisons with previous versions of ProtoNet are carried out. They show how our view of protein families evolves as larger parts of the sequence space become known. ProtoNet 6.0 provides numerous tools to navigate the hierarchy of clusters.
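
    The bottom-up clustering idea in the abstract above can be illustrated with a minimal sketch. It is not ProtoNet's actual algorithm: the average-linkage merging rule, the function names and the toy distances between four made-up sequences (P1 to P4) are all assumptions chosen for illustration.

```python
from itertools import combinations

def average_linkage(a, b, dist):
    """Mean pairwise distance between members of clusters a and b."""
    return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

def bottom_up_tree(items, dist):
    """Repeatedly merge the two closest clusters until one root remains.

    Returns the merges as (left, right, distance) tuples in the order
    they were made; each cluster is a sorted tuple of item names.
    """
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], dist))
        d = average_linkage(a, b, dist)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

# Hypothetical toy distances (not real data).
toy_dist = {
    frozenset(("P1", "P2")): 0.1,
    frozenset(("P1", "P3")): 0.8,
    frozenset(("P1", "P4")): 0.9,
    frozenset(("P2", "P3")): 0.7,
    frozenset(("P2", "P4")): 0.9,
    frozenset(("P3", "P4")): 0.2,
}
merges = bottom_up_tree(["P1", "P2", "P3", "P4"], toy_dist)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

    Each merge corresponds to an internal node of the tree, so cutting the merge list at a chosen distance gives a view of the families at that level of resolution.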

    DBAli tools: mining the protein structure space

    The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions

    Ascaris lumbricoides β carbonic anhydrase: A potential target enzyme for treatment of ascariasis

    BACKGROUND: A parasitic roundworm, Ascaris lumbricoides, is the causative agent of ascariasis, with approximately 760 million cases around the world. Helminthic infections occur with a high prevalence mostly in tropical and developing countries. Therefore, design of affordable broad-spectrum anti-helminthic agents against a variety of pathogens, including not only A. lumbricoides but also hookworms and whipworms, is desirable. Beta carbonic anhydrases (β-CAs) are considered promising targets of novel anthelminthics because these enzymes are present in various parasites, while completely absent in vertebrates. METHODS: In this study, we identified an A. lumbricoides β-CA (AlBCA) protein from protein sequence data using bioinformatics tools. We used computational biology resources and methods (including InterPro, CATH/Gene3D, KEGG, and METACYC) to analyze AlBCA and define potential roles of this enzyme in biological pathways. The AlBCA gene was cloned into pFastBac1, and recombinant AlBCA was produced in Sf9 insect cells. Kinetics of AlBCA were analyzed by a stopped-flow method. RESULTS: Multiple sequence alignment revealed that AlBCA contains the two sequence motifs, CXDXR and HXXC, typical for β-CAs. Recombinant AlBCA showed significant CA catalytic activity with kcat of 6.0 × 10⁵ s⁻¹ and kcat/KM of 4.3 × 10⁷ M⁻¹ s⁻¹. The classical CA inhibitor, acetazolamide, showed an inhibition constant of 84.1 nM. Computational modeling suggests that the molecular architecture of AlBCA is highly similar to several other known β-CA structures. Functional predictions suggest that AlBCA might play a role in bicarbonate-mediated metabolic pathways, such as gluconeogenesis and removal of metabolically produced cyanate. CONCLUSIONS: These results open new avenues to further investigate the precise functions of β-CAs in parasites and suggest that novel β-CA specific inhibitors should be developed and tested against helminthic diseases.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13071-015-1098-5) contains supplementary material, which is available to authorized users
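
    The two kinetic constants reported for the recombinant enzyme imply a Michaelis constant, since the catalytic efficiency is the ratio of the turnover number to KM. The abstract does not state KM itself, so the value computed below is derived here, assuming simple Michaelis-Menten behaviour; the variable names are illustrative.

```python
# Values as reported in the abstract.
k_cat = 6.0e5                  # turnover number, s^-1
catalytic_efficiency = 4.3e7   # k_cat / K_M, M^-1 s^-1

# Derived, not reported: K_M = k_cat / (k_cat / K_M), in mol/L.
k_m = k_cat / catalytic_efficiency

print(f"implied K_M = {k_m * 1e3:.1f} mM")
```

    Under that assumption the implied KM is roughly 14 mM.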

    DMAPS: a database of multiple alignments for protein structures

    The database of multiple alignments for protein structures (DMAPS) provides instant access to pre-computed multiple structure alignments for all protein structure families in the Protein Data Bank (PDB). Protein structure families have been obtained from four distinct classification methods including SCOP, CATH, ENZYME and CE, and multiple structure alignments have been built for all families containing at least three members, using CE-MC software. Currently, multiple structure alignments are available for 3050 SCOP-, 3087 CATH-, 664 ENZYME- and 1707 CE-based families. A web-based query system has been developed to retrieve multiple alignments for these families using the PDB chain ID of any member of a family. Multiple alignments can be viewed or downloaded in six different formats, including JOY/html, TEXT, FASTA, PDB (superimposed coordinates), JOY/postscript and JOY/rtf. DMAPS is accessible online at

    MODBASE: a database of annotated comparative protein structure models and associated resources

    MODBASE is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB. Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP).
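
    The counts in the abstract above can be turned into two summary figures: the fraction of unique UniProt sequences with at least one reliable model, and the average number of reliable domain models per modelled sequence. Neither figure is stated in the abstract; both are simple ratios of the reported counts, and the variable names are illustrative.

```python
# Counts as reported in the abstract (UniProt, July 5, 2005).
reliable_models = 3_094_524
modelled_sequences = 1_094_750
unique_sequences = 1_817_889

# Derived ratios, not reported in the abstract.
coverage = modelled_sequences / unique_sequences
models_per_sequence = reliable_models / modelled_sequences

print(f"sequence coverage: {coverage:.1%}")
print(f"reliable models per modelled sequence: {models_per_sequence:.2f}")
```

    This works out to about 60% of sequences covered, with close to three reliable domain models for each modelled sequence.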

    PUMA2—grid-based high-throughput analysis of genomes and metabolic pathways

    The PUMA2 system is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data. PUMA2 provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information. Grid infrastructure is used to perform computationally intensive tasks. PUMA2 currently contains precomputed analysis of 213 prokaryotic, 22 eukaryotic, 650 mitochondrial and 1493 viral genomes and automated metabolic reconstructions for >200 organisms. Genomic data is annotated with information integrated from >20 sequence, structural and metabolic databases and ontologies. PUMA2 supports both automated and interactive expert-driven annotation of genomes, using a variety of publicly available bioinformatics tools. It also contains a suite of unique PUMA2 tools for automated assignment of gene function, evolutionary analysis of protein families and comparative analysis of metabolic pathways. PUMA2 allows users to submit batch sequence data for automated functional analysis and construction of metabolic models. The results of these analyses are made available to the users in the PUMA2 environment for further interactive sequence analysis and annotation.