
    Annotation and Curation of the Protein Data Bank

    The Protein Data Bank (PDB) is the worldwide repository for experimentally determined 3D structures of biological macromolecules. Established in 1971 with just seven structures, it presently includes more than 56,000 entries. To maintain the highest standards in curation and processing, the members of the worldwide Protein Data Bank (wwPDB) collaborate in data annotation and the development of procedures, tools, and resources. Annotation-related issues, particularly those impacted by new developments in structural biology, are critically reviewed at regular and frequent in-person and virtual meetings. Comprehensive documentation of the procedures, formats, and related data dictionaries used in data annotation is available at the wwPDB website (www.wwpdb.org).

Mindful of the impact that changes in annotation procedures or data format may have on users, changes are carefully managed and communicated in a timely fashion. In cases involving complex scientific or policy issues, input is sought from advisory committees, standing task forces, experimental method developers, and community experts. This is exemplified by the creation of the recently released version of the PDB archive, which updates and further standardizes database references, small molecule chemistry, biological assemblies, and active sites.

    Domain-based small molecule binding site annotation

    BACKGROUND: Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites. DESCRIPTION: Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at . The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB, of which 472 (60%) of predicted interactions identically matched the experimental small molecule and of these, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives. 
CONCLUSION: By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.
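
As a quick check of the validation figures quoted in this abstract, the short Python sketch below recomputes the rates from the three counts given (793, 472 and 344). The variable names are illustrative only and are not taken from SMID or SMID-BLAST. The last figure, expressed relative to the full validation set, is derived here and is not stated in the abstract.

```python
# Counts quoted in the abstract (variable names are illustrative).
validated = 793      # experimental small molecule interactions in the validation set
ligand_match = 472   # predictions that identically matched the experimental ligand
site_match = 344     # of those, predictions with >80% of binding site residues correct

# Fraction of validation interactions whose ligand was predicted correctly.
ligand_rate = ligand_match / validated
# Fraction of ligand matches that also recovered >80% of the binding site residues.
site_rate_of_matches = site_match / ligand_match
# The same site-level successes relative to the whole validation set (derived, not quoted).
site_rate_overall = site_match / validated

print(f"ligand identified:            {ligand_rate:.1%}")
print(f"site recovered (of matches):  {site_rate_of_matches:.1%}")
print(f"site recovered (of all 793):  {site_rate_overall:.1%}")
```

The first rate rounds to the 60% reported in the abstract.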

    The RCSB Protein Data Bank: views of structural biology for basic and applied research and education.

    The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine.

    A structural annotation resource for the selection of putative target proteins in the malaria parasite

    Background: Protein structure plays a pivotal role in elucidating mechanisms of parasite functioning and drug resistance. Moreover, protein structure aids the determination of protein function, which can together with the structure be used to identify novel drug targets in the parasite. However, various structural features in Plasmodium falciparum proteins complicate the experimental determination of protein structures. Limited similarity to proteins in the Protein Data Bank and the shortage of solved protein structures in the malaria parasite necessitate genome-scale structural annotation of P. falciparum proteins. Additionally, the annotation of a range of structural features facilitates the identification of suitable targets for experimental and computational studies. Methods: An integrated structural annotation system was developed and applied to P. falciparum, Plasmodium vivax and Plasmodium yoelii. The annotation included searches for sequence similarity, patterns and domains in addition to the following predictions: secondary structure, transmembrane helices, protein disorder, low complexity, coiled-coils and small molecule interactions. Subsequently, candidate proteins for further structural studies were identified based on the annotated structural features. Results: The annotation results are accessible through a web interface, enabling users to select groups of proteins which fulfil multiple criteria pertaining to structural and functional features [1]. Analysis of features in the P. falciparum proteome showed that protein-interacting proteins contained a higher percentage of predicted disordered residues than non-interacting proteins. Proteins interacting with 10 or more proteins have a disordered content concentrated in the range of 60–100%, while the disorder distribution for proteins having only one interacting partner was more evenly spread. Conclusion: A series of P. falciparum protein targets for experimental structure determination, comparative modelling and in silico docking studies were putatively identified. The system is available for public use, where researchers may identify proteins by querying with multiple physico-chemical, sequence similarity and interaction features.

    Representing and analysing molecular and cellular function in the computer

    Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In a second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution, and dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.

    Charge environments around phosphorylation sites in proteins

    Background: Phosphorylation is a central feature in many biological processes. Structural analyses have identified the importance of charge-charge interactions, for example mediating phosphorylation-driven allosteric change and protein binding to phosphopeptides. Here, we examine computationally the prevalence of charge stabilisation around phosphorylated sites in the structural database, through comparison with locations that are not phosphorylated in the same structures. Results: A significant fraction of phosphorylated sites appear to be electrostatically stabilised, largely through interaction with sidechains. Some examples of stabilisation across a subunit interface are evident from calculations with biological units. When considering the immediately surrounding environment, in many cases favourable interactions are only apparent after conformational change that accompanies phosphorylation. A simple calculation of potential interactions at longer-range, applied to non-phosphorylated structures, recovers the separation exhibited by phosphorylated structures. In a study of sites in the Phospho.ELM dataset, for which structural annotation is provided by non-phosphorylated proteins, there is little separation of the known phospho-acceptor sites relative to background, even using the wider interaction radius. However, there are differences in the distributions of patch polarity for acceptor and background sites in the Phospho.ELM dataset. Conclusion: In this study, an easy to implement procedure is developed that could contribute to the identification of phospho-acceptor sites associated with charge-charge interactions and conformational change. Since the method gives information about potential anchoring interactions subsequent to phosphorylation, it could be combined with simulations that probe conformational change. 
Our analysis of the Phospho.ELM dataset also shows evidence for mediation of phosphorylation effects through (i) conformational change associated with making a solvent inaccessible phospho-acceptor site accessible, and (ii) modulation of protein-protein interactions.
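
The abstract describes a "simple calculation of potential interactions" evaluated at a short and a wider radius but gives no formula. The Python sketch below is therefore only an illustration of that general idea, not the authors' procedure. It sums assumed formal sidechain charges within a cutoff distance of a candidate site. The charge table, the cutoff values, the function name `net_charge_near` and the toy coordinates are all assumptions made for this example.

```python
import math

# Assumed formal charges for ionisable sidechains at neutral pH (histidine ignored).
SIDECHAIN_CHARGE = {"LYS": +1, "ARG": +1, "ASP": -1, "GLU": -1}

def net_charge_near(site_xyz, charged_groups, radius=6.0):
    """Sum formal sidechain charges whose centre lies within `radius` angstroms of the site.

    `charged_groups` is a list of (residue_name, (x, y, z)) pairs; the default
    radius is an arbitrary illustrative value, not one taken from the paper.
    """
    total = 0
    for res_name, xyz in charged_groups:
        if math.dist(site_xyz, xyz) <= radius:
            total += SIDECHAIN_CHARGE.get(res_name, 0)
    return total

# Toy coordinates (angstroms) around a hypothetical phospho-acceptor site at the origin.
site = (0.0, 0.0, 0.0)
groups = [
    ("ARG", (3.0, 0.0, 0.0)),
    ("LYS", (0.0, 5.0, 0.0)),
    ("GLU", (4.0, 4.0, 4.0)),
    ("ASP", (0.0, 0.0, 9.0)),
]

near = net_charge_near(site, groups, radius=6.0)    # immediate environment
wide = net_charge_near(site, groups, radius=10.0)   # wider interaction radius
print(near, wide)
```

In this toy case the immediate environment is net positive, which would favour a negatively charged phosphate group, while the wider radius also picks up the two acidic sidechains and the net charge cancels. This shows how the choice of radius can change the picture.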

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal.

    Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana).

    Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers.