22 research outputs found

    SuperLigands – a database of ligand structures derived from the Protein Data Bank

    BACKGROUND: Currently, the PDB contains approximately 29,000 protein structures comprising over 70,000 experimentally determined three-dimensional structures of over 5,000 different low molecular weight compounds. Information about these PDB ligands can be very helpful in the field of molecular modelling and prediction, particularly for the prediction of protein binding sites and function. DESCRIPTION: Here we present an Internet-accessible database delivering PDB ligands in the MDL Mol file format which, in contrast to the PDB format, includes information about bond types. Structural similarity of the compounds can be detected by calculation of Tanimoto coefficients and by three-dimensional superposition. Topological similarity of PDB ligands to known drugs can be assessed via Tanimoto coefficients. CONCLUSION: SuperLigands supplements the set of existing resources of information about small molecules bound to PDB structures. Allowing for three-dimensional comparison of the compounds as a novel feature, this database represents a valuable means of analysis and prediction in the field of biological and medical research.
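
As a rough illustration of the Tanimoto coefficient mentioned in this abstract, the sketch below computes it for two fingerprints represented as sets of "on" bit positions. The abstract does not say which fingerprint SuperLigands uses, so the set representation, the function name `tanimoto`, and the example bit positions are all assumptions made for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    # |A ∩ B| / |A ∪ B|, with the union size written as |A| + |B| - |A ∩ B|.
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints; the bit positions are made up, not taken from any database.
ligand = {1, 4, 7, 9, 12}
drug = {1, 4, 8, 9, 15, 20}

score = tanimoto(ligand, drug)
print(f"{score:.3f}")  # 3 shared bits out of 8 distinct bits -> 0.375
```

A value of 1 means the two fingerprints set exactly the same bits, and 0 means they share none.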

    The Jena Library of Biological Macromolecules - JenaLib

    The JenaLib database (http://www.fli-leibniz.de/IMAGE.html) offers value-added information for all Protein Data Bank (PDB) and Nucleic Acid Database (NDB) entries. This includes: (1) atlas pages and entry lists, (2) PDB sequence information extracted from atomic coordinates, (3) PDB/UniProt sequence alignments that clearly indicate gaps, mutations, numbering irregularities and modified residues, (4) integration of data on single amino acid polymorphisms (SAPs), PROSITE motifs, exon structure and SCOP/CATH/Pfam domains with PDB, GO and taxonomy information, (5) display of these data in the sequence/alignment viewer and in the Jmol-based molecule viewer Jena3D (http://jena3d.fli-leibniz.de), in the latter case both for asymmetric and biological units, (6) a QuickSearch option that allows searching for PDB/NDB code, UniProt ID/accession number and other search terms in one input field, (7) a sequence homology search (BLAST) and pattern search options and (8) SCOP/CATH/Pfam tree browsers.

Offering all of this information together with analysis tools in one place makes JenaLib a unique resource for the dissemination of 3D structural information on biological macromolecules.

    AISMIG—an interactive server-side molecule image generator

    Generating interactive, high-quality, high-resolution images of biomolecules with nothing more than a web browser is no longer a problem. Interactive visualization of 3D molecular structures in a web browser is normally not possible without additional software, and browser-based structure images (e.g. from a Java applet) suffer from low resolution. Scientists who want high-quality, high-resolution 3D molecular images (e.g. for publications or a poster) therefore need separately installed software that is often not easy to use. The alternative concept is an interactive server-side rendering application that can be interfaced with any web browser, combining the convenience of a web application with the high-end rendering of a raytracer. This article addresses users who want to generate high-quality images from molecular structures but have no structure visualization software installed locally. Users often lack a structure viewer such as RasMol or Chime (or even Java) but still want to visualize a molecular structure interactively. AISMIG (An Interactive Server-side Molecule Image Generator) is a web service that provides visualization of molecular structures in such cases. AISMIG-URL:

    The SYSTERS Protein Family Database in 2005

    The SYSTERS project aims to provide a meaningful partitioning of the whole protein sequence space by a fully automatic procedure. A refined two-step algorithm assigns each protein to a family and a superfamily. The sequence data underlying SYSTERS release 4 now comprise several protein sequence databases derived from completely sequenced genomes (ENSEMBL, TAIR, SGD and GeneDB), in addition to the comprehensive Swiss-Prot/TrEMBL databases. The SYSTERS web server (http://systers.molgen.mpg.de) provides access to 158,153 SYSTERS protein families. To augment the automatically derived results, information from external databases such as Pfam and Gene Ontology is added to the web server. Furthermore, users can retrieve pre-processed analyses of families, such as multiple alignments and phylogenetic trees. New query options comprise a batch retrieval tool for functional inference about families based on automatic keyword extraction from sequence annotations. A new access point, PhyloMatrix, allows the retrieval of phylogenetic profiles of SYSTERS families across organisms with completely sequenced genomes.

    EC-PSI: Associating Enzyme Commission Numbers with Pfam Domains

    With the growing number of protein structures in the Protein Data Bank (PDB), there is a need to annotate these structures at the domain level in order to relate protein structure to protein function. Thanks to the SIFTS database, many PDB chains are now cross-referenced with Pfam domains and Enzyme Commission (EC) numbers. However, these annotations do not include any explicit relationship between individual Pfam domains and EC numbers. This article presents a novel statistical training-based method called EC-PSI that can automatically infer high confidence associations between EC numbers and Pfam domains directly from EC-chain associations from SIFTS and from EC-sequence associations from the SwissProt and TrEMBL databases. By collecting and integrating these existing EC-chain/sequence annotations, our approach is able to infer a total of 8,329 direct EC-Pfam associations with an overall F-measure of 0.819 with respect to the manually curated InterPro database, which we treat here as a "gold standard" reference dataset. Thus, compared to the 1,493 EC-Pfam associations in InterPro, our approach provides a way to find over six times as many high quality EC-Pfam associations completely automatically.
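
To make the F-measure reported in this abstract concrete, the sketch below scores a set of predicted associations against a reference set. It assumes the common balanced F-measure (the harmonic mean of precision and recall); the abstract does not state exactly how precision and recall were counted, and the function name `f_measure` and the placeholder identifiers are hypothetical.

```python
def f_measure(predicted: set, reference: set) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of predicted pairs against a reference set."""
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder (EC number, Pfam domain) pairs; these are not real annotations.
reference = {("EC_A", "PF_1"), ("EC_B", "PF_2"), ("EC_C", "PF_3")}
predicted = {("EC_A", "PF_1"), ("EC_B", "PF_2"), ("EC_D", "PF_4")}

precision, recall, f1 = f_measure(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F={f1:.3f}")
```

In this toy case two of the three predictions appear in the reference set, so precision, recall and the F-measure are all 2/3.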

    Automatic discovery of cross-family sequence features associated with protein function

    BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. RESULTS: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. 
CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.
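
The pairing of an amino acid-based regular expression with a keyword-based logical expression, as described in this abstract, can be sketched as follows. This only shows how one such pair might be evaluated on annotated sequences; the genetic programming step that co-evolves the expressions is not shown. The polyglutamine threshold, the keyword rule, the helper names and the toy sequences are all assumptions made for illustration.

```python
import re

# Hypothetical sequence feature: a run of at least 8 glutamines (threshold is made up).
POLYQ = re.compile(r"Q{8,}")


def has_feature(sequence: str, pattern: re.Pattern = POLYQ) -> bool:
    """True if the sequence contains a match for the regular expression."""
    return pattern.search(sequence) is not None


def keyword_rule(keywords: set[str]) -> bool:
    """Hypothetical logical expression over annotation keywords."""
    return "Transcription" in keywords and "Membrane" not in keywords


# Toy proteins: (sequence, annotation keywords). None of these are real entries.
proteins = [
    ("MST" + "Q" * 10 + "PLA", {"Transcription", "Nucleus"}),
    ("MKKLLVAGGTRSPLA", {"Membrane"}),
    ("MA" + "Q" * 9 + "RRKL", {"Nucleus"}),
]

# Count how often the sequence feature and the keyword rule agree or disagree.
both = sum(has_feature(s) and keyword_rule(k) for s, k in proteins)
motif_only = sum(has_feature(s) and not keyword_rule(k) for s, k in proteins)
print(both, motif_only)  # 1 1
```

Counts like these could feed a fitness score for an expression pair, though the abstract does not specify how the authors scored candidates.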