256 research outputs found

    A series of PDB related databases for everyday needs

    The Protein Data Bank (PDB) is the worldwide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy-to-parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics to cancer biology and protein design.

    Mining protein database using machine learning techniques

    With a large amount of protein-related information accumulating in widely available online databases, it is of interest to apply machine learning techniques that extract underlying statistical regularities in the data and use them to predict the functional and evolutionary characteristics of unseen proteins. Such predictions can reduce the space that experiment designers need to search in order to improve our understanding of biochemical properties. It has previously been suggested that an artificial neural network can integrate features computed by comparing a pair of proteins, and so predict the degree to which the two may be evolutionarily related and homologous. We compiled two datasets of protein pairs, each pair characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features for the problem of separating remote homologous from analogous pairs, and note that a significant performance gain was obtained by including sequence and structure information. We find that a linear classifier was enough to discriminate a protein pair at the family level. At the superfamily level, however, detecting remote homologous pairs was a relatively harder problem, and we find that nonlinear classifiers achieve significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins and, from extensive cross-validation and feature-selection studies, quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
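
    The exhaustive search over feature combinations mentioned in this abstract can be illustrated with a minimal sketch. It only shows the enumeration: with seven features there are 127 non-empty subsets. The scoring function `toy_score` is a hypothetical stand-in for a cross-validated classifier accuracy and is not taken from the paper.

```python
from itertools import combinations


def feature_subsets(n_features):
    """Yield every non-empty subset of feature indices, smallest subsets first."""
    for size in range(1, n_features + 1):
        yield from combinations(range(n_features), size)


def exhaustive_search(n_features, score):
    """Score every non-empty feature subset and return (best_subset, best_score)."""
    best_subset, best_score = None, float("-inf")
    for subset in feature_subsets(n_features):
        value = score(subset)
        if value > best_score:
            best_subset, best_score = subset, value
    return best_subset, best_score


def toy_score(subset):
    # Hypothetical scorer (assumption): features 0 and 3 are informative,
    # and every extra feature costs a small penalty. In practice this would
    # be replaced by the cross-validated accuracy of a classifier.
    return 1.0 * (0 in subset) + 1.0 * (3 in subset) - 0.1 * len(subset)


if __name__ == "__main__":
    print(sum(1 for _ in feature_subsets(7)), "subsets evaluated")
    print(exhaustive_search(7, toy_score))
```

    Exhaustive enumeration is feasible here only because the feature count is small; the number of subsets doubles with each added feature.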

    Preliminary analysis of a process for identifying and aligning homologous sequences for proteins with solved structure.

    The aim of this work is to present and give a preliminary evaluation of an alternative process, named Sequences Homologue to the Query (Structure-having) Sequence (SH2Q), for building multiple alignments similar to those reported in HSSP. The process presented here relies on public-domain programs for sequence database searching, Blast (Altschul et al., 1990, 1997), and for multiple sequence alignment, ClustalW (Thompson et al., 1994). The evaluation criterion is the degree of similarity between the Relative Entropy measures and the corresponding values reported by HSSP. bitstream/CNPTIA/10041/1/comtec48.pdf. Accessed: 30 May 2008.
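
    A per-column relative entropy of the kind compared in this abstract can be sketched as follows. The sketch assumes that relative entropy means the Shannon entropy of the residue frequencies in one alignment column, normalised by ln 20 and scaled to 0–100, which is our understanding of the HSSP convention. Ignoring gaps and non-standard residues is also an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids; other symbols (gaps, X, ...) are ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_entropy(column):
    """Shannon entropy (natural log) of the residue frequencies in one column."""
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def relative_entropy(column):
    """Entropy normalised by ln(20) and scaled to the range 0-100."""
    return 100.0 * column_entropy(column) / math.log(len(AMINO_ACIDS))


if __name__ == "__main__":
    for col in ("AAAA", "AACC", AMINO_ACIDS):
        print(col, round(relative_entropy(col), 2))
```

    A fully conserved column scores 0 and a column with all 20 residues equally represented scores 100, so two alignments can be compared position by position on the same scale.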

    A structural classification of protein-protein interactions for detection of convergently evolved motifs and for prediction of protein binding sites on sequence level

    BACKGROUND: A long-standing challenge in the post-genomic era of Bioinformatics is the prediction of protein-protein interactions, and ultimately the prediction of protein functions. The problem is intrinsically harder when only amino acid sequences are available, but a solution is more universally applicable. So far, the problem of uncovering protein-protein interactions has been addressed in a variety of ways, both experimentally and computationally. MOTIVATION: The central problem is: How can protein complexes with solved three-dimensional structure be utilized to identify and classify protein binding sites, and how can knowledge be inferred from this classification such that protein interactions can be predicted for proteins without solved structure? The underlying hypothesis is that protein binding sites are often restricted to a small number of residues, which additionally are often well conserved in order to maintain an interaction. Therefore, the signal-to-noise ratio in binding sites is expected to be higher than in other parts of the surface. This enables binding site detection in unknown proteins when homology-based annotation transfer fails. APPROACH: The problem is addressed by first investigating how geometrical aspects of domain-domain associations can lead to a rigorous structural classification of the multitude of protein interface types. The interface types are explored with respect to two aspects: First, how do interface types with one-sided homology reveal convergently evolved motifs? Second, how can sequential descriptors for local structural features be derived from the interface type classification? Then, the use of sequential representations for binding sites in order to predict protein interactions is investigated. The underlying algorithms are based on machine learning techniques, in particular Hidden Markov Models. RESULTS: This work includes a novel approach to a comprehensive geometrical classification of domain interfaces. Alternative structural domain associations are found for 40% of all family-family interactions. Evaluation of the classification algorithm on a hand-curated set of interfaces yielded a precision of 83% and a recall of 95%. For the first time, a systematic screen of convergently evolved motifs in 102,000 protein-protein interactions with structural information is derived. With respect to this dataset, all cases related to viral mimicry of human interface bindings are identified. Finally, a library of 740 motif descriptors for binding site recognition, encoded as Hidden Markov Models, is generated and cross-validated. Tests for the significance of motifs are provided. The usefulness of descriptors for protein-ligand binding sites is demonstrated for the case of "ATP-binding", where a precision of 89% is achieved, thus outperforming comparable motifs from PROSITE. In particular, a novel descriptor for a P-loop variant has been used to identify ATP-binding sites in 60 protein sequences that have not been annotated before by existing motif databases.

    Characterization of Protein Residue Surface Accessibility Using Sequence Homology

    Residues present on the surface of the proteins are involved in a number of functions, especially in ligand-protein interactions, that are important for drug design. The residues present in the core of the protein provide stability to the protein and help in maintaining protein structure. Hence, there is a need for a binary characterization of protein residues based on their surface accessibility (surface accessible or buried). Such a classification can aid in the directed study of either residue type. A number of methods for the prediction of surface accessible protein residues have been proposed in the past. However, most of these methods are computationally complex and time consuming. In this thesis, we propose a simple method based on protein sequence homology parameters for the binary classification of protein residues as surface accessible or “buried”. To aid in the classification of protein residues, we chose three highly conservative homology-based parameter filter thresholds. The filter thresholds predicted and evaluated are: residue sequence entropy ≥ 0.15, fraction of strongly hydrophobic residues < 0.5 and fraction of small residues < 0.15. The application of these filter thresholds to the residues is expected to predict the “buried residues” with a better percentage accuracy than that of the surface accessible residues. These filter thresholds were selected from the frequency distributions and the aggregate correlation plots of the various homology-based parameters. An analysis of the plots suggests the presence of a strongly hydrophobic core between packing density 14–22, where the presence of strongly hydrophobic residues is maximum and the presence of small and non-strongly hydrophobic residues is minimum. However, the densest portion of the protein (density 26–35) is indicated to be occupied by a combination of small and non-strongly hydrophobic residues with a negligible presence of strongly hydrophobic residues.
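
    The three filter thresholds quoted in this abstract can be written down as a small predicate. This is only a sketch: the abstract does not state how the three filters are combined or which label a passing residue receives, so requiring all three conditions at once is an assumption, and the function and parameter names are hypothetical.

```python
# Threshold values as quoted in the abstract.
ENTROPY_MIN = 0.15       # residue sequence entropy >= 0.15
HYDROPHOBIC_MAX = 0.5    # fraction of strongly hydrophobic residues < 0.5
SMALL_MAX = 0.15         # fraction of small residues < 0.15


def passes_filters(entropy, frac_strongly_hydrophobic, frac_small):
    """Return True when a residue position satisfies all three thresholds.

    Combining the filters with a logical AND is an assumption; the mapping
    from this result to "surface accessible" or "buried" is left to the caller.
    """
    return (
        entropy >= ENTROPY_MIN
        and frac_strongly_hydrophobic < HYDROPHOBIC_MAX
        and frac_small < SMALL_MAX
    )


if __name__ == "__main__":
    # Hypothetical per-position values computed from a sequence alignment.
    print(passes_filters(0.40, 0.10, 0.05))
    print(passes_filters(0.05, 0.80, 0.05))
```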

    Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces

    BACKGROUND: Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. RESULTS: In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources including calculations on the 3D coordinates of the protein by using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data is stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. CONCLUSIONS: We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similarly to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and details. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.

    PDBselect 1992–2009 and PDBfilter-select

    PDBselect (http://bioinfo.tg.fh-giessen.de/pdbselect/) is a list of representative protein chains with low mutual sequence identity selected from the Protein Data Bank (PDB) to enable unbiased statistics. The list increased from 155 chains in 1992 to more than 4500 chains in 2009. PDBfilter-select is an online service to generate user-defined selections.
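
    The idea of a representative list with low mutual sequence identity can be illustrated with a simple greedy selection. This is a generic sketch, not the PDBselect procedure: the position-wise `identity` function stands in for a real alignment-based identity, and the 25% cutoff and chain names are assumptions made for the example.

```python
def identity(seq_a, seq_b):
    """Toy identity: fraction of matching positions over the shorter sequence.

    A real pipeline would compute identity from a pairwise alignment.
    """
    length = min(len(seq_a), len(seq_b))
    if length == 0:
        return 0.0
    return sum(a == b for a, b in zip(seq_a, seq_b)) / length


def select_representatives(chains, max_identity=0.25):
    """Greedily keep chains whose identity to every kept chain is below the cutoff.

    `chains` is a list of (name, sequence) pairs, assumed to be ordered so
    that preferred chains (for example, better-quality structures) come first.
    """
    kept = []
    for name, seq in chains:
        if all(identity(seq, kept_seq) < max_identity for _, kept_seq in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


if __name__ == "__main__":
    # Hypothetical chains for illustration only.
    chains = [
        ("chainA", "ACDEFGHIKL"),
        ("chainB", "ACDEFGHIKM"),  # near-duplicate of chainA
        ("chainC", "WWWWWWWWWW"),
    ]
    print(select_representatives(chains))
```

    Because the selection is greedy, the input order decides which member of a similar group survives, so the list should be sorted by preference before the call.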