95 research outputs found

    RHYTHM—a server to predict the orientation of transmembrane helices in channels and membrane-coils

    RHYTHM is a web server that predicts buried versus exposed residues of helical membrane proteins. Starting from a given protein sequence, RHYTHM calculates secondary and tertiary structure information within a few seconds. The prediction applies structural information from a growing database of precalculated packing files and evolutionary information from sequence patterns conserved in a representative dataset of membrane proteins (‘Pfam-domains’). The program uses two types of position-specific matrices to account for the different packing geometries in channels and transporters (‘channels’) and in other membrane proteins (‘membrane-coils’). The output provides information on the secondary structure and topology of the protein and, specifically, on the contact type of each residue and its conservation. This information can be downloaded as a graphical file for illustration, a text file for analysis and statistics, and a PyMOL file for modeling purposes. The server can be freely accessed at: URL: http://proteinformatics.de/rhyth

    New computational methods for structural modeling protein-protein and protein-nucleic acid interactions

    Programa de Doctorat en Biomedicina (Doctoral Programme in Biomedicine). The study of the 3D structural details of protein-protein and protein-DNA interactions is essential to understand biomolecular functions at the molecular level. Given the difficulty of determining the structures of these complexes by experimental techniques, computational tools are becoming a powerful means of increasing the current structural coverage of protein-protein and protein-DNA interactions. pyDock is one of these tools; it uses its scoring function to assess the quality of models generated by other tools, and is usually combined with the model sampling methods FTDOCK or ZDOCK. This combination has shown consistently good predictive performance in community-wide blind assessment experiments such as CAPRI and CASP, and has provided biological insights and helped interpret experiments by modeling many biomolecular interactions of biomedical and biotechnological interest. Here, we describe a pyDock software update, which includes its adaptation to the newest Python version, the capability of including cofactors and other small molecules, and internal parallelization to use computational resources more efficiently. A strategy was designed to integrate the template-based docking and ab initio docking approaches by creating a new scoring function based on the pyDock scoring energy function and the TM-score measure of structural similarity between protein structures. This strategy was partially used for our participation in the 7th CAPRI, the 3rd CASP-CAPRI and the 4th CASP-CAPRI joint experiments.
These experiments were challenging, as we needed to model protein-protein complexes, multimeric proteins, and protein-peptide and protein-oligosaccharide interactions. Many of the proposed targets required the efficient integration of rigid-body docking, template-based modeling, flexible optimization, multi-parametric scoring, and experimental restraints. This was especially relevant for the multi-molecular assemblies proposed in the 3rd and 4th CASP-CAPRI joint experiments. In addition, a case study was carried out in which electron transfer protein complexes were modelled to test the new capabilities of the software. Good results were achieved, as the structural models obtained help explain the differences in photosynthetic efficiency between red and green algae.

    Beta Atomic Contacts: Identifying Critical Specific Contacts in Protein Binding Interfaces

    Specific binding between proteins plays a crucial role in molecular functions and biological processes. In previous studies, protein binding interfaces and their atomic contacts have typically been defined by simple criteria, such as distance-based definitions that use only a threshold on spatial distance. These definitions neglect the nearby atomic organization of contact atoms, and thus detect many contacts that are interrupted by other atoms. It is questionable whether such interrupted contacts are as important as other contacts in protein binding. To tackle this challenge, we propose a new definition called beta (β) atomic contacts. Our definition, founded on the β-skeletons of computational geometry, requires that there is no other atom in the contact sphere defined by two contact atoms; this sphere is similar to the van der Waals spheres of atoms. Statistical analysis on a large dataset shows that β contacts are only a small fraction of conventional distance-based contacts. To empirically quantify the importance of β contacts, we design βACV, an SVM classifier with β contacts as input, to distinguish homodimers from crystal packing. We found that βACV achieves state-of-the-art classification performance, superior to SVM classifiers with distance-based contacts as input. βACV also outperforms several existing methods when evaluated on several datasets from previous works. This promising empirical performance suggests that β contacts can identify critical specific contacts in protein binding interfaces. β contacts thus provide a new model for describing atomic organization in protein quaternary structures more precisely than distance-based contacts.
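The empty-sphere condition described in this abstract can be illustrated with a minimal sketch. It assumes the simplest member of the β-skeleton family (β = 1, where the contact sphere is the sphere whose diameter is the segment joining the two atoms) and a hypothetical 4.5 Å distance cutoff; neither choice is stated in the abstract, and the function name and toy coordinates are illustrative only.

```python
import math
from itertools import combinations

def beta_contacts(coords, cutoff=4.5):
    """Return index pairs within `cutoff` whose diametral sphere holds no other atom.

    Assumption: beta = 1, so the contact sphere is centred on the midpoint of
    the two atoms and has half their distance as its radius.
    """
    contacts = []
    for i, j in combinations(range(len(coords)), 2):
        a, b = coords[i], coords[j]
        d = math.dist(a, b)
        if d > cutoff:
            continue  # not even a distance-based contact
        centre = tuple((x + y) / 2 for x, y in zip(a, b))
        radius = d / 2
        # The contact is "interrupted" if any third atom lies inside the sphere.
        interrupted = any(
            math.dist(centre, coords[k]) < radius
            for k in range(len(coords))
            if k not in (i, j)
        )
        if not interrupted:
            contacts.append((i, j))
    return contacts

# Toy coordinates (hypothetical, in angstroms): atom 2 sits between atoms 0 and 1.
atoms = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 0.5, 0.0), (0.0, 3.0, 0.0)]
distance_pairs = [
    (i, j)
    for i, j in combinations(range(len(atoms)), 2)
    if math.dist(atoms[i], atoms[j]) <= 4.5
]
print(distance_pairs)        # distance-based contacts, including (0, 1)
print(beta_contacts(atoms))  # (0, 1) is dropped because atom 2 interrupts it
```

The brute-force check over every third atom is cubic in the number of atoms; it is kept here for readability, and a spatial index would be needed for real structures.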

    Studies on the relationship between single nucleotide polymorphisms and protein interactions

    This thesis presents an analysis of the relationship between single nucleotide polymorphisms (SNPs) and protein–protein interactions. The aim of the thesis is to investigate the distribution of non-synonymous single nucleotide polymorphisms (nsSNPs) in terms of their locations in the protein core, at protein–protein interface sites and on other areas of the protein surface. The analysis used experimentally verified human protein–protein interactions and nsSNPs from the UniProt humsavar database. A further investigation was performed on a larger SNP dataset from the 1000 Genomes Project (1KGP). Both investigations identified a significant preference for disease-causing SNPs to occur at the protein interface compared to other areas of the protein surface. The three-dimensional structures of protein–protein interfaces were examined in order to propose stereochemical explanations for the disease-causing effect of nsSNPs in the humsavar dataset. In addition, three methodologies that could help identify pathogenic variants were presented: use of the SNP server, structural analysis and use of GMAF. Structural analysis was also performed on non-disease-causing SNPs in order to investigate their possible effects on protein–protein interactions. The results showed that some SNPs previously classified as non-disease-causing could potentially be disease-causing. The myVARIANT program was developed; it obtains SNPs from 1KGP, maps them to structures, evaluates their distribution on structures and performs a structural analysis. In conclusion, the thesis demonstrates the important role that protein–protein interactions play in disease pathogenesis.

    Statistical Relational Learning for Proteomics: Function, Interactions and Evolution

    In recent years, the field of Statistical Relational Learning (SRL) [1, 2] has produced new, powerful learning methods that are explicitly designed to solve complex problems, such as collective classification, multi-task learning and structured output prediction, and that natively handle relational data, noise, and partial information. Statistical-relational methods rely on some form of First-Order Logic as a general, expressive formal language to encode both the data instances and the relations or constraints between them. The latter encode background knowledge on the problem domain, and are used to restrict or bias the model search space according to the instructions of domain experts. The new tools developed within SRL make it possible to revisit old computational biology problems in a less ad hoc fashion, and to tackle novel, more complex ones. Motivated by these developments, in this thesis we describe and discuss the application of SRL to three important biological problems, highlighting the advantages, discussing the trade-offs, and pointing out the open problems. In particular, in Chapter 3 we show how to jointly improve the outputs of multiple correlated predictors of protein features by means of a very general probabilistic-logical consistency layer. The logical layer, based on grounding-specific Markov Logic networks [3], enforces a set of weighted first-order rules encoding biologically motivated constraints between the predictions. The refiner then improves the raw predictions so that they least violate the constraints. Contrary to canonical methods for the prediction of protein features, which typically take predicted correlated features as inputs to improve the output post facto, our method can jointly refine all predictions together, with potential gains in overall consistency.
In order to showcase our method, we integrate three stand-alone predictors of correlated features, namely subcellular localization (Loctree [4]), disulfide bonding state (Disulfind [5]), and metal bonding state (MetalDetector [6]), in a way that takes into account their respective strengths and weaknesses. The experimental results show that the refiner can improve the performance of the underlying predictors by removing rule violations. In addition, the proposed method is fully general, and could in principle be applied to an array of heterogeneous predictions without requiring any change to the underlying software. In Chapter 4 we consider the multi-level protein–protein interaction (PPI) prediction problem. In general, PPIs can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the binding process and more efficient drug/enzyme design. We cast the prediction problem in terms of multi-task learning, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs, at all levels, concurrently. Our method is based on Semantic Based Regularization (SBR) [7], a flexible and theoretically sound SRL framework that employs First-Order Logic constraints to tie the learning tasks together. In contrast to most current PPI prediction methods, which neither identify which regions of a protein actually instantiate an interaction nor leverage the hierarchy of predictions, our method resolves the prediction problem down to the residue level, enforces consistent predictions between the hierarchy levels, and exploits the hierarchical nature of the problem.
We present numerical results showing that our method substantially outperforms the baseline in several experimental settings, indicating that our multi-level formulation can indeed lead to better predictions. Finally, in Chapter 5 we consider the problem of predicting drug-resistant protein mutations through a combination of Inductive Logic Programming [8, 9] and Statistical Relational Learning. In particular, we focus on viral proteins: viruses are typically characterized by high mutation rates, which allow them to quickly develop drug-resistant mutations. Mining relevant rules from mutation data can be extremely useful to understand the virus adaptation mechanism and to design drugs that effectively counter potentially resistant mutants. We propose a simple approach for mutant prediction where the input consists of mutation data with drug-resistance information, either as sets of mutations conferring resistance to a certain drug, or as sets of mutants with information on their susceptibility to the drug. The algorithm learns a set of relational rules characterizing drug resistance, and uses them to generate a set of potentially resistant mutants. Learning a weighted combination of rules makes it possible to assign each generated mutant a resistance score, as predicted by the statistical relational model, and to select only the highest-scoring ones. Promising results were obtained in generating resistant mutations for both nucleoside and non-nucleoside HIV reverse transcriptase inhibitors. The approach can be generalized quite easily to learning mutants characterized by more complex rules correlating multiple mutations.
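The idea of refining raw predictions so that they least violate a set of weighted rules (Chapter 3 in the abstract above) can be sketched as follows. This is a deliberately simplified stand-in, not grounding-specific Markov Logic networks: it enumerates all joint assignments of three binary predictions and keeps the one with the best trade-off between agreement with the raw predictors and rule penalties. The probabilities, rule weights and the two rules themselves are hypothetical values chosen for illustration.

```python
import math
from itertools import product

def refine(raw, rules):
    """Pick the joint assignment maximising fit to raw predictions minus rule penalties.

    raw   : dict mapping a statement name to the predicted probability it is true
    rules : list of (weight, violated) pairs, where violated(state) -> bool
    """
    names = list(raw)

    def score(state):
        # Log-likelihood of the assignment under the independent raw predictors.
        fit = sum(math.log(raw[n] if state[n] else 1 - raw[n]) for n in names)
        # Each violated rule costs its weight (soft constraint).
        penalty = sum(weight for weight, violated in rules if violated(state))
        return fit - penalty

    states = (
        dict(zip(names, values))
        for values in product([False, True], repeat=len(names))
    )
    return max(states, key=score)

# Hypothetical raw outputs of three stand-alone predictors for one cysteine.
raw = {"cytosolic": 0.7, "disulfide": 0.6, "metal": 0.55}

# Illustrative soft rules (assumed for this sketch, with assumed weights):
rules = [
    (2.0, lambda s: s["cytosolic"] and s["disulfide"]),  # disulfides unlikely in the cytosol
    (1.5, lambda s: s["disulfide"] and s["metal"]),      # one cysteine rarely does both
]

refined = refine(raw, rules)
print(refined)
```

Thresholding each raw probability at 0.5 independently would mark all three statements as true and violate both rules; the joint search instead flips the least confident conflicting prediction. Exhaustive enumeration only works for a handful of variables, so a real system would need approximate inference.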