95 research outputs found
RHYTHM - a server to predict the orientation of transmembrane helices in channels and membrane-coils
RHYTHM is a web server that predicts buried versus exposed residues of helical membrane proteins. Starting from a given protein sequence, secondary and tertiary structure information is calculated by RHYTHM within only a few seconds. The prediction applies structural information from a growing database of precalculated packing files and evolutionary information from sequence patterns conserved in a representative dataset of membrane proteins ("Pfam-domains"). The program uses two types of position-specific matrices to account for the different geometries of packing in channels and transporters ("channels") or other membrane proteins ("membrane-coils"). The output provides information on the secondary structure and topology of the protein and specifically on the contact type of each residue and its conservation. This information can be downloaded as a graphical file for illustration, a text file for analysis and statistics, and a PyMOL file for modeling purposes. The server can be freely accessed at: http://proteinformatics.de/rhyth
New computational methods for structural modeling protein-protein and protein-nucleic acid interactions
Programa de Doctorat en Biomedicina. The study of the 3D structural details of protein-protein and protein-DNA interactions is essential to understand biomolecular functions at the molecular level. Given the difficulty of determining the structures of these complexes by experimental techniques, computational tools are becoming a powerful means to increase the structural coverage of protein-protein and protein-DNA interactions. pyDock is one of these tools; it uses its scoring function to assess the quality of models generated by other tools, and is usually combined with the model sampling methods FTDOCK or ZDOCK. This combination has shown consistently good predictive performance in community-wide blind assessment experiments such as CAPRI and CASP, and has provided biological insights and helped interpret experiments by modeling many biomolecular interactions of biomedical and biotechnological interest.
Here, we describe a pyDock software update, which includes its adaptation to the newest Python version, the capability of including cofactors and other small molecules, and an internal parallelization to use computational resources more efficiently.
A strategy was designed to integrate the template-based docking and ab initio docking approaches by creating a new scoring function based on the pyDock scoring energy basis function and the TM-score measure of structural similarity between protein structures. This strategy was partially used for our participation in the 7th CAPRI, the 3rd CASP-CAPRI and the 4th CASP-CAPRI joint experiments. These experiments were challenging, as we needed to model protein-protein complexes, multimeric oligomerization proteins, protein-peptide, and protein-oligosaccharide interactions. Many proposed targets required the efficient integration of rigid-body docking, template-based modeling, flexible optimization, multi-parametric scoring, and experimental restraints. This was especially relevant for the multi-molecular assemblies proposed in the 3rd and 4th CASP-CAPRI joint experiments.
In addition, a case study was carried out in which electron transfer protein complexes were modelled to test the software's new capabilities. Good results were achieved, as the structural models obtained help explain the differences in photosynthetic efficiency between red and green algae.
Beta Atomic Contacts: Identifying Critical Specific Contacts in Protein Binding Interfaces
Specific binding between proteins plays a crucial role in molecular functions and biological processes. In previous studies, protein binding interfaces and their atomic contacts have typically been defined by simple criteria, such as distance-based definitions that use only a spatial distance threshold. These definitions neglect the nearby atomic organization of contact atoms, and thus predominantly detect contacts that are interrupted by other atoms. It is questionable whether such interrupted contacts are as important as other contacts in protein binding. To tackle this challenge, we propose a new definition called beta (β) atomic contacts. Our definition, founded on the β-skeletons of computational geometry, requires that there is no other atom in the contact sphere defined by two contact atoms; this sphere is similar to the van der Waals spheres of atoms. Statistical analysis of a large dataset shows that β contacts are only a small fraction of conventional distance-based contacts. To empirically quantify the importance of β contacts, we design βACV, an SVM classifier with β contacts as input, to distinguish homodimers from crystal packing. We found that βACV achieves state-of-the-art classification performance, superior to that of SVM classifiers with distance-based contacts as input. βACV also outperforms several existing methods when evaluated on several datasets from previous work. The promising empirical performance suggests that β contacts can truly identify critical specific contacts in protein binding interfaces. β contacts thus provide a new model for a more precise description of atomic organization in protein quaternary structures than distance-based contacts.
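The empty-sphere criterion described in this abstract can be illustrated with a minimal sketch. It assumes the simplest member of the β-skeleton family (β = 1, where the contact sphere has the segment between the two atoms as its diameter); the abstract does not state which β value or sphere construction the authors use, so the function names, the cutoff and the coordinates below are illustrative assumptions only.

```python
import math


def distance_contacts(atoms_a, atoms_b, cutoff):
    """Conventional definition: every cross-chain atom pair within `cutoff`."""
    return [
        (i, j)
        for i, p in enumerate(atoms_a)
        for j, q in enumerate(atoms_b)
        if math.dist(p, q) <= cutoff
    ]


def beta_contacts(atoms_a, atoms_b, cutoff):
    """Keep only distance contacts whose contact sphere contains no other atom.

    Assumption: the contact sphere is the sphere whose diameter is the segment
    joining the two atoms (the beta = 1 case of a beta-skeleton).
    """
    all_atoms = list(atoms_a) + list(atoms_b)
    kept = []
    for i, j in distance_contacts(atoms_a, atoms_b, cutoff):
        p, q = atoms_a[i], atoms_b[j]
        centre = tuple((x + y) / 2 for x, y in zip(p, q))
        radius = math.dist(p, q) / 2
        skip = {i, len(atoms_a) + j}  # indices of the two contact atoms themselves
        interrupted = any(
            math.dist(centre, r) < radius
            for k, r in enumerate(all_atoms)
            if k not in skip
        )
        if not interrupted:
            kept.append((i, j))
    return kept


# Hypothetical coordinates (in angstroms): one atom on chain A, two on chain B.
chain_a = [(0.0, 0.0, 0.0)]
chain_b = [(4.0, 0.0, 0.0), (2.0, 0.5, 0.0)]
print(distance_contacts(chain_a, chain_b, cutoff=5.0))
print(beta_contacts(chain_a, chain_b, cutoff=5.0))
```

In this toy case the second atom of chain B lies between the other two, so the longer contact is discarded as interrupted while the shorter one is kept.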
Studies on the relationship between single nucleotide polymorphisms and protein interactions
This thesis presents an analysis of the relationship between single nucleotide
polymorphisms (SNPs) and protein-protein interactions. The aim of the thesis is to
investigate the distribution of non-synonymous single nucleotide polymorphisms
(nsSNPs) in terms of their locations in the protein core, at protein-protein interface
sites and in other areas of the protein surface. The analysis used experimentally
verified human protein-protein interactions and nsSNPs from the UniProt humsavar
database. A further investigation was performed on a larger SNP dataset from the 1000
Genomes Project (1KGP). Both investigations identified a significant preference for
disease-causing SNPs to occur at the protein interface compared to other areas on the
protein surface. The three-dimensional structures of protein-protein interfaces were
examined in order to propose stereo-chemical explanations for the disease-causing
effect of nsSNPs in the humsavar dataset. In addition, three methodologies (i.e., usage of
SNP server, structural analysis and usage of GMAF) that could help identify pathogenic
variants were presented. Structural analysis was also performed on non-disease-causing
SNPs in order to investigate their possible effects on protein-protein
interactions. The results showed that some of the SNPs previously classified as
non-disease-causing could potentially be disease-causing. The myVARIANT program was
developed. The program obtains SNPs from 1KGP, maps them to structures, evaluates
their distribution on structures and performs a structural analysis. In conclusion, the
thesis demonstrates the important role that protein-protein interactions play in disease
pathogenesis.
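One common way to quantify a preference of disease-causing SNPs for interface sites is a 2x2 contingency table with an odds ratio and a one-sided Fisher exact test. The sketch below illustrates that generic approach; the abstract does not say which statistical test the thesis used, and the counts are hypothetical placeholders, not results from the thesis.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    a: disease SNPs at the interface      b: non-disease SNPs at the interface
    c: disease SNPs elsewhere on surface  d: non-disease SNPs elsewhere on surface
    """
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null with all margins fixed."""
    row1 = a + b   # all interface SNPs
    col1 = a + c   # all disease SNPs
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical counts, chosen only to show how the functions are called.
table = (30, 20, 40, 110)
print("odds ratio:", odds_ratio(*table))
print("one-sided p-value:", fisher_one_sided(*table))
```

An odds ratio above 1 together with a small p-value would indicate enrichment of disease-causing SNPs at the interface relative to the rest of the surface.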
Statistical Relational Learning for Proteomics: Function, Interactions and Evolution
In recent years, the field of Statistical Relational Learning (SRL) [1, 2] has
produced new, powerful learning methods that are explicitly designed to solve
complex problems, such as collective classification, multi-task learning and
structured output prediction, and that natively handle relational data, noise,
and partial information. Statistical-relational methods rely on some form of
First-Order Logic as a general, expressive formal language to encode both the data
instances and the relations or constraints between them. The latter encode
background knowledge on the problem domain, and are used to restrict or bias
the model search space according to the instructions of domain experts. The
new tools developed within SRL make it possible to revisit old computational
biology problems in a less ad hoc fashion, and to tackle novel, more complex ones.
Motivated by these developments, in this thesis we describe and discuss the
application of SRL to three important biological problems, highlighting the
advantages, discussing the trade-offs, and pointing out the open problems.
In particular, in Chapter 3 we show how to jointly improve the outputs
of multiple correlated predictors of protein features by means of a very
general probabilistic-logical consistency layer. The logical layer, based on
grounding-specific Markov Logic networks [3], enforces a set of weighted
first-order rules encoding biologically motivated constraints between the
predictions. The refiner then improves the raw predictions so that they
violate the constraints as little as possible. Contrary to canonical methods
for the prediction of protein features, which typically take predicted
correlated features as inputs to improve the output post facto, our method can
jointly refine all predictions together, with potential gains in overall
consistency. In order to showcase our method, we integrate three stand-alone
predictors of correlated features, namely subcellular localization
(Loctree [4]), disulfide bonding state (Disulfind [5]), and metal bonding state
(MetalDetector [6]), in a way that takes into account their respective
strengths and weaknesses. The experimental results show that the refiner can
improve the performance of the underlying predictors by removing rule
violations. In addition, the proposed method is fully general, and could in
principle be applied to an array of heterogeneous predictions without
requiring any change to the underlying software.
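The idea of refining raw predictions so that they violate weighted rules as little as possible can be shown with a toy sketch. This is not the grounding-specific Markov Logic network machinery of the thesis: it is a brute-force search over joint truth assignments, and the feature names, probabilities, rule and weight are all hypothetical.

```python
from itertools import product


def refine(raw, rules):
    """Return the joint assignment with the lowest total cost.

    raw:   dict mapping feature name -> predicted probability that it is true
    rules: list of (weight, predicate) pairs; predicate(assignment) returns
           True when the rule is satisfied, and the weight is paid otherwise
    """
    names = sorted(raw)
    best, best_cost = None, float("inf")
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        # Cost of disagreeing with each stand-alone predictor.
        cost = sum((1 - raw[n]) if assignment[n] else raw[n] for n in names)
        # Penalty for every violated rule.
        cost += sum(weight for weight, rule in rules if not rule(assignment))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best


# Hypothetical raw outputs of two stand-alone predictors.
raw_predictions = {"cytosolic": 0.9, "disulfide_bonded": 0.6}

# Hypothetical soft rule: a protein is not both cytosolic and disulfide-bonded.
rules = [(2.0, lambda a: not (a["cytosolic"] and a["disulfide_bonded"]))]

print(refine(raw_predictions, rules))
```

Here the weaker disulfide prediction is flipped because keeping both raw predictions would cost more than the rule penalty; exhaustive search is only feasible for a handful of features, which is why real systems rely on approximate inference.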
In Chapter 4 we consider the multi-level protein-protein interaction (PPI)
prediction problem. In general, PPIs can be seen as a hierarchical process
occurring at three related levels: proteins bind by means of specific domains,
which in turn form interfaces through patches of residues. Detailed
knowledge about which domains and residues are involved in a given interaction
has extensive applications to biology, including better understanding of the
binding process and more efficient drug/enzyme design. We cast the prediction
problem in terms of multi-task learning, with one task per level (proteins,
domains and residues), and propose a machine learning method that
collectively infers the binding state of all object pairs, at all levels,
concurrently. Our method is based on Semantic Based Regularization (SBR) [7],
a flexible and theoretically sound SRL framework that employs First-Order
Logic constraints to tie the learning tasks together. In contrast to most
current PPI prediction methods, which neither identify which regions of a
protein actually instantiate an interaction nor leverage the hierarchy of
predictions, our method resolves the prediction problem up to residue level,
enforcing consistent predictions between the hierarchy levels, and fruitfully
exploits the hierarchical nature of the problem. We present numerical results
showing that our method substantially outperforms the baseline in several
experimental settings, indicating that our multi-level formulation can indeed
lead to better predictions.
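Consistency between hierarchy levels can be illustrated with a small checker. It assumes one plausible form of the constraint, namely that a pair at one level binds if and only if at least one of its child pairs at the level below binds; the abstract does not spell out the exact rules, and all identifiers and data below are hypothetical. In the thesis such constraints regularize learning, whereas this sketch only checks finished predictions.

```python
def hierarchy_violations(parent_binds, child_binds, children_of):
    """List the parent pairs whose predicted state disagrees with their children.

    Assumed rule: a parent pair binds  <=>  at least one child pair binds.
    parent_binds: dict parent pair -> bool
    child_binds:  dict child pair -> bool
    children_of:  dict parent pair -> list of child pairs
    """
    violations = []
    for parent, bound in parent_binds.items():
        any_child_bound = any(child_binds[c] for c in children_of[parent])
        if bound != any_child_bound:
            violations.append(parent)
    return violations


# Hypothetical predictions at the three levels.
protein_binds = {("P1", "P2"): True}
domain_binds = {("D1", "D3"): False, ("D2", "D3"): False}
residue_binds = {("r10", "r55"): True}

domains_of = {("P1", "P2"): [("D1", "D3"), ("D2", "D3")]}
residues_of = {("D1", "D3"): [("r10", "r55")], ("D2", "D3"): []}

print(hierarchy_violations(protein_binds, domain_binds, domains_of))
print(hierarchy_violations(domain_binds, residue_binds, residues_of))
```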
Finally, in Chapter 5 we consider the problem of predicting drug-resistant
protein mutations through a combination of Inductive Logic Programming [8,
9] and Statistical Relational Learning. In particular, we focus on viral
proteins: viruses are typically characterized by high mutation rates, which
allow them to quickly develop drug-resistant mutations. Mining relevant rules
from mutation data can be extremely useful to understand the virus adaptation
mechanism and to design drugs that effectively counter potentially resistant
mutants. We propose a simple approach for mutant prediction where the
input consists of mutation data with drug-resistance information, either as sets
of mutations conferring resistance to a certain drug, or as sets of mutants with
information on their susceptibility to the drug. The algorithm learns a set
of relational rules characterizing drug resistance, and uses them to generate
a set of potentially resistant mutants. Learning a weighted combination of
rules makes it possible to assign each generated mutant a resistance score, as
predicted by the statistical relational model, and to select only the
highest-scoring ones. Promising results were obtained in generating resistant
mutations for both nucleoside and non-nucleoside HIV reverse transcriptase
inhibitors. The approach can be generalized quite easily to learning mutants
characterized by more complex rules correlating multiple mutations.