Evolution of interface binding strengths in simplified model of protein quaternary structure.
The self-assembly of proteins into protein quaternary structures is of fundamental importance to many biological processes, and protein misassembly is responsible for a wide range of proteopathic diseases. In recent years, abstract lattice models of protein self-assembly have been used to simulate the evolution and assembly of protein quaternary structure, and to provide a tractable way to study the genotype-phenotype map of such systems. Here we generalize these models by representing the interfaces as mutable binary strings. This simple change enables us to model the evolution of interface strengths, interface symmetry, and deterministic assembly pathways. Using the generalized model we are able to reproduce two important results established for real protein complexes: The first is that protein assembly pathways are under evolutionary selection to minimize misassembly. The second is that the assembly pathway of a complex mirrors its evolutionary history, and that both can be derived from the relative strengths of interfaces. These results demonstrate that the generalized lattice model offers a powerful new idealized framework to facilitate the study of protein self-assembly processes and their evolution
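As an illustration of how interfaces represented as mutable binary strings can carry a binding strength, here is a minimal sketch. The abstract does not specify the scoring rule: the convention used here (compare one string against the reverse of the other and count complementary bits) and the function names are assumptions made for the example only.

```python
def binding_strength(a: str, b: str) -> float:
    """Fraction of positions at which interface `a` is complementary to
    the reversed interface `b`.

    Assumption: two interfaces meet head-to-tail, so `b` is read in reverse,
    and a 0 facing a 1 counts as a favourable contact.
    """
    if len(a) != len(b) or not a:
        raise ValueError("interfaces must be non-empty and of equal length")
    complementary = sum(x != y for x, y in zip(a, reversed(b)))
    return complementary / len(a)


def mutate(interface: str, position: int) -> str:
    """Return a copy of `interface` with the bit at `position` flipped."""
    flipped = "1" if interface[position] == "0" else "0"
    return interface[:position] + flipped + interface[position + 1:]
```

Under this convention a string such as "1100" binds itself at full strength, which gives a simple picture of a symmetric (self-interacting) interface, and a single point mutation changes the strength in steps of 1/L for an interface of length L.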
Application of computational methods for predicting protein interactions
Protein interactions with other proteins or small molecules are critical to most physiological processes. These interactions may be characterized experimentally, but this can be time consuming and expensive; computational methods for predicting how two proteins interact, or which regions of a protein are most favorable for binding, are thus valuable tools for understanding how proteins of interest function, and have applications in drug discovery and identifying proteins of therapeutic interest. The ClusPro and FTMap algorithms for docking or solvent mapping, respectively, model protein-protein and protein-small molecule interactions, and can be used to identify the most likely orientations of a protein complex or the regions on a protein surface with the greatest propensity for binding. Here we describe three applications of ClusPro and FTMap. ClusPro was used to develop a method for determining whether a protein-protein interface is biologically relevant, by docking the proteins and comparing the results to the given interface; a larger number of near-native structures--which have interfaces similar to that of the given complex--was found to correspond to a greater probability that an interface is biological. In another project, ClusPro was used to predict whether a mutation in a multimeric complex would trigger the formation of a supramolecular assembly, based on how often that mutated residue appeared in the interfaces of the docking results; if a mutation caused such a residue to be present in the docked interfaces more often, in comparison to those of the wild-type structure, then it was likely to induce self-assembly. FTMap was used to detect and analyze the druggability of potential allosteric sites in kinases, with mapping performed on all available kinase structures to identify and determine the potential binding affinity of binding hot spots located outside of the active site. 
Discrimination of proteins as dimers or monomers was implemented as an addition to the ClusPro server, ClusPro-DC, and the results of the druggability analysis of kinases were organized into an online resource, the Kinase Atlas.
2019-02-20
Gradient descent optimization and deep reinforcement learning for protein-protein interaction
Reconstruction of the 3D structure of protein dimers is a crucial and challenging task. Although inter-protein contacts have been found useful in modeling protein complexes, few methods have been introduced that tackle the challenging quaternary structure prediction problem using inter-chain contacts. We propose an optimization method based on the gradient descent algorithm, called GD, to reconstruct the quaternary structures of protein complexes from inter-protein contacts. We test the performance of GD on both homodimers and heterodimers using both true and predicted inter-protein contacts. GD outperforms a Markov Chain Monte Carlo (MC) method and a method based on the Crystallography and NMR System (CNS). When native inter-chain contacts are provided as inputs, GD builds high-quality models with TM-scores of more than 0.92 and interface RMSDs (I_RMSDs) of less than 1.64 Å for both homodimers and heterodimers. Given predicted inter-chain contacts as restraints, GD generates models with a mean TM-score of 0.76 for 115 homodimers. Moreover, for nearly half of the homodimers, GD reconstructs high-quality models with TM-scores above 0.9 using only the predicted inter-chain contacts to guide the modeling process. We also develop a self-learning algorithm based on reinforcement learning, named DRLComplex, to reconstruct protein dimers from true or predicted inter-protein contacts. We evaluate DRLComplex on two standard datasets: the CASP-CAPRI dataset (28 homodimers) and Std32 (32 heterodimers). If native inter-chain contacts are provided, DRLComplex generates models with a mean TM-score of 0.9895 and a mean I_RMSD of 0.2197 for the CASP-CAPRI dataset, and models with an average TM-score of 0.9881 and an average I_RMSD of 0.92 for Std32. Using predicted inter-chain contacts as restraints, DRLComplex builds models with overall average TM-scores of 0.73 and 0.76 for CASP-CAPRI and Std32, respectively.
Moreover, using predicted contacts, DRLComplex improves the mean I_RMSD of the reconstructed models for the Std32 dataset by 0.29 percent, 1.01 percent, 13.47 percent, and 8.69 percent over GD, MC, CNS, and Equidock (an end-to-end quaternary structure prediction method), respectively. In addition, the mean I_RMSD of the models predicted by DRLComplex for the CASP-CAPRI dataset using predicted contacts is 0.04, 3.94, and 4.07 lower than that of MC, CNS, and Equidock, respectively. Includes bibliographical references.
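The idea of reconstructing a dimer from inter-chain contacts by gradient descent can be sketched in a heavily simplified form. The example below is not the GD method of the abstract: it optimizes only a rigid translation of the second chain (a full method would also need rotations), and the 8 Å contact threshold, the squared-hinge loss, the learning rate, and all function names are assumptions chosen for illustration.

```python
import math


def contact_loss_and_grad(chain_a, chain_b, contacts, shift, d0=8.0):
    """Squared-hinge loss over contact restraints and its gradient with
    respect to the translation `shift` applied to chain_b.

    A restraint (i, j) is satisfied when residue i of chain_a and residue j
    of the shifted chain_b are within d0 (assumed threshold, in angstroms).
    """
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for i, j in contacts:
        diff = [chain_b[j][k] + shift[k] - chain_a[i][k] for k in range(3)]
        dist = math.sqrt(sum(c * c for c in diff))
        excess = dist - d0
        if excess > 0.0:  # only violated restraints contribute
            loss += excess ** 2
            for k in range(3):
                grad[k] += 2.0 * excess * diff[k] / dist
    return loss, grad


def dock_by_translation(chain_a, chain_b, contacts, d0=8.0, lr=0.05, steps=500):
    """Plain gradient descent on the translation of chain_b."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(steps):
        _, grad = contact_loss_and_grad(chain_a, chain_b, contacts, shift, d0)
        shift = [s - lr * g for s, g in zip(shift, grad)]
    loss, _ = contact_loss_and_grad(chain_a, chain_b, contacts, shift, d0)
    return shift, loss
```

Coordinates are given as (x, y, z) tuples, one per residue, and `contacts` is a list of index pairs. With a fixed learning rate the step size should be reduced as the number of restraints grows, since the gradient is a sum over all violated contacts.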
New computational methods for structural modeling protein-protein and protein-nucleic acid interactions
Programa de Doctorat en Biomedicina
The study of the 3D structural details of protein-protein and protein-DNA interactions is essential to understand biomolecular functions at the molecular level. Given the difficulty of determining the structures of these complexes by experimental techniques, computational tools are becoming a powerful means to increase the current structural coverage of protein-protein and protein-DNA interactions. pyDock is one of these tools; it uses its scoring function to assess the quality of models generated by other tools, and is usually combined with the model sampling methods FTDOCK or ZDOCK. This combination has shown consistently good predictive performance in community-wide blind assessment experiments such as CAPRI and CASP, and has provided biological insights and helped interpret experiments by modeling many biomolecular interactions of biomedical and biotechnological interest.
Here, we describe a pyDock software update, which includes its adaptation to the newest version of Python, the capability of including cofactors and other small molecules, and an internal parallelization to use computational resources more efficiently.
A strategy was designed to integrate the template-based docking and ab initio docking approaches by creating a new scoring function based on the pyDock scoring energy basis function and the TM-score measure of structural similarity of protein structures. This strategy was partially used for our participation in the 7th CAPRI, the 3rd CASP-CAPRI and the 4th CASP-CAPRI joint experiments. These experiments were challenging, as we needed to model protein-protein complexes, multimeric oligomerization proteins, protein-peptide, and protein-oligosaccharide interactions. Many proposed targets required the efficient integration of rigid-body docking, template-based modeling, flexible optimization, multi-parametric scoring, and experimental restraints. This was especially relevant for the multi-molecular assemblies proposed in the 3rd and 4th CASP-CAPRI joint experiments.
In addition, a case study is presented in which electron transfer protein complexes were modelled to test the software's new capabilities. Good results were achieved, as the structural models obtained help explain the differences in photosynthetic efficiency between red and green algae.
Modelling the evolution of biological complexity with a two-dimensional lattice self-assembly process
Self-assembling systems are prevalent across numerous scales of nature, lying at the heart of diverse physical and biological phenomena.
Individual protein subunits self-assembling into complexes is often a vital first step of biological processes.
Errors during protein assembly, due to mutations or misfolds, can have devastating effects and are responsible for an assortment of protein diseases, known as proteopathies.
With proteins exhibiting endless layers of complexity, building any all-encompassing model is unrealistic.
Coarse-grained models, despite not faithfully capturing every detail of the original system, have massive potential to assist in understanding complex phenomena.
A principal actor in self-assembly is the binding interactions between subunits, and so geometric constraints, polarity, kinetic forces, etc. can often be marginalised.
This work explores how self-assembly and its outcomes are inextricably tied to the involved interactions through the use of a two-dimensional lattice polyomino model.
Armed with this tractable model, we can probe how dynamics acting on evolution are reflected in interaction properties.
First, this thesis addresses how the interaction characteristics of self-assembly building blocks determine what structures they form.
Specifically, it asks whether the same structures are consistently produced and whether they remain finite in size.
Assembly graphs store subunit interaction information and are used to classify these two properties, determinism and boundedness, respectively.
Arbitrary sets of building blocks are classified without the costly overhead of repeated stochastic assembling, improving both the analysis speed and accuracy.
Furthermore, assembly graphs naturally integrate combinatorial and graph techniques, enabling a wider range of future polyomino studies.
The second part narrows in on implications of nondeterministic assembly on interaction strength evolution.
Generalising subunit binding sites with mutable binary strings introduces such interaction strengths into the polyomino model.
Deterministic assemblies obey analytic expectations.
Conversely, interactions in nondeterministic assemblies rapidly diverge from equilibrium to minimise assembly inconsistency.
Optimal interaction strengths during assembly are also reflected in evolution.
Transitions between certain polyominoes are strongly forbidden when interaction strengths are misaligned.
The third aspect focuses on genetic duplication, an evolutionary event observed in organisms across all taxa.
Through polyomino evolutions, a duplication-heteromerisation pathway emerges as an efficient process.
This pathway exploits the advantages of both self-interactions and pairwise-interactions, and accelerates evolution by avoiding complexity bottlenecks.
Several simulation predictions are successfully validated against a large data set of protein complexes.
These results focus on coarse-grained models rather than quantified biological insight.
Despite this, they reinforce existing observations of protein complexes and propose several new mechanisms for the evolution of biological complexity.
Regulatory mechanisms and biological implications of protein complex assembly
Every living organism possesses a genome that contains within it a unique set of genes, a substantial
number of which encode proteins. Over the last 20 years, it has become apparent that organismal
complexity arises not from the specific complement of genes per se, but rather from interactions
between the gene products - in particular, interactions between proteins. As an inevitable consequence
of the crowded cellular interior, most protein-protein interactions are fleeting. However,
many are significantly more long-lived and result in stable protein complexes, in which the constituent
subunits are obligately dependent on their binding partners. Despite the abundance of
protein complexes and their critical importance to the cell, we currently have an incomplete understanding
of the mechanisms by which the cell ensures their correct assembly.
In the chapters that follow, I have attempted to improve our understanding of the regulatory
systems underlying assembly of protein complexes, and the way in which assembly as a whole affects
the behaviour of the cell. The thesis opens with an extended literature review covering the currently
available methods for characterising protein complexes. After this introduction, chapters 2-4 are
concerned with regulatory mechanisms and biological implications common to the assembly of all
protein complexes. Chapter 5 diverges from this work, and describes a family of evolutionarily
related proteins that regulate the behaviour of condensins and cohesins.
Bacterial and archaeal genomes contain far less non-coding DNA than eukaryotes, and coding
genes are often packaged into discrete units known as operons. The proteins encoded within
operons are usually functionally related, either through participation in metabolic pathways or as
subunits of heteromeric protein complexes. Since protein complexes assemble via ordered pathways,
we reasoned that there might be a signature of assembly order present in operons, the genes
of which are translated in sequential order. By comparing computationally predicted assembly
pathways with gene order in operons, we demonstrated this to be the case for the large majority
of operon-encoded complexes. Within operons, gene order follows assembly order, and adjacent
genes are substantially more likely to share a physical interface than those further apart. This work
demonstrates that efficient assembly of complexes is of sufficient importance as to have placed major
constraints on the evolution of operon gene order.
Following this study of bacterial operons, I present results from research investigating how patterns
of protein degradation in eukaryotes are influenced by the formation of protein complexes.
This showed that, whilst most proteins display exponential degradation kinetics, a sizeable minority
deviate considerably from this pattern, instead being more consistent with a two-step degradation
process. These proteins are predominantly members of heteromeric complexes, and their two-step
decay profiles can be explained using a model under which bound and unbound subunits are degraded at different rates. Within individual complexes, we find that non-exponentially decaying
proteins tend to form larger interfaces, assemble earlier, and show a higher degree of coexpression,
consistent with the idea that bound subunits are degraded at a slower rate than unbound or
peripheral subunits.
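The non-exponential decay described above can be illustrated with a deliberately simple two-pool sketch. It assumes a fixed unbound fraction that decays at one rate and a bound fraction that decays at another, with no exchange between pools; the model used in the thesis may differ, and the function and parameter names are hypothetical.

```python
import math


def remaining_fraction(t, unbound_fraction, k_unbound, k_bound):
    """Fraction of labelled protein left at time t for a two-pool mixture:
    an unbound pool decaying at k_unbound and a bound pool at k_bound."""
    return (unbound_fraction * math.exp(-k_unbound * t)
            + (1.0 - unbound_fraction) * math.exp(-k_bound * t))


def apparent_rate(t, dt, unbound_fraction, k_unbound, k_bound):
    """Finite-difference estimate of -d ln(remaining)/dt over [t, t + dt].

    For pure exponential decay this is constant; for a two-pool mixture it
    falls over time as the fast-decaying pool is depleted.
    """
    start = remaining_fraction(t, unbound_fraction, k_unbound, k_bound)
    end = remaining_fraction(t + dt, unbound_fraction, k_unbound, k_bound)
    return -(math.log(end) - math.log(start)) / dt
```

When the two rates are equal the curve reduces to a single exponential; when the unbound pool decays faster, the apparent rate is high at early times and approaches the bound-pool rate later, which is the two-step signature the text refers to.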
This model also explains the behaviour of proteins in aneuploid cells where one or more chromosomes
have been duplicated. In general, protein abundance scales with gene copy number, so
that the immediate effect of duplicating a chromosome is to double the abundance of the proteins
encoded on it. However, previous analyses of mass spectrometry data, as well as my own, have
shown that the abundance of many proteins on duplicated chromosomes is significantly attenuated
compared to what one would expect. These proteins, like those with non-exponential degradation
patterns, are very often members of larger complexes. Since the overall concentration of a protein
complex is constrained by that of its least abundant members, duplicating a single subunit
will predominantly increase the unbound, unstable fraction of that subunit. The results from this
work strongly suggest that the apparent attenuation of many proteins observed in aneuploid cells
is indeed a consequence of the failure of these proteins to assemble into complexes.
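The stoichiometric argument can be made concrete with a small sketch. It assumes complete, irreversible assembly, integer copy numbers, and a single hypothetical parameter `free_survival` giving the fraction of unbound subunits that escape degradation; none of these specifics come from the thesis.

```python
def assemble(abundance, stoichiometry):
    """Return (number of complete complexes, leftover free copies).

    The number of complexes is limited by the scarcest subunit relative to
    the number of copies of it that each complex requires.
    """
    n_complexes = min(abundance[name] // stoichiometry[name]
                      for name in stoichiometry)
    free = {name: abundance[name] - n_complexes * stoichiometry[name]
            for name in stoichiometry}
    return n_complexes, free


def observed_abundance(abundance, stoichiometry, free_survival):
    """Measured abundance per subunit if bound copies are stable and only a
    fraction `free_survival` of unbound copies persists."""
    n_complexes, free = assemble(abundance, stoichiometry)
    return {name: n_complexes * stoichiometry[name] + free_survival * free[name]
            for name in stoichiometry}
```

For a three-subunit complex with 100 copies of each subunit, doubling subunit A to 200 copies leaves the number of complexes at 100 and places all 100 extra copies in the unbound pool. With `free_survival` set to 0.2, the observed abundance of A rises by a factor of about 1.2 instead of the 2 expected from gene copy number.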
Finally, I present a study concerning an important, universally conserved family of protein complexes,
namely the SMC-kleisins. Two members of this family, condensin and cohesin, are responsible
for two hallmarks of eukaryotic chromatin organisation: the formation of condensed, linear
chromosomes, and sister chromatid cohesion during cell division. Unlike other SMC-kleisins,
condensin and cohesin possess a number of regulators containing HEAT repeats. By developing
a computational pipeline for searching and clustering paralogous repeat proteins, I was able to
demonstrate that these regulators form a distinct sub-family within the larger class of HEAT repeat
proteins. Furthermore, these regulators arose very early in eukaryotic history, hinting at a possible
role in the origin of modern condensins and cohesins.
Circadian oscillator proteins across the kingdoms of life: structural aspects
Circadian oscillators are networks of biochemical feedback loops that generate 24-hour rhythms and control numerous biological processes in a range of organisms. These periodic rhythms are the result of a complex interplay of interactions among clock components. These components are specific to the organism but share molecular mechanisms that are similar across kingdoms. The elucidation of clock mechanisms in different kingdoms has recently started to attain the level of structural interpretation. A full understanding of these molecular processes requires detailed knowledge, not only of the biochemical and biophysical properties of clock proteins and their interactions, but also the three-dimensional structure of clockwork components. Posttranslational modifications (such as phosphorylation) and protein-protein interactions, have become a central focus of recent research, in particular the complex interactions mediated by the phosphorylation of clock proteins and the formation of multimeric protein complexes that regulate clock genes at transcriptional and translational levels. The three-dimensional structures for the cyanobacterial clock components are well understood, and progress is underway to comprehend the mechanistic details. However, structural recognition of the eukaryotic clock has just begun. This review serves as a primer as the clock communities move towards the exciting realm of structural biology
Assembly and Mechanism of Action of Sulfolobus solfataricus DNA Replication Complexes
DNA replication enzymes are essential for the maintenance and propagation of genetic information, which precisely governs the growth and development of our cells. Aberrant DNA replication processes have been implicated in a wide variety of human diseases, most notably cancer, and a mechanistic understanding of DNA replication is therefore paramount for the development of human therapeutic agents. The study of the eukaryotic replication system, however, is difficult: the system contains a large number of enzymes and regulatory factors, which complicates its assembly for in vitro study. Thus, to gain insight into the workings of the eukaryotic replication system, several model systems with less complex replication pathways are used.
The DNA replication system from the thermophilic archaeon Sulfolobus solfataricus is a recently identified model with components sharing high levels of sequence homology to their eukaryotic counterparts. This system is ideal for gaining insight into the mechanistic workings of DNA replication which can be translated to the eukaryotic system. A key advantage to the study of thermophilic enzymes is in the ability to utilize reaction temperatures far lower than the physiological conditions for the organisms. This results in slower kinetics with no significant change in overall function, allowing an easier discernment of the enzyme’s mechanistic details.
I have contributed to the development of Sulfolobus solfataricus as a model system primarily through characterization of nucleotide transferase enzymes, including DNA polymerases and primases. Firstly, I have determined that the DNA polymerase SsoPolB3 possesses a low rate of synthesis and a fidelity more similar to those of polymerases involved in lesion bypass. Secondly, I characterized the assembly and mechanism of action of the SsoPolB1 replication holoenzyme, which replicates in a distributive fashion similar to the eukaryotic Pol δ holoenzyme and maintains stimulated replication rates through rapid re-recruitment of the polymerase to the processivity clamp. Finally, I discovered and characterized the interactions of a unique primosome complex formed between the bacterial-like DnaG primase and the eukaryotic-like MCM helicase. In all, my thesis provides a more thorough understanding of the interactions, kinetics, and dynamics occurring at the replication fork.