Leveraging Structural Flexibility to Predict Protein Function
Proteins are versatile and flexible molecules, and understanding protein function plays a fundamental role in understanding biological systems. Protein structure comparisons are widely used to reveal protein function. However, because they assume rigidity or partial rigidity, most existing comparison methods do not consider conformational flexibility in protein structures. To address this issue, this thesis develops algorithms for flexible structure comparison to predict one specific aspect of protein function: binding specificity. Given conformational samples as a representation of flexibility, we focus on two predictive problems related to specificity: aggregate prediction and individual prediction. For aggregate prediction, we have designed FAVA (Flexible Aggregate Volumetric Analysis). FAVA is the first conformationally general method to compare proteins with identical folds but different specificities. FAVA correctly categorizes members of protein superfamilies and identifies influential amino acids that cause different specificities. A second method, PEAP (Point-based Ensemble for Aggregate Prediction), employs ensemble clustering techniques that combine many base clusterings to predict binding specificity. This method incorporates structural motions of functional substructures and can mitigate prediction errors. For individual prediction, the first method is an atomic point representation of flexibility in the binding cavity. This representation predicts binding specificity for each protein conformation with high accuracy, and it is the first to analyze maps of binding cavity conformations that describe proteins with different specificities. Our second method introduces a volumetric lattice representation. This representation localizes the solvent-accessible shape of the binding cavity by computing the cavity volume in each user-defined region. It proves to be more informative than point-based representations.
Finally, we discuss a structure-independent representation. This representation builds a lattice model on protein electrostatic isopotentials. It is the first known method to predict binding specificity explicitly from the perspective of electrostatic fields. The methods presented in this thesis incorporate the variety of protein conformations into the analysis of protein-ligand binding and provide new perspectives on flexible structure comparison, structure-based function annotation, and molecular design.
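The volumetric lattice idea can be illustrated with a minimal sketch, assuming the cavity has already been voxelized into occupied grid points and that the user-defined regions are axis-aligned lattice cells of a fixed edge length. The function name and the parameters `voxel_size` and `cell_size` are hypothetical, and counting occupied voxels is only a stand-in for the solvent-accessible volume computation used in the thesis.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def lattice_volumes(
    cavity_voxels: Iterable[Point],
    voxel_size: float,
    cell_size: float,
) -> Dict[Cell, float]:
    """Return the cavity volume inside each lattice cell.

    cavity_voxels: centers of voxels already classified as cavity
    (hypothetical input; the voxelization step itself is not shown).
    """
    voxel_volume = voxel_size ** 3
    counts: Counter = Counter()
    for x, y, z in cavity_voxels:
        # Floor division assigns each voxel center to the cell containing it.
        cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        counts[cell] += 1
    # Convert voxel counts into volumes (same length unit as the inputs, cubed).
    return {cell: n * voxel_volume for cell, n in counts.items()}


# Usage: three voxels of edge 0.5 falling into lattice cells of edge 2.0.
voxels = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25), (2.25, 0.25, 0.25)]
print(lattice_volumes(voxels, voxel_size=0.5, cell_size=2.0))
# {(0, 0, 0): 0.25, (1, 0, 0): 0.125}
```

Computing this dictionary for every conformational sample yields one region-indexed feature vector per conformation, which is the kind of input an individual-prediction classifier could consume.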
Modeling regionalized volumetric differences in protein-ligand binding cavities
Identifying elements of protein structures that create differences in protein-ligand
binding specificity is essential for explaining the molecular mechanisms
underlying preferential binding. In some cases, influential mechanisms can be
visually identified by experts in structural biology, but subtler mechanisms, whose
significance may only be apparent from the analysis of many structures, are harder to
find. To assist this process, we present a geometric algorithm and two statistical
models for identifying significant structural differences in protein-ligand binding
cavities. We demonstrate these methods in an analysis of sequence-nonredundant
structural representatives of the canonical serine proteases and the enolase
superfamily. Here, we observed that statistically significant structural variations
identified experimentally established determinants of specificity. We also observed
that an analysis of individual regions inside cavities can reveal areas where small
differences in shape can correspond to differences in specificity.
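As a minimal sketch of testing a single cavity region for a volume difference between two specificity groups, assuming per-structure region volumes have already been computed, the code below uses a two-sample permutation test on the difference of means. This test is chosen purely for illustration and is not necessarily either of the two statistical models in the paper; the function name and its inputs are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_pvalue(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the difference of mean region volumes."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle specificity labels while keeping both group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Usage with toy volumes for one region (illustrative numbers, not data).
group_1 = [12.0, 13.5, 12.8, 14.1]
group_2 = [4.2, 5.0, 3.9, 4.6]
p = permutation_pvalue(group_1, group_2)
```

Repeating the test over every region of a cavity (with a multiple-testing correction) would flag the regions whose volume differs most consistently between specificity groups.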
Theoretical-experimental study on protein-ligand interactions based on thermodynamics methods, molecular docking and perturbation models
The current doctoral thesis focuses on understanding the thermodynamics
of protein-ligand interactions, which are of paramount importance in fields ranging from traditional Medicinal
Chemistry to Nanobiotechnology. Particular attention has been paid to applying state-of-the-art
methodologies to thermodynamic studies of protein-ligand interactions by integrating structure-based
molecular docking techniques, classical fractal approaches to protein-ligand complementarity problems,
perturbation models of allosteric signal propagation, and predictive nano-quantitative structure-toxicity relationship
models, coupled with experimental validation techniques. The contributions of this work could
open new avenues in the fields of Drug Discovery, Materials Sciences, Molecular Diagnosis, and
Environmental Health Sciences.
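As a small worked example of the thermodynamic quantities involved, the standard binding free energy follows from a dissociation constant via ΔG° = RT ln(K_d / c°) with c° = 1 M, and it splits into enthalpic and entropic contributions via ΔG = ΔH - TΔS. The sketch below simply evaluates these textbook relations; the function names are illustrative and the example K_d is arbitrary.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = RT ln(Kd / c0) with the standard concentration c0 = 1 M,
    so tighter binders (smaller Kd) give more negative dG.
    """
    return R * temperature_k * math.log(kd_molar / 1.0)


def entropic_term(delta_g: float, delta_h: float) -> float:
    """Return -T*dS (kJ/mol), rearranged from dG = dH - T*dS."""
    return delta_g - delta_h


# Usage: a 1 nM binder at 25 degrees C.
dg = binding_free_energy(1e-9)  # roughly -51.4 kJ/mol
```

Pairing such a ΔG with a calorimetric ΔH gives the entropic term, which is how enthalpy-entropy contributions are typically separated.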
Classification of Protein-Binding Sites Using a Spherical Convolutional Neural Network
The analysis and comparison of protein-binding sites aid various applications in the drug discovery process, e.g., hit finding, drug repurposing, and polypharmacology. Classification of binding sites has been an active research topic for the past 30 years, and many different methods have been published. The rapid development of machine learning algorithms, coupled with the large volume of publicly available protein–ligand 3D structures, makes it possible to apply deep learning techniques to binding site comparison. Our method uses a spherical convolutional neural network based on the DeepSphere architecture to learn global representations of protein-binding sites. The model was trained on the TOUGH-C1 and TOUGH-M1 datasets and validated on the ProSPECCTs datasets. Our results show that the model can (1) perform well in protein-binding site similarity and classification tasks and (2) learn and separate the physicochemical properties of binding sites. Lastly, we tested the model on a set of kinases, where the results show that it clusters the different kinase subfamilies effectively. This example demonstrates the method’s promise for lead hopping within or outside a protein target, directly based on binding site information.
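DeepSphere operates on signals sampled on a sphere (HEALPix pixels). As a rough sketch of how a binding site could be turned into such a signal, the code below projects atoms onto directions around the pocket center and keeps the largest radial distance in each angular bin. Simple equal-angle (theta, phi) bins stand in for HEALPix here, and the function name, the bin counts, and the choice of a max-radius channel are all assumptions rather than the published featurization.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float]


def spherical_depth_map(
    atoms: Sequence[Atom],
    center: Atom,
    n_theta: int = 8,
    n_phi: int = 16,
) -> List[List[float]]:
    """Max radial distance of atoms per (theta, phi) bin around `center`."""
    grid = [[0.0] * n_phi for _ in range(n_theta)]
    cx, cy, cz = center
    for x, y, z in atoms:
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # an atom exactly at the center has no direction
        theta = math.acos(max(-1.0, min(1.0, dz / r)))  # polar angle in [0, pi]
        phi = math.atan2(dy, dx) % (2 * math.pi)        # azimuth in [0, 2*pi)
        # Clamp so that theta == pi falls into the last bin.
        i = min(int(theta / math.pi * n_theta), n_theta - 1)
        j = min(int(phi / (2 * math.pi) * n_phi), n_phi - 1)
        grid[i][j] = max(grid[i][j], r)
    return grid
```

The resulting grid is a single input channel; a fuller pipeline would add physicochemical channels (for example, atom-type or charge maps) before feeding a spherical network.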
On the origins of enzyme inhibitor selectivity and promiscuity: a case study of protein kinase binding to staurosporine
Protein kinases are important regulatory enzymes in signal transduction and cell regulation. Understanding the inhibition mechanisms of kinases is important for the further development of new therapies for cancer and inflammatory diseases. I have developed a statistical approach based on the Mantel test to find the relationship between the shapes of ATP binding sites and their affinities for inhibitors. My shape-based dendrogram shows clustering of the kinases by similarity in shape. I investigate the pocket in terms of the conservation of surrounding amino acids and atoms in order to identify the key determinants of ligand binding. I find that the most conserved regions are the main-chain atoms in the hinge region, and I show that the tetrahydropyran ring of staurosporine causes an induced fit of the glycine-rich loop. I apply multiple linear regression to select distances, measured between distinctive parts of residues, that correlate with the binding constants. This method allows me to understand the importance of the size of the gatekeeper residue and of the closure between the first glycine of the GXGXXG motif and the aspartate of the DFG loop, which act together to promote tight binding to staurosporine. I also find that the greater the number of hydrogen bonds made by the kinase around the methylamine group of staurosporine, the tighter the binding to staurosporine. The website I have developed allows a better understanding of cross-reactivity and may be useful for narrowing down the options for a synthetic strategy to design kinase inhibitors. This work was supported by the Royal Thai Government.
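The Mantel test mentioned above asks whether two distance matrices over the same set of kinases are correlated, for example a shape-distance matrix and a matrix of affinity differences. A minimal pure-Python sketch follows, assuming both matrices are symmetric with zero diagonals and not constant; the function names and the use of a Pearson correlation on the upper triangle are illustrative assumptions, not necessarily the exact statistic used in the thesis.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries of m after relabeling rows/columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mantel_test(
    d_shape: Matrix, d_affinity: Matrix, n_permutations: int = 9999, seed: int = 0
) -> Tuple[float, float]:
    """Return (r, p) for a one-sided Mantel test of positive association."""
    rng = random.Random(seed)
    identity = list(range(len(d_shape)))
    x = _upper(d_shape, identity)
    r_obs = _pearson(x, _upper(d_affinity, identity))
    perm = identity[:]
    count = 0
    for _ in range(n_permutations):
        # Permute kinase labels of one matrix (rows and columns together),
        # which preserves its internal structure but breaks the pairing.
        rng.shuffle(perm)
        if _pearson(x, _upper(d_affinity, perm)) >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_permutations + 1)
```

Permuting whole labels rather than individual matrix entries is what distinguishes the Mantel test from a naive correlation test, since the pairwise distances are not independent observations.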
Stochastic Derivative-Free Optimization of Noisy Functions
Optimization problems with numerical noise arise from the growing use of computer simulation of complex systems. This thesis concerns the development, analysis, and applications of randomized derivative-free optimization (DFO) algorithms for noisy functions. The first contribution is DFO-VASP, an algorithm for finding the optimal volumetric alignment of protein structures. Our method compensates for noisy, variable-time volume evaluations and warm-starts the search for a globally optimal superposition. These techniques enable DFO-VASP to generate practical and accurate superpositions in a timely manner. The second algorithm, STARS, is aimed at general noisy optimization problems; it employs a random search framework while dynamically adjusting the smoothing step size using noise information. A convergence rate analysis of this algorithm is provided in both additive and multiplicative noise settings. STARS outperforms randomized zero-order methods in both settings and has the advantage of being insensitive to the noise level in terms of the number of function evaluations and the final objective value. The third contribution is STORM, a trust-region model-based algorithm that relies on constructing random models and estimates that are sufficiently accurate with high probability. This algorithm is shown to converge with probability one. Numerical experiments show that STORM outperforms other stochastic DFO methods in optimizing noisy functions.
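The random-search step underlying STARS can be sketched as a Gaussian-smoothing, forward-difference step along a random direction. In this minimal version the smoothing step `mu` and the step size `h` are fixed inputs supplied by the caller, which is a simplification: STARS chooses the smoothing step from noise information, and that rule is not reproduced here. The function name and the noisy quadratic test problem are hypothetical.

```python
import random
from typing import Callable, List


def random_search(
    f: Callable[[List[float]], float],
    x0: List[float],
    mu: float,
    h: float,
    iterations: int,
    seed: int = 0,
) -> List[float]:
    """Zero-order random search with Gaussian directions (STARS-style sketch)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iterations):
        # Draw a random direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in x]
        # Forward-difference estimate of the directional derivative along u,
        # using a fixed smoothing step mu (STARS would set mu from the noise).
        trial = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(trial) - f(x)) / mu
        # Step against the gradient estimate slope * u.
        x = [xi - h * slope * ui for xi, ui in zip(x, u)]
    return x


# Usage: a quadratic corrupted by small additive noise.
noise = random.Random(1)


def noisy_sphere(x: List[float]) -> float:
    return sum(v * v for v in x) + 1e-4 * noise.gauss(0.0, 1.0)


x_final = random_search(noisy_sphere, [1.0, -2.0, 0.5], mu=0.05, h=0.02, iterations=2000)
```

Because only function values are used, the same loop applies unchanged when each evaluation is an expensive, noisy simulation; the noise then enters through the difference quotient, which is why the choice of `mu` matters.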
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates have led to a
long period of stagnation in drug approvals. Because of the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform LigNFam enables users to interactively explore
similarity relationships between ‘drug-like’ molecules and the proteins they bind. Second,
two deep-learning-based binding site comparison tools were developed that perform competitively with
state-of-the-art methods on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships; it has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together with existing tools, these contributions will aid the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
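Exploring similarity relationships between drug-like molecules usually rests on a fingerprint similarity measure. The sketch below computes the Tanimoto (Jaccard) coefficient on fingerprints represented as sets of "on" bit indices and lists the pairs above a threshold. The set representation, the bit values, and the function names are hypothetical; in practice fingerprints would come from a cheminformatics toolkit, and the abstract does not state that LigNFam uses exactly this measure.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention.
    return len(a & b) / union if union else 1.0


def similar_pairs(
    fps: Dict[str, FrozenSet[int]], threshold: float
) -> List[Tuple[str, str, float]]:
    """All molecule pairs whose similarity meets `threshold`, most similar first."""
    names = sorted(fps)
    pairs = []
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            s = tanimoto(fps[m], fps[n])
            if s >= threshold:
                pairs.append((m, n, s))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Usage with toy fingerprints (hypothetical bit indices).
fps = {
    "mol_a": frozenset({1, 4, 9, 16}),
    "mol_b": frozenset({1, 4, 9, 25}),
    "mol_c": frozenset({2, 3}),
}
print(similar_pairs(fps, 0.5))  # [('mol_a', 'mol_b', 0.6)]
```

The resulting pairs can be treated as edges of a similarity network, the kind of structure an interactive explorer would let users browse.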
Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity
The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level.
Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or of the surface region of a protein where interactions with other molecules, such as binding partners, substrates, and ligands, occur. Thus, a comparison of these regions is especially interesting for functional inference, since the structural constraints imposed by the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalytic mechanism.
From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be approached using different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not clear a priori which strategy is more suitable.
In this thesis, several conceptually different algorithmic approaches based on local, global, and semi-global strategies have been developed for comparing protein structure data, more specifically protein binding pockets. The use of graphs to model protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. The algorithms developed in this thesis build on this modeling concept; hence, from a computer scientist's point of view, they can also be regarded as global, local, and semi-global approaches to graph comparison. The algorithms were mainly designed to allow a more approximate comparison of protein binding sites in order to account for the molecular flexibility of protein structures. A main motivation was to enable the detection of more remote similarities that are not apparent with more rigid methods. Subsequently, the approaches were applied to different problems typically encountered in structural bioinformatics in order to assess and compare their performance and suitability.
Each of the approaches developed during this work was able to improve upon the performance of existing methods in the field. Another major aspect of the experiments was the question of which methodological concept (local, global, or a combination of both) offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis.
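A common way to compare two binding sites modeled as graphs is to build a product (association) graph and search it for a maximum clique, which corresponds to the largest set of mutually consistent point correspondences. The minimal sketch below uses a distance tolerance `eps` so that slightly deformed sites can still match, echoing the approximate, flexibility-aware comparison described in this abstract. The point format, the labels, `eps`, and the basic Bron-Kerbosch search are illustrative assumptions rather than the algorithms developed in the thesis.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]
# A labeled point: (label, (x, y, z)); labels such as "donor" are hypothetical.
Site = List[Tuple[str, Tuple[float, float, float]]]


def product_graph(a: Site, b: Site, eps: float) -> Dict[Node, Set[Node]]:
    """Nodes pair equally labeled points; edges join distance-compatible pairs."""
    nodes = [(i, j) for i, (la, _) in enumerate(a) for j, (lb, _) in enumerate(b) if la == lb]
    adj: Dict[Node, Set[Node]] = {v: set() for v in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each point may be matched at most once
        da = math.dist(a[i][1], a[k][1])
        db = math.dist(b[j][1], b[l][1])
        if abs(da - db) <= eps:  # tolerance absorbs small deformations
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj: Dict[Node, Set[Node]]) -> Set[Node]:
    """Basic Bron-Kerbosch without pivoting; exponential, fine for small sites."""
    best: Set[Node] = set()

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> None:
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Usage: site_b is a translated, slightly deformed copy of site_a.
site_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)), ("aromatic", (0.0, 4.0, 0.0))]
site_b = [("aromatic", (10.0, 4.1, 0.0)), ("donor", (10.0, 0.0, 0.0)), ("acceptor", (13.0, 0.0, 0.0))]
matches = max_clique(product_graph(site_a, site_b, eps=0.5))
```

Shrinking `eps` toward zero turns this into a rigid comparison, so in this example the deformed aromatic point stops matching; that trade-off is exactly the tension between rigid and approximate comparison discussed above.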
- …