159 research outputs found

    A Self-Organizing Algorithm for Modeling Protein Loops

    Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Very importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of length 4, 8, and 12 that have been used in previous studies
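    The "pairwise distance adjustments of randomly chosen atoms" step in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the constraint format, the fixed step count, and the relaxation rate are all assumptions chosen for illustration, and the geometric matching of rigid fragments is left out entirely.

```python
import math
import random

def adjust_pairwise(coords, constraints, steps=20000, rate=0.5, seed=0):
    """Nudge randomly chosen atom pairs toward their target distances.

    coords: list of [x, y, z] lists, modified in place.
    constraints: list of (i, j, target_distance) tuples (hypothetical format).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        i, j, target = rng.choice(constraints)
        delta = [a - b for a, b in zip(coords[i], coords[j])]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist < 1e-12:
            continue  # coincident atoms: direction undefined, skip this draw
        # Each atom takes half of the correction along the connecting line,
        # so the pair distance moves a fraction `rate` of the way to target.
        shift = rate * 0.5 * (target - dist) / dist
        for k in range(3):
            coords[i][k] += shift * delta[k]
            coords[j][k] -= shift * delta[k]
    return coords

def max_violation(coords, constraints):
    """Largest absolute deviation from any target distance."""
    return max(abs(math.dist(coords[i], coords[j]) - t)
               for i, j, t in constraints)

# Toy problem (assumed): four atoms that should form a regular tetrahedron.
rng = random.Random(1)
atoms = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
targets = [(i, j, 1.5) for i in range(4) for j in range(i + 1, 4)]
adjust_pairwise(atoms, targets)
print(max_violation(atoms, targets))
```

    Starting from random coordinates mirrors the abstract's description; in a real loop model the constraint list would also carry anchor, steric, and chirality information, which this toy omits.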

    COMPUTER ALGORITHMS FOR A FUNDAMENTAL PROBLEM IN BIOINFORMATICS: THE "PROTEIN LOOP CLOSURE" PROBLEM

    The "protein loop closure" problem is a fundamental problem in bioinformatics. The backbone of a protein is a kinematic chain. When using current techniques to try to get a "picture" of a protein, there are limitations. Whilst we can "see" most of the backbone, there are parts that current techniques do not show. We need to fill in any gaps in the "picture", as the backbone defines the type of protein. From a computational perspective, the "protein loop closure" problem can be viewed as an inverse kinematics problem. Inverse kinematics can be best thought of in terms of a robotic arm comprising several links connected by joints
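    The robotic-arm picture of inverse kinematics can be made concrete with a small planar example. The sketch below uses cyclic coordinate descent (CCD), a standard inverse kinematics heuristic picked here for brevity; the abstract does not say which solver the work uses. The function names, link lengths, and target are assumptions.

```python
import math

def forward(lengths, angles):
    """Joint positions of a planar chain; angles are relative joint angles."""
    x = y = theta = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        theta += angle
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))
    return points

def ccd(lengths, angles, target, sweeps=200, tol=1e-6):
    """Rotate one joint at a time so the chain end swings toward the target."""
    angles = list(angles)
    for _ in range(sweeps):
        for k in reversed(range(len(angles))):
            points = forward(lengths, angles)
            px, py = points[k]    # pivot: the joint being rotated
            ex, ey = points[-1]   # current end of the chain
            to_end = math.atan2(ey - py, ex - px)
            to_target = math.atan2(target[1] - py, target[0] - px)
            angles[k] += to_target - to_end
        ex, ey = forward(lengths, angles)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles

# Toy arm (assumed): three unit links reaching for a point within range.
links = [1.0, 1.0, 1.0]
goal = (1.5, 1.0)
solved = ccd(links, [0.3, 0.3, 0.3], goal)
tip = forward(links, solved)[-1]
print(math.hypot(tip[0] - goal[0], tip[1] - goal[1]))
```

    In the loop closure setting the "joints" would be backbone dihedral angles and the "target" the fixed anchor residue, in three dimensions rather than two.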

    Self-organizing neural networks for modeling robust 3D and 4D QSAR: application to dihydrofolate reductase inhibitors

    We have used SOM and grid 3D and 4D QSAR schemes for modeling the activity of a series of dihydrofolate reductase inhibitors. Careful analysis of the performance and external predictivities proves that this method can provide an efficient inhibition model

    Two Decades of 4D-QSAR: A Dying Art or Staging a Comeback?

    A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis—sometimes based on an erroneous prerequisite. Hence, 4D-QSAR scheme and its ‘dialects’ have been practically implemented as higher level of model abstraction that allows the examination of the multiple molecular conformation, orientation and protonation representation, respectively. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore the natural question occurs whether 4D-QSAR approach is still appealing to the scientific community? With no intention to be comprehensive, a review of the current state of art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically for a diverse range of molecules. It seems that, 4D-QSAR approach has been experiencing a promising renaissance of interests that might be fuelled by the rising power of the graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of the protein-ligand complexes

    An algorithm to enumerate all possible protein conformations verifying a set of distance constraints

    Background: The determination of protein structures satisfying distance constraints is an important problem in structural biology. Whereas the most common method currently employed is simulated annealing, there have been other methods previously proposed in the literature. Most of them, however, are designed to find one solution only. Results: In order to explore exhaustively the feasible conformational space, we propose here an interval Branch-and-Prune algorithm (iBP) to solve the Distance Geometry Problem (DGP) associated with protein structure determination. This algorithm is based on a discretization of the problem obtained by recursively constructing a search space having the structure of a tree, and by verifying whether the generated atomic positions are feasible or not by making use of pruning devices. The pruning devices used here are directly related to features of protein conformations. Conclusions: We described the new algorithm iBP to generate protein conformations satisfying distance constraints, which would potentially allow a systematic exploration of the conformational space. The algorithm iBP has been applied to three α-helical peptides
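    The tree search with pruning described above can be sketched on a deliberately simplified toy: points in the plane with exact distances, where each new point has at most two candidate positions (a circle-circle intersection) and any extra known distance prunes infeasible branches. This is not iBP itself, which works in 3D with interval distances and protein-specific pruning devices; all names and the input format below are assumptions.

```python
import math

def circle_intersections(p, r, q, s):
    """Points at distance r from p and distance s from q (possibly none)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > r + s or d < abs(r - s):
        return []
    a = (r * r - s * s + d * d) / (2 * d)   # offset from p along p -> q
    h = math.sqrt(max(r * r - a * a, 0.0))  # offset perpendicular to p -> q
    mx, my = p[0] + a * dx / d, p[1] + a * dy / d
    return [(mx - h * dy / d, my + h * dx / d),
            (mx + h * dy / d, my - h * dx / d)]

def branch_and_prune(n, dist, tol=1e-6):
    """Enumerate all placements of n points consistent with dist[(i, j)], i < j.

    Requires dist[(k - 2, k)] and dist[(k - 1, k)] for every k >= 2.
    """
    solutions = []

    def feasible(points, cand):
        k = len(points)
        return all(abs(math.dist(points[i], cand) - dist[(i, k)]) <= tol
                   for i in range(k) if (i, k) in dist)

    def extend(points):
        k = len(points)
        if k == n:
            solutions.append(points)
            return
        for cand in circle_intersections(points[k - 2], dist[(k - 2, k)],
                                         points[k - 1], dist[(k - 1, k)]):
            if feasible(points, cand):      # prune branches that violate
                extend(points + [cand])     # any other known distance

    # Fixing the first two points removes translation and rotation.
    extend([(0.0, 0.0), (dist[(0, 1)], 0.0)])
    return solutions

# Toy instance (assumed): the corners of a unit square, visited in order.
root2 = math.sqrt(2.0)
chain_only = {(0, 1): 1.0, (0, 2): root2, (1, 2): 1.0,
              (1, 3): root2, (2, 3): 1.0}
with_extra = dict(chain_only)
with_extra[(0, 3)] = 1.0   # one additional distance acts as a pruning device

print(len(branch_and_prune(4, chain_only)), len(branch_and_prune(4, with_extra)))
```

    Unlike a single-solution method such as simulated annealing, the search returns every surviving leaf of the tree, which is the exhaustive exploration the abstract emphasizes.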

    Bioactive conformational generation of small molecules: A comparative analysis between force-field and multiple empirical criteria based methods

    Background: Conformational sampling for small molecules plays an essential role in the drug discovery research pipeline. Based on a multi-objective evolution algorithm (MOEA), we developed a conformational generation method called Cyndi in a previous study. In this work, in addition to the Tripos force field used in the previous version, Cyndi was updated by incorporating the MMFF94 force field to assess the conformational energy more rationally. With two force fields against a larger dataset of 742 bioactive conformations of small ligands extracted from the PDB, a comparative analysis was performed between the pure force field based method (FFBM) and the multiple empirical criteria based method (MECBM) hybridized with different force fields. Results: Our analysis reveals that incorporating multiple empirical rules can significantly improve the accuracy of conformational generation. MECBM, which takes both empirical and force field criteria as the objective functions, can reproduce about 54% (within 1 Å RMSD) of the bioactive conformations in the 742-molecule test set, much higher than the pure force field method (FFBM, about 37%). On the other hand, MECBM achieved a more complete and efficient sampling of the conformational space because the average size of the unique conformation ensemble per molecule is about 6 times larger than that of FFBM, while the time scale for conformational generation is nearly the same as FFBM. Furthermore, as a complementary comparison study between the methods with and without empirical biases, we also tested the performance of the three conformational generation methods in MacroModel in combination with different force fields. Compared with the methods in MacroModel, MECBM is more competitive in retrieving the bioactive conformations in light of accuracy but has much lower computational cost. Conclusions: By incorporating different energy terms with several empirical criteria, the MECBM method can produce a more reasonable conformational ensemble with high accuracy but approximately the same computational cost in comparison with the FFBM method. Our analysis also reveals that the performance of conformational generation does not depend on the type of force field adopted in characterization of conformational accessibility. Moreover, post energy minimization is not necessary and may even undermine the diversity of the conformational ensemble. All the results guide us to explore more empirical criteria like geometric restraints during the conformational process, which may improve the performance of conformational generation in combination with energetic accessibility, regardless of the force field types adopted

    Machine learning in multiscale modeling and simulations of molecular systems

    Collective variables (CVs) are low-dimensional representations of the state of a complex system, which help us rationalize molecular conformations and sample free energy landscapes with molecular dynamics simulations. However, identifying a representative set of CVs for a given system is far from obvious, and most often relies on physical intuition or partial knowledge about the systems. An inappropriate choice of CVs is misleading and can lead to inefficient sampling. Thus, there is a need for systematic approaches to effectively identify CVs. In recent years, machine learning techniques, especially nonlinear dimensionality reduction (NLDR), have shown their ability to automatically identify the most important collective behavior of molecular systems. These methods have been widely used to visualize molecular trajectories. However, in general they do not provide a differentiable mapping from high-dimensional configuration space to their low-dimensional representation, as required in enhanced sampling methods, and they cannot deal with systems with inherently nontrivial conformational manifolds. In the first part of this dissertation, we introduce a methodology that, starting from an ensemble representative of molecular flexibility, builds smooth and nonlinear data-driven collective variables (SandCV) from the output of nonlinear manifold learning algorithms. We demonstrate the method with a standard benchmark molecule and show how it can be non-intrusively combined with off-the-shelf enhanced sampling methods, here the adaptive biasing force method. SandCV identifies the system's conformational manifold, handles out-of-manifold conformations by a closest point projection, and exactly computes the Jacobian of the resulting CVs. We also illustrate how enhanced sampling simulations with SandCV can explore regions that were poorly sampled in the original molecular ensemble. We then demonstrate that NLDR methods face serious obstacles when the underlying CVs present periodicities, e.g. arising from proper dihedral angles. As a result, NLDR methods collapse very distant configurations, thus leading to misinterpretations and inefficiencies in enhanced sampling. Here, we identify this largely overlooked problem, and discuss possible approaches to overcome it. Additionally, we characterize the flexibility of the alanine dipeptide molecule and show that it evolves around a flat torus in four-dimensional space. In the final part of this thesis, we propose a novel method, atlas of collective variables, that systematically overcomes topological obstacles, ameliorates the geometrical distortions and thus allows NLDR techniques to perform optimally in molecular simulations. This method automatically partitions the configuration space and treats each partition separately. Then, it connects these partitions from the statistical mechanics standpoint.
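    The periodicity of dihedral angles can be illustrated with a short sketch. Two backbone dihedral angles can be embedded on a flat torus in four dimensions by mapping each angle to its cosine and sine; this standard construction is shown here as an illustration only and is not SandCV or the atlas method from the thesis. Function names and the sample angles are assumptions.

```python
import math

def torus_embed(phi, psi):
    """Map two periodic angles (radians) to a point on a flat torus in 4D."""
    return (math.cos(phi), math.sin(phi), math.cos(psi), math.sin(psi))

def raw_distance(a, b):
    """Euclidean distance between raw (phi, psi) pairs, ignoring periodicity."""
    return math.dist(a, b)

def torus_distance(a, b):
    """Chordal distance between the 4D torus images of two (phi, psi) pairs."""
    return math.dist(torus_embed(*a), torus_embed(*b))

# Two conformations (assumed values) whose phi angles differ by only
# 2 degrees across the +/-180 degree boundary.
conf_a = (math.radians(179.0), math.radians(60.0))
conf_b = (math.radians(-179.0), math.radians(60.0))

print(raw_distance(conf_a, conf_b))    # large: treats the angles as far apart
print(torus_distance(conf_a, conf_b))  # small: respects the periodicity
```

    A method that treats dihedral angles as plain numbers sees these two conformations as nearly maximally separated, whereas they are neighbors on the torus.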

    Conformations and 3D pharmacophore searching

    Several methods have been developed and published over the past years to generate sets of diverse and pharmacologically relevant conformations which can be used within 3D pharmacophore search protocols to increase the number of meaningful hits of such experiments. This review gives some insights into the general challenges and problems in the area of 3D structure and conformation generation and focuses on some available and recent software technologies and approaches applicable for this task. The methods, algorithms and philosophies behind the approaches are briefly described and discussed and some examples on the performance and results obtained with the different tools are given