201 research outputs found

    High-Throughput Atomistic Modeling of Biomolecular Structure and Association

    The reliability of many protein models arising from structure prediction methods is unclear. Here we present a method for absolute quality control of theoretical protein models, which can contribute significantly to their acceptance in life-science research. We apply these methods to gain insight into the family of hydrophobins and modify them for increased cell adhesion to allow for the coating of implants. The novel proteins were shown to bind cells while impeding bacterial adhesion.

    Quantifying the Role of Water in Ligand-Protein Binding Processes

    The aim of this thesis is to quantify the contributions of water thermodynamics to the binding free energy in protein-ligand complexes. Various computational tools were applied, implemented, benchmarked and discussed. A custom implementation of the inhomogeneous fluid solvation theory (IFST) formulation was developed to facilitate easy integration into workflows based on Schrödinger software. By applying the tool to a well-defined test set of congeneric ligand pairs, the potential of IFST for quantitative predictions in lead optimization was assessed. Furthermore, free energy perturbation (FEP) calculations were applied to an extended test set to validate whether these simulations can accurately account for solvent displacement in ligand modifications. Finally, as a fast tool with applications in virtual screening, we developed and validated a new scoring function that incorporates terms for protein and ligand desolvation. In total, this resulted in three distinct studies, each elucidating a different aspect of water thermodynamics in computer-aided drug design (CADD). These three studies are presented in the next section. In the conclusion, the results and implications of these studies are discussed jointly, along with possible future developments. An additional study focused on virtual screening and toxicity prediction at the androgen receptor, where distinguishing agonists from antagonists poses difficulties. We proposed and validated an approach based on MD simulations and ensemble docking to improve predictions of androgen agonists and antagonists.

    Study of ligand-based virtual screening tools in computer-aided drug design

    Virtual screening is a central technique in drug discovery today. Millions of molecules can be tested in silico with the aim of selecting only the most promising for experimental testing. The topic of this thesis is ligand-based virtual screening tools, which take existing active molecules as the starting point for finding new drug candidates. One goal of this thesis was to build a model that gives the probability that two molecules are biologically similar as a function of one or more chemical similarity scores. Another important goal was to evaluate how well different ligand-based virtual screening tools are able to distinguish active molecules from inactive ones. A further criterion for the virtual screening tools was their applicability to scaffold hopping, i.e. finding new active chemotypes. In the first part of the work, a link was defined between the abstract chemical similarity score given by a screening tool and the probability that the two molecules are biologically similar. These results help to decide objectively which virtual screening hits to test experimentally. The work also resulted in a new type of data fusion method for use with two or more tools. In the second part, five ligand-based virtual screening tools were evaluated and their performance was found to be generally poor. Three reasons for this were proposed: false negatives in the benchmark sets, active molecules that do not share the binding mode, and activity cliffs. In the third part of the study, a novel visualization and quantification method is presented for evaluating the scaffold-hopping ability of virtual screening tools.

    Computational strategies to include protein flexibility in Ligand Docking and Virtual Screening

    The dynamic character of proteins strongly influences biomolecular recognition mechanisms. With the development of the main models of ligand recognition (lock-and-key, induced fit, and conformational selection theories), the role of protein plasticity has become increasingly relevant. In particular, major structural changes involving large deviations of protein backbones, as well as slight movements such as side-chain rotations, are now carefully considered in drug discovery and development. It is of great interest to identify multiple protein conformations as a preliminary step in a screening campaign. Protein flexibility has been investigated here, in terms of both local and global motions, in two distinct biological systems. On one side, Replica Exchange Molecular Dynamics has been exploited as an enhanced sampling method to collect multiple conformations of Lactate Dehydrogenase A (LDHA), an emerging anticancer target. The aim of this project was the development of an Ensemble-based Virtual Screening protocol in order to find novel potent inhibitors. On the other side, a preliminary study of the local flexibility of Opioid Receptors has been carried out with the ALiBERO approach, an iterative method based on Elastic Network-Normal Mode Analysis and Monte Carlo sampling. Comparison of the Virtual Screening performance obtained with single or multiple conformations confirmed that including protein flexibility in screening protocols has a positive effect on the probability of early recognition of novel or known active compounds.

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science

    Doctor of Philosophy

    The FDA-approved antiviral drug Tamiflu (oseltamivir) is the front-line antiviral drug in the fight against the 2003 avian flu (H5N1) as well as, more recently, the 2009 swine flu (H1N1pdm). The drug functions as a neuraminidase inhibitor that prevents the release of new virions. Unfortunately, there is emerging evidence that the neuraminidase mutations H274Y and N294S render oseltamivir ineffective against the H5N1 virus. Of greater concern is the growing likelihood of the emergence of similar oseltamivir-resistant strains of H1N1pdm. It is therefore critical to understand the mechanisms of mutation-induced drug resistance in the H5N1 and H1N1pdm flu viruses in order to develop new and effective therapies. As molecular dynamics (MD) simulations have become an important tool for the study of biological systems, this dissertation aims to employ MD simulations for computer-aided rational drug design. Specifically, different MD simulation techniques were utilized to investigate the oseltamivir-resistance mechanisms of H5N1/H1N1pdm and to develop new antiviral drugs. Chapter 1 is a general introduction to the whole thesis. Chapter 2 presents top hits for H1N1pdm neuraminidase identified by virtual screening using an ensemble-based docking technique, which incorporates protein flexibility into molecular docking. Next, Chapter 3 presents progress in the development of two related methodologies for calculating solvation free energy: one called the Coupled Reference Interaction Site Model-Hypernetted Chain/Molecular Dynamics (RISM/MD) approach, and another called Molecular Mechanics Poisson-Boltzmann linear response approximation and surface area contributions (MMPB-LRA-SA). The methods are expected to be applicable to the lead refinement process, since they provide more reliable results than the continuum model but are less computationally expensive than conventional MD methods. In Chapter 4, we discuss our observations, based on drug-protein endpoint interactions, on how the mutations H274Y and N294S induce oseltamivir resistance in neuraminidase N1 subtypes. However, since the two mutations lie outside the active site, endpoint interactions alone cannot fully account for the drug resistance. In Chapter 5, we present our finding of the drug binding pathway through electrostatic surface potential and steered MD simulation. The results, in conjunction with the findings reported in Chapter 4, reveal a novel oseltamivir-resistance mechanism in which the mutations rupture the drug binding funnel. Our study not only assists understanding of oseltamivir resistance in neuraminidase N1 subtypes, but also bears several important consequences for the intelligent design of new inhibitors that can overcome the established resistant strains.

    Computational study and rational design of pluriZymes

    The increase in production over the last centuries has come at the expense of the environment, making the search for solutions urgent. Enzymes are the essential molecules that make life kinetically possible. In industry, enzymes can be a sustainable alternative to inorganic catalysts. However, their low productivity, poor resistance to industrial conditions, and cost limit their usage. Thus, the ability to tailor biocatalysts at will is crucial to expanding their application. Advances in computational power, followed by the growing repertoire of modeling tools, are helping to design the next generation of biocatalysts owing to their lower cost and speed. This thesis aims to develop a novel concept in biocatalysis, named pluriZymes, which could lower the expression costs of enzymes. PluriZymes are proteins with multiple catalytic active sites, at least one of which is artificially designed. Throughout the thesis, the functional site introduced was a hydrolase site, chosen for its simplicity (only three catalytic residues are needed) and because it requires no cofactor. The systems studied were transaminases and esterases, which have several industrial applications and are therefore of broad interest. All computational designs were experimentally validated by our collaborators. The results of the thesis include an all-in-one protease-esterase pluriZyme, a transaminase-esterase pluriZyme with potential applications in the pharmaceutical industry, the rational improvement of the substrate promiscuity of hydrolase sites, and a new algorithm to facilitate the design of artificial active sites. Hence, this thesis demonstrates the potential of pluriZymes for the next generation of biocatalysts toward a more sustainable society, and the need for computational tools to develop them.

    Enumeration, conformation sampling and population of libraries of peptide macrocycles for the search of chemotherapeutic cardioprotection agents

    Peptides are uniquely endowed with features that allow them to perturb previously difficult-to-drug biomolecular targets. Peptide macrocycles in particular have seen a flurry of recent interest due to their enhanced bioavailability, tunability and specificity. Although these properties make them attractive hit candidates in early-stage drug discovery, knowing which peptides to pursue is non-trivial due to the magnitude of the peptide sequence space. Computational screening approaches show promise in their ability to address the size of this search space but suffer from their inability to accurately interrogate the conformational landscape of peptide macrocycles. We developed an in silico compound enumerator to populate a conformation-laden virtual peptide library. This library was then used in the search for cardio-protective agents that may be administered to reduce tissue damage during reperfusion after ischemia (heart attack). Our enumerator successfully generated a library of 15.2 billion compounds, requiring the use of compression algorithms, conformational sampling protocols and management of aggregated compute resources in the context of a local cluster. In the absence of experimental biophysical data, we performed biased sampling during alchemical molecular dynamics simulations in order to observe cyclophilin-D perturbation by cyclosporine A and its mitochondria-targeted analogue. Reliable intermediate state averaging through a WHAM analysis of the biased dynamic pulling simulations confirmed that the cardio-protective activity of cyclosporine A was due to its mitochondrial targeting. Parallel-tempered solution molecular dynamics in combination with efficient clustering isolated the essential dynamics of a cyclic peptide scaffold. The rapid enumeration of skeletons from these essential dynamics gave rise to a conformation-laden virtual library of all 15.2 billion unique cyclic peptides (given the limits imposed on peptide sequence). Analysis of this library showed the exact extent of the physicochemical properties covered, relative to the bare scaffold precursor. Molecular docking of a subset of the virtual library against cyclophilin-D showed significant improvements in affinity to the target relative to cyclosporine A. The conformation-laden virtual library accessed by our methodology provided derivatives that were able to make many interactions per peptide with the cyclophilin-D target. Machine learning methods showed promise in the training of Support Vector Machines for synthetic feasibility prediction for this library. The synergy between enumeration and conformational sampling greatly improves the performance of this library during virtual screening, even when only a subset is used.

    On deep generative modelling methods for protein-protein interaction

    Proteins form the basis for almost all biological processes. Identifying the interactions that proteins have with themselves, the environment, and each other is critical to understanding their biological function in an organism, and thus the impact of drugs designed to affect them. Consequently, a significant body of research and development focuses on methods to analyse and predict protein structure and interactions. Due to the breadth of possible interactions and the complexity of structures, in silico methods are used to propose models of both interaction and structure that can then be verified experimentally. However, the computational complexity of protein interaction means that full physical simulation of these processes requires exceptional computational resources and is often infeasible. Recent advances in deep generative modelling have shown promise in correctly capturing complex conditional distributions. These models derive their basic principles from statistical mechanics and thermodynamic modelling. While the learned functions of these methods are not guaranteed to be physically accurate, they result in a sampling process similar to that suggested by the thermodynamic principles of protein folding and interaction. However, little research has been devoted to extending these models to work over the space of 3D rotations, limiting their applicability to protein models. In this thesis we develop an accelerated sampling strategy for faster sampling of potential docking locations. We then address the rotational diffusion limitation by extending diffusion models to the space SO(3), and finally present a framework for applying this rotational diffusion model to the rigid docking of proteins.