20 research outputs found

    Autoencoders for dimensionality reduction in molecular dynamics: collective variable dimension, biasing and transition states

    The heat shock protein 90 (Hsp90) is a molecular chaperone that controls the folding and activation of client proteins using the free energy of ATP hydrolysis. The Hsp90 active site is in its N-terminal domain (NTD). Our goal is to characterize the dynamics of the NTD using an autoencoder-learned collective variable (CV) in conjunction with adaptive biasing force (ABF) Langevin dynamics. Using dihedral analysis, we cluster all available experimental Hsp90 NTD structures into distinct native states. We then perform unbiased molecular dynamics (MD) simulations to construct a dataset that represents each state and use this dataset to train an autoencoder. Two autoencoder architectures are considered, with one and two hidden layers respectively, and bottlenecks of dimension k ranging from 1 to 10. We demonstrate that adding an extra hidden layer does not significantly improve performance, while it leads to complicated CVs that increase the computational cost of biased MD calculations. In addition, a 2D bottleneck can provide enough information about the different states, while the optimal bottleneck dimension is five. For the 2D bottleneck, the two-dimensional CV is used directly in biased MD simulations. For the 5D bottleneck, we analyze the latent CV space and identify the pair of CV coordinates that best separates the states of Hsp90. Interestingly, selecting a 2D CV out of the 5D CV space leads to better results than directly learning a 2D CV, and makes it possible to observe transitions between native states when running free energy biased dynamics.
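
    The step of picking a pair of coordinates out of a 5D latent space can be illustrated with a minimal sketch. The abstract does not state which separation criterion the authors used, so the score below (smallest distance between state centroids, scaled by the states' RMS spreads) is an assumption, as are the function names and the toy latent vectors.

```python
from itertools import combinations
from math import dist, sqrt


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def rms_spread(points, center):
    """Root-mean-square distance of the points from their centroid."""
    return sqrt(sum(dist(p, center) ** 2 for p in points) / len(points))


def separation_score(latent_by_state, i, j):
    """Hypothetical criterion: worst-case centroid distance between two
    states in the (i, j) projection, divided by the sum of their spreads."""
    stats = {}
    for state, vectors in latent_by_state.items():
        projected = [(z[i], z[j]) for z in vectors]
        center = centroid(projected)
        stats[state] = (center, rms_spread(projected, center))
    scores = []
    for a, b in combinations(stats, 2):
        (center_a, spread_a), (center_b, spread_b) = stats[a], stats[b]
        # Small constant avoids division by zero for degenerate clusters.
        scores.append(dist(center_a, center_b) / (spread_a + spread_b + 1e-12))
    return min(scores)


def best_pair(latent_by_state, dim):
    """Coordinate pair (i, j) with the highest separation score."""
    return max(
        combinations(range(dim), 2),
        key=lambda pair: separation_score(latent_by_state, *pair),
    )


# Toy 5D latent vectors for three made-up states (not data from the paper):
# state B differs from A along coordinate 1, state C along coordinate 3.
toy_latent = {
    "A": [(0.0, 0.0, 0.1, 0.0, 0.2), (0.1, 0.2, 0.0, 0.1, 0.1), (0.0, 0.1, 0.2, -0.1, 0.0)],
    "B": [(0.1, 5.0, 0.0, 0.0, 0.1), (0.0, 5.2, 0.1, 0.1, 0.0), (0.1, 5.1, 0.2, -0.1, 0.2)],
    "C": [(0.0, 0.1, 0.1, 5.0, 0.0), (0.1, 0.0, 0.0, 5.1, 0.2), (0.0, 0.2, 0.2, 4.9, 0.1)],
}

print(best_pair(toy_latent, dim=5))
```

    Taking the minimum over state pairs means a coordinate pair only scores well if it separates every state from every other one, which is one plausible reading of "best separates the states".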

    Theoretical study of the acid hydrolysis of phosphate monoesters (Etude théorique de l'hydrolyse acide des monoesters de phosphate)

    TOULOUSE3-BU Sciences (315552104) / Sudoc, France

    Explaining and avoiding failure modes in goal-directed generation

    Despite growing interest and success in automated in-silico molecular design, doubts remain regarding the ability of goal-directed generation algorithms to perform unbiased exploration of novel chemical spaces. A specific phenomenon has recently been highlighted: goal-directed generation guided by machine learning models produces molecules with high scores according to the optimization model, but low scores according to control models, even when these are trained on the same data distribution and the same target. In this work, we show that this worrisome behavior is actually due to issues with the predictive models and not the goal-directed generation algorithms. We show that with appropriate predictive models, this issue can be resolved, and the generated molecules have high scores according to both the optimization and the control models.

    Herbaria preserve plant microbiota responses to environmental changes.

    Interaction between plants and their microbiota is a central theme in understanding how plants adapt to their environment. Considering herbaria as repositories of holobionts that preserved traces of ancient plant-associated microbial communities, we propose to explore these historical collections to evaluate the impact of long-lasting global changes on plant-microbiota interactions.
    Glossary
    Ancient DNA (aDNA): DNA that remains for a certain period of time (up to several tens of thousands of years) after the death of an organism. aDNA is subject to time-dependent degradation that includes fragmentation, single-strand breaks, and frequent cytosine deamination, especially at the single-strand extremities of the fragments. aDNA extraction and sequencing need specific protocols and equipment and contamination-proof laboratories.
    Anthropocene: a new geological era where the geological and environmental processes of planet Earth are dominated by human activities.

    Protein loops with multiple meta-stable conformations: a challenge for sampling and scoring methods

    Flexible regions in proteins, such as loops, cannot be represented by a single conformation. Instead, conformational ensembles are needed to provide a more global picture. In this context, identifying statistically meaningful conformations within an ensemble generated by loop sampling techniques remains an open problem. The difficulty is primarily related to the lack of structural data about these flexible regions. With the majority of structural data coming from X-ray crystallography and ignoring plasticity, the design and evaluation of loop scoring methods is challenging. In this work, we compare the performance of various scoring methods on a set of 8 protein loops that are known to be flexible. The ability of each method to identify and select all of the known conformations is assessed, and the underlying energy landscapes are produced and projected to visualize the qualitative differences between the methods. Statistical potentials are found to provide considerable reliability despite being designed to trade off accuracy…

    Contact Map Fingerprints of Protein-Ligand Unbinding Trajectories Reveal Mechanisms Determining Residence Times Computed from Scaled Molecular Dynamics

    The binding kinetic properties of potential drugs may significantly influence their subsequent clinical efficacy. Predictions of these properties based on computer simulations provide a useful alternative to their expensive and time-demanding experimental counterparts, even at an early drug discovery stage. Herein, we perform Scaled Molecular Dynamics (ScaledMD) simulations on a set of 27 ligands of HSP90 belonging to more than 7 chemical series in order to estimate their relative residence time. We introduce two new techniques for the analysis and the classification of the simulated unbinding trajectories. The first technique helps in estimating the limits of the free energy well around the bound state; the second, based on a new contact map fingerprint, allows the description and comparison of the paths that lead to unbinding. Using these analyses, we find that ScaledMD's relative residence time generally enables the identification of the slowest unbinders. We propose an explanation for the underestimation of the residence times of a subset of compounds and we investigate how the biasing in ScaledMD can affect the mechanistic insights that can be gained from the simulations.

    AI4DR: Development and implementation of an annotation system for high-throughput dose-response experiments

    One of the common strategies to identify novel chemical matter in drug discovery is to perform a High Throughput Screening (HTS). However, the large amount of data generated at the dose-response (DR) step of an HTS campaign requires careful analysis to detect artifacts and correct erroneous datapoints before validating the experiments. This step, which requires reviewing each DR experiment, can be time-consuming and prone to human errors or inconsistencies. AI4DR is a system that has been developed for the classification of DR curves based on a Convolutional Neural Network (CNN) acting on normalized images of the DR curves. AI4DR allows thousands of curves to be annotated in minutes among 14 categories to help the High Throughput Screening biologists in their analyses. Several categories are associated with active and inactive compounds; other categories correspond to features of interest such as the presence of noise, a weaker effect at high doses, or a suspiciously weak or strong slope at the inflexion point of the DR curves of actives. The classifier has been trained on an algorithmically generated dataset curated and refined by experts, tested using real screening campaigns and improved using thousands of annotations by experts. The solution is deployed using an MLflow model server interfaced with the Genedata Screener data analysis software used by the end users. AI4DR improves the consistency, robustness, and speed of HTS data analysis and reduces the human effort required, helping to identify new medicines for patients faster.

    Impact of applicability domains to generative artificial intelligence

    Molecular generative artificial intelligence is drawing significant attention in the drug design community, with several experimentally validated proofs of concept already published. Nevertheless, generative models are known for sometimes generating unrealistic, unsynthesizable or unstable structures. This calls for methods to constrain those algorithms to generate structures in reasonable portions of the chemical space. While the concept of applicability domains (AD) for predictive models is well studied, its counterpart for generative models is not yet defined. In this work, we examine empirically various possibilities and propose applicability domains suited for generative models. Using both public and internal datasets, we use state-of-the-art generative methods to generate novel structures that are predicted to be active by a corresponding QSAR model, while constraining the generative model to stay within a given applicability domain. Our work looks at several applicability domain definitions, combining various criteria, such as structural similarity to the training set, similarity of physico-chemical properties, unwanted substructures, and Quantitative Estimate of Drug-Likeness (QED). We assess the generated structures both qualitatively and quantitatively, and find that the applicability domain definitions have a strong influence on the chemical beauty of generated molecules. An extensive analysis of our results allows us to identify applicability domain definitions that are best suited for generating drug-like molecules with generative models. We anticipate that this work will help foster the adoption of generative models in an industrial context.
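
    One of the criteria listed above, structural similarity to the training set, can be sketched as a simple membership test. This is only an illustration under stated assumptions: fingerprints are represented as plain sets of integer bit indices, the 0.4 Tanimoto threshold is arbitrary, and the function names are hypothetical. The abstract does not specify the fingerprints, thresholds, or how criteria are combined.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints,
    each given as a set of 'on' bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def in_applicability_domain(candidate_fp, training_fps, threshold=0.4):
    """Hypothetical AD rule: accept a generated structure if its nearest
    training-set neighbour reaches the similarity threshold (assumed value)."""
    nearest = max((tanimoto(candidate_fp, fp) for fp in training_fps), default=0.0)
    return nearest >= threshold


# Made-up fingerprints, for illustration only.
training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
close_candidate = {1, 2, 3, 9}    # shares 3 of 5 bits with the first entry
far_candidate = {10, 11, 12}      # shares no bits with the training set

print(in_applicability_domain(close_candidate, training_fps))
print(in_applicability_domain(far_candidate, training_fps))
```

    In a generative setting, a check like this would typically act as a filter or penalty on the scoring function, so that candidates outside the domain are discarded or down-weighted.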

    Molecular Mechanism of SSR128129E, an Extracellularly Acting, Small-Molecule, Allosteric Inhibitor of FGF Receptor Signaling (vol 23, pg 489, 2013)

    © 2016 Elsevier Inc. (Cancer Cell 23, 489–501, April 15, 2013) In Figure 2F, the authors failed to highlight clearly that there is a split in the western blot (all rows) between the “0” and “0.1” condition. Even though all samples were run in the same experiment on the same blot, the image was split to remove samples that were simultaneously analyzed but irrelevant for this study. In the corrected Figure 2F, the authors have now clearly separated both parts of the western blot. In Figure 5B, the image of the western blot showing total FGFR2 for the HEK293-FGFR2Y328D cell line was mistakenly replaced with the image of the western blot showing total FGFR2 for the HEK293-FGFR2WT cell line from Figure 5A. In the corrected Figure 5B, the authors have included the correct image of the western blot showing total FGFR2 for the HEK293-FGFR2Y328D cell line. The corrected Figure 2 and Figure 5 are included below. The authors apologize for any confusion these mistakes may have caused the readers.