31 research outputs found
Practical Aspects of Machine Learning for the Design-Synthesis-Purify-Assay Workflow
Chimia column: no abstract
Magnetic effects of disulfide bridges: a density functional and semiempirical study
Density functional chemical shielding calculations are reported for methane and hydrogen disulfide dimers. The calculations show that the contributions of disulfide bridges to the chemical shielding of neighboring protons are sizable at distances that are frequently sampled in protein structures. A semiempirical model of the quantum chemical data is developed. It is shown that magnetic anisotropy effects of disulfide are poorly described by the McConnell equation, both qualitatively and quantitatively. In particular, the ratio of magnetic anisotropy contributions to shielding along and perpendicular to the magnetic anisotropy principal axis does not conform to the predictions of the McConnell equation, and magnetic anisotropy effects are not null along the magic angle axis. A sulfur-based model of the magnetic anisotropy of the disulfide is developed and shown to give much better agreement with the quantum chemical data.
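For orientation, the McConnell equation referred to in this abstract is commonly written, for an axially symmetric group, in the form below. This is a textbook sketch and not taken from the paper itself: the symbols (Δχ for the anisotropy of the magnetic susceptibility, r for the distance from the group to the proton, θ for the angle between that vector and the principal axis) are assumptions of this note, and prefactors and sign conventions vary between sources.

```latex
\Delta\sigma = \frac{\Delta\chi}{3 r^{3}} \left( 1 - 3\cos^{2}\theta \right),
\qquad \Delta\chi = \chi_{\parallel} - \chi_{\perp}

% Along the principal axis (\theta = 0):          1 - 3\cos^{2}\theta = -2
% Perpendicular to it (\theta = 90^{\circ}):       1 - 3\cos^{2}\theta = +1
% At the magic angle (\cos^{2}\theta = 1/3,
% \theta \approx 54.7^{\circ}):                    1 - 3\cos^{2}\theta = 0
```

In this form the equation predicts a fixed ratio of -2 between the contributions along and perpendicular to the principal axis, and a vanishing contribution at the magic angle. These are the two predictions the abstract reports as not holding for disulfides.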
Integration of molecular modeling and nuclear magnetic resonance in rational ligand design
Designing ligands that bind with high affinity to key regions of biological macromolecules in order to modify their activity is central to the development of therapeutic molecules. Structure-based ligand design combines structural information about the biological target with the physical principles of intermolecular interaction. Structural genomics aims at the rapid structure determination of a large number of proteins and, consequently, of a large number of potential therapeutic targets. Structure-based ligand design approaches can be put to efficient use in this context. Rational ligand design and the determination of the three-dimensional structure of biological macromolecules are the two aspects studied in this thesis.
The work presented here consists of including solvation effects in the ranking of the binding modes of molecular fragments identified by the program MCSS. Taking solvation into account led to a realistic ranking of the binding modes, which was validated by an NMR study performed in parallel and independently at Sanofi-Synthélabo. An original clustering algorithm based on the van der Waals interactions between the fragments and the target has been developed; the whole procedure has been automated and can be distributed over several computers, each with one or more processors, across a network. The robustness of the method was successfully tested on an aminoglycoside/RNA complex. The NMR aspect of this work is approached through a theoretical study, by quantum calculations, of the effect of disulfide bridges on proton chemical shifts, in order to define simple equations that model the physical contributions influencing the proton chemical shift.
In the last part, the two lines of development are used jointly, again on an antibiotic/RNA complex, to study the practical use of experimental chemical shift data to filter the binding modes identified by MCSS.
Profile-QSAR and Surrogate AutoShim Protein-Family Modeling of Proteases
The 2D Profile-QSAR and 3D Surrogate AutoShim protein-family virtual screening methods were originally developed for kinases. They are the key components of an iterative medium-throughput screening alternative to expensive and time-consuming experimental high-throughput screening. Encouraged by the success with kinases, the S1-serine proteases were selected as a second protein family to tackle, based on the structural and SAR similarity among them, availability of structural and bioactivity data, and the current and future small-molecule drug discovery interest. Validation studies on 24 S1-serine protease assay datasets from 16 unique proteases gave positive results. Profile-QSAR gave a median R²ext = 0.60 for 24 assay datasets, and pairwise selectivity modeling on 60 protease pairs gave a median R²ext = 0.64, comparable to the performance for kinases. A 17-structure universal ensemble S1-serine protease surrogate receptor for AutoShim was developed from a collection of ~1500 X-ray structures. The predictive performance on 24 S1-serine protease assays was good, with a median R²ext = 0.41, but lower than was obtained for kinases. Analysis showed that the higher structural diversity of the protease structures, as well as lower dataset volume and fewer potent compounds, both contributed to the decreased predictive power. In a prospective virtual screening application, 32 compounds were selected from a 1.5 million archive and tested in a biochemical assay. 13 of the 32 compounds were active at IC50 ≤ 10 μM, a 41% hit-rate. Three new scaffolds were identified which are being followed up with testing of additional analogues. A SAR similarity analysis for this target against 13 other proteases also indicated two potential protease targets which were positively and negatively correlated with the activity of the target protease.
Prediction of Small-Molecule Developability Using Large-Scale In Silico ADMET Models.
Early in silico assessment of the potential of a series of compounds to deliver a drug is one of the major challenges in computer-assisted drug design. The goal is to identify the right chemical series of compounds out of a large chemical space to then subsequently prioritize the molecules with the highest potential to become a drug. Although multiple approaches to assess compounds have been developed over decades, the quality of these predictors is often not good enough and compounds that agree with the respective estimates are not necessarily druglike. Here, we report a novel deep learning approach that leverages large-scale predictions of ~100 ADMET assays to assess the potential of a compound to become a relevant drug candidate. The resulting score, which we termed bPK score, substantially outperforms previous approaches and showed strong discriminative performance on data sets where previous approaches did not.
Medicinal Chemistry Database GDBMedChem
The generated database GDB17 enumerates 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset evenly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules. This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains higher sp3-carbon fraction and natural product likeness scores compared to known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.
Medicinal Chemistry Aware Database GDBMedChem
The generated database GDB17 enumerates 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset uniformly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules. This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains higher sp3-carbon fraction and natural product likeness scores compared to known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.
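The even sampling across molecular size, stereochemistry and polarity described in the GDBMedChem abstracts can be illustrated with a minimal stratified-sampling sketch. This is not the authors' procedure: the function `sample_evenly`, the toy records and the three descriptor fields are hypothetical, and a real workflow would compute the descriptors with a cheminformatics toolkit.

```python
import random
from collections import defaultdict


def sample_evenly(molecules, key, per_bin, seed=0):
    """Draw up to `per_bin` molecules from each stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    bins = defaultdict(list)
    for mol in molecules:
        bins[key(mol)].append(mol)
    sample = []
    for stratum in sorted(bins):
        members = bins[stratum]
        # Small strata contribute everything they have.
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample


# Hypothetical records: (identifier, heavy_atoms, stereocenters, polar_atoms).
toy = [(f"mol{i}", 10 + i % 3, i % 2, i % 4) for i in range(120)]


def strata(mol):
    """Stratum = (size, stereochemistry, polarity) descriptors of a record."""
    return (mol[1], mol[2], mol[3])


picked = sample_evenly(toy, strata, per_bin=2)
print(len(picked), "molecules drawn from", len({strata(m) for m in toy}), "strata")
```

Capping the draw per stratum, instead of sampling the pooled set at random, keeps sparsely populated regions of descriptor space represented in the final collection.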
Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks
Several recent reports have shown that long short-term memory generative neural networks (LSTM) of the type used for grammar learning efficiently learn to write SMILES of drug-like compounds when trained with SMILES from a database of bioactive compounds such as ChEMBL and can later produce focused sets upon transfer learning with compounds of specific bioactivity profiles. Here we trained an LSTM using molecules taken either from ChEMBL, DrugBank, commercially available fragments, or from FDB-17 (a database of fragments up to 17 atoms) and performed transfer learning to a single known drug to obtain new analogs of this drug. We found that this approach readily generates hundreds of relevant and diverse new drug analogs and works best with training sets of around 40,000 compounds as simple as commercial fragments. These data suggest that fragment-based LSTM offer a promising method for new molecule generation.
Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks
Several recent reports have shown that long short-term memory generative neural networks (LSTM) of the type used for grammar learning efficiently learn to write SMILES of drug-like compounds when trained with SMILES from a database of bioactive compounds such as ChEMBL and can later produce focused sets upon transfer learning with compounds of specific bioactivity profiles. Here we trained an LSTM using molecules taken either from ChEMBL, DrugBank, commercially available fragments, or from FDB-17 (a database of fragments up to 17 atoms) and performed transfer learning to a single known drug to obtain new analogs of this drug. We found that this approach readily generates hundreds of relevant and diverse new drug analogs and works best with training sets of around 40,000 compounds as simple as commercial fragments. These data suggest that fragment-based LSTM offer a promising method for new molecule generation.
Drug Analogs from Fragment-Based Long Short-Term Memory Generative Neural Networks
Several recent reports have shown that long short-term memory generative neural networks (LSTM) of the type used for grammar learning efficiently learn to write Simplified Molecular Input Line Entry System (SMILES) of druglike compounds when trained with SMILES from a database of bioactive compounds such as ChEMBL and can later produce focused sets upon transfer learning with compounds of specific bioactivity profiles. Here we trained an LSTM using molecules taken either from ChEMBL, DrugBank, commercially available fragments, or from FDB-17 (a database of fragments up to 17 atoms) and performed transfer learning to a single known drug to obtain new analogs of this drug. We found that this approach readily generates hundreds of relevant and diverse new drug analogs and works best with training sets of around 40,000 compounds as simple as commercial fragments. These data suggest that fragment-based LSTM offer a promising method for new molecule generation.