Search CORE

18 research outputs found

Binary Classification of Aqueous Solubility Using Support Vector Machines with Reduction and Recombination Feature Selection

Improving Chemical Autoencoder Latent Space and Molecular De novo Generation Diversity with Heteroencoders

Author: Bjerrum Esben Jannik
Sattarov Boris
Publication venue: 'MDPI AG'
Publication date: 17/09/2018

Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for use early in a drug discovery project. Here it is shown that the choice of chemical representation, such as SMILES strings, has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or the decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory (LSTM) cells to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the bottleneck in QSAR modelling of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate of decoding to a different molecule than the one encoded, a tendency that can be counteracted with more complex network architectures.

arXiv.org e-Print Archive

Directory of Open Access Journals

QSPR Studies on Aqueous Solubilities of Drug-Like Compounds

Author: Amidon
Antipin
Apostol
Balakin
Bhattachar
Castro
Charifson
Consonni
Consonni
Cramer
Delaney
Devillers
Draper
Duchowicz
Duchowicz
Duchowicz
Duchowicz
Eduardo Castro
Firpo
Free
Golbraikh
Goodwin
Hansch
Hansch
Hansch
Harary
Hawkins
Hou
Huuskonen
Johnson
Jorgensen
Karelson
Kariv
Katritzky
Katritzky
Kier
Klamt
Klopman
Klopman
Kuhne
Leardi
Lee
Lipinski
Liu
Livingstone
Lukovits
Malinowski
Martin
McFarland
Meylan
Meylan
Monge
Moriguchi
Morris
Myrdal
Nirmalakhandan
Noringer
Pablo Duchowicz
Peterson
Pinsuwan
Pogliani
Ran
Ran
Randic
Schneider
Silverman
Smith
Suzuki
Talevi
Tetko
Thompson
Todeschini
Trinajstic
Trinajstić
Vapanik
Veber
Viswanadhan
Walters
Wold
Worth
Yalkowsky
Yalkowsky
Yan
Yang
Yaws
Yoshida
Yu
Zupan
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/06/2009

A rapidly growing area of modern pharmaceutical research is the prediction of aqueous solubility of drug-sized compounds from their molecular structures. There exist many different reasons for considering this physico-chemical property as a key parameter: the design of novel entities with adequate aqueous solubility brings many advantages to preclinical and clinical research and development, allowing improvement of the Absorption, Distribution, Metabolization, and Elimination/Toxicity profile and “screenability” of drug candidates in High Throughput Screening techniques. This work compiles recent QSPR linear models established by our research group devoted to the quantification of aqueous solubilities and their comparison to previous research on the topic.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Računarski modeli za predviđanje rastvorljivosti lekova [Computational models for the prediction of drug solubility]

Author: Erić Slavica
Kalinić Marko
Popović Aleksandar
Publication venue: Savez farmaceutskih udruženja Srbije, Beograd
Publication date: 01/01/2010

Aqueous solubility of a drug is a factor which can significantly influence its oral bioavailability, and can also affect the drug's distribution in the body. Consideration of aqueous solubility in the early stages of drug discovery and development is vital in reducing the incidence of late-stage drug development failures. The application of computational models for solubility prediction enables the screening of combinatorial libraries, helping to single out potentially problematic compounds and to eliminate those with inadequate solubility. In addition to the prediction of solubility from chemical structure, the interpretation of such models can give insight into structure-solubility relationships and can guide the optimization of structures in order to provide better solubility whilst retaining the activity of the investigated drugs. Development of such models is a complex process that requires consideration of numerous factors which can impact the final model's performance. Different solubility modeling approaches are discussed in this article. Despite intensive research on model development over the past decade, prediction of the solubility of structurally diverse drugs remains a challenging task. The quality of the available experimental data used for modeling of solubility is increasingly recognized as one of the main causes of the limited reliability of many of the proposed models. Therefore, the full potential of the developed modeling methods will only be achieved through greater availability of reliable data obtained by the same experimental methodology.

FarFar - Repository of the Faculty of Pharmacy, University of Belgrade

ADME Evaluation in Drug Discovery. 4. Prediction of Aqueous Solubility Based on Atom Contribution Approach

Author: Butina D.
McElroy N. R.
Engkvist O.
Ghose A. K.
Halgren T. A
Hou T. J.
Hou T. J.
Hou T. J.
Huuskonen J
Huuskonen J.
Jain N.
K. Xia
Klopman G.
Klopman G.
Kühne R.
Lee Y.
Liu R.
McFarland J. W.
Nirmalakhandan N. N. P.
Ran Y.
Suzuki T
T. J. Hou
Tetko I. V.
W. Zhang
Wegner J. K.
Weininger D.
X. J. Xu
Yalkowsky S. H
Yan A. X.
Publication venue: 'American Chemical Society (ACS)'
Publication date

Crossref

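Molekulsko modeliranje odnosa strukturnih svojstava i aktivnosti molekula s pomoću programskog jezika Python (prvi dio) [Molecular modelling of the relationship between structural properties and activity of molecules using the Python programming language (part one)]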

Author: Mario Lovrić
Publication venue: 'Croatian Society of Chemical Engineers/HDKI'
Publication date: 01/01/2018

The amount of data is growing considerably today, and ever greater value is attached to data, as well as to knowing how to manipulate them and extract valuable information. A well-known example of information extraction is searching known chemical compounds and designing new compounds based on knowledge from models, for the purpose of researching potential drugs. It is therefore important for a chemistry student to be well prepared for the current digital age, in which it is no longer enough to be skilled in the laboratory; one also needs to know how to model and work with data. This paper covers the basics of molecular modelling and QSAR, and the basics of data handling using the free programming language Python and its molecular modelling library RDKit. Other Python libraries used in the paper are Pandas, for handling and processing all kinds of data; statsmodels, Numpy, Scipy and SKLearn, for mathematical and statistical operations and linear algebra; and Matplotlib and Seaborn, for plotting graphs. The Python programming language, together with the libraries listed, is integrated into the Anaconda program. Anaconda lets the user apply and manage libraries easily and use the Jupyter Notebook interface for programming and for displaying graphics and analysis results. In this first part of the paper, the problem of predicting aqueous solubility for a set of organic chemical compounds is analysed using univariate linear regression. The aim of the paper is to bring chemists closer to programming in Python, the use of its libraries, and practical problem solving in molecular modelling.
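As a rough illustration of the univariate linear regression named in this abstract, the sketch below fits a straight line by ordinary least squares using only the Python standard library. The function name `fit_univariate`, the single descriptor, and the numbers are hypothetical and are not taken from the paper, which works with RDKit, Pandas and related libraries.

```python
from statistics import mean


def fit_univariate(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up illustrative values (NOT data from the paper):
# one molecular descriptor per compound and a log-solubility value.
descriptor = [0.0, 1.0, 2.0, 3.0]
log_s = [1.0, -1.0, -3.0, -5.0]

slope, intercept = fit_univariate(descriptor, log_s)
print(f"log S = {slope:.2f} * descriptor + {intercept:.2f}")
```

In practice the same fit would more likely be obtained with statsmodels or SKLearn, both of which the abstract lists; the hand-written version only makes the arithmetic visible.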

Directory of Open Access Journals

Full-text Institutional Repository of the Ruđer Bošković Institute

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Can small drugs predict the intrinsic aqueous solubility of ‘beyond Rule of 5’ big drugs?

Author: Alex Avdeef
Manfred Kansy
Publication venue: 'International Association of Physical Chemists (IAPC)'
Publication date: 01/01/2020

The aim of the study was to explore to what extent small molecules (mostly from the Rule of 5 chemical space) can be used to predict the intrinsic aqueous solubility, S0, of big molecules from beyond the Rule of 5 (bRo5) space. It was demonstrated that the General Solubility Equation (GSE) and the Abraham Solvation Equation (ABSOLV) underpredict solubility in systematic but slightly different ways. The Random Forest regression (RFR) method predicts solubility more accurately, albeit in the manner of a ‘black box.’ It was discovered that the GSE improves considerably in the case of big molecules when the coefficient of the log P term (octanol-water partition coefficient) in the equation is set to -0.4 instead of the traditional -1 value. The traditional GSE underpredicts solubility for molecules with experimental S0 < 50 µM. In contrast, the ABSOLV equation (trained with small molecules) underpredicts the solubility of big molecules in all cases tested. It was found that the errors in the ABSOLV-predicted solubilities of big molecules correlate linearly with the number of rotatable bonds, which suggests that flexibility may be an important factor in differentiating the solubility of small molecules from that of big molecules. Notably, most of the 31 big molecules considered have negative enthalpy of solution: these big molecules become less soluble with increasing temperature, which is compatible with ‘molecular chameleon’ behavior associated with intramolecular hydrogen bonding. The X-ray structures of many of these molecules reveal void spaces in their crystal lattices large enough to accommodate many water molecules when such solids are in contact with aqueous media. The water sorbed into crystals suspended in aqueous solution may enhance solubility by way of intra-lattice solute-water interactions involving the numerous H-bond acceptors in the big molecules studied. A ‘Solubility Enhancement–Big Molecules’ index was defined, which embodies many of the above findings.
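The effect of changing the log P coefficient can be sketched with the classical form of the General Solubility Equation, log S0 = 0.5 - 0.01 (mp - 25) - log P, with the melting point mp in °C and S0 in mol/L. The function name `gse_log_s0` and the example compound are hypothetical. The abstract states only that the log P coefficient changes from -1 to -0.4; keeping the constant and the melting-point term at their classical values is an assumption of this sketch, not something the abstract confirms.

```python
def gse_log_s0(melting_point_c, log_p, log_p_coeff=-1.0):
    """Estimate log10 of intrinsic solubility (mol/L) with a GSE-style equation.

    log_p_coeff = -1.0 gives the classical GSE; -0.4 is the value the
    abstract reports for big molecules. The other terms are kept at their
    classical values here, which is an ASSUMPTION of this sketch.
    """
    # The melting-point term is taken as zero for compounds melting below 25 °C
    mp_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - mp_term + log_p_coeff * log_p


# Hypothetical compound (not from the paper): mp = 125 °C, log P = 3
classical = gse_log_s0(125.0, 3.0)
modified = gse_log_s0(125.0, 3.0, log_p_coeff=-0.4)
print(f"classical GSE: log S0 = {classical:.2f}")
print(f"log P coefficient -0.4: log S0 = {modified:.2f}")
```

For a lipophilic compound the smaller coefficient raises the predicted solubility, which is the direction needed to offset the underprediction described in the abstract.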

PubMed Central

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

QSPR studies on aqueous solubilities of drug-like compounds

Author: Castro Eduardo A.
Duchowicz Pablo Román
Publication venue
Publication date: 08/07/2014

A rapidly growing area of modern pharmaceutical research is the prediction of aqueous solubility of drug-sized compounds from their molecular structures. There exist many different reasons for considering this physico-chemical property as a key parameter: the design of novel entities with adequate aqueous solubility brings many advantages to preclinical and clinical research and development, allowing improvement of the Absorption, Distribution, Metabolization, and Elimination/Toxicity profile and "screenability" of drug candidates in High Throughput Screening techniques. This work compiles recent QSPR linear models established by our research group devoted to the quantification of aqueous solubilities and their comparison to previous research on the topic. Facultad de Ciencias Exacta

Servicio de Difusión de la Creación Intelectual

Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?

Author: Kern Roman
Lovrić Mario
Lučić Bono
Pavlović Kristina
Spataru Adrian
Wong Ming Wah
Žuvela Petar
Publication venue: 'Wiley'
Publication date: 01/01/2021

We present a collection of publicly available intrinsic aqueous solubility data of 829 drug‐like compounds. Four different machine learning algorithms (random forests [RF], LightGBM, partial least squares, and least absolute shrinkage and selection operator [LASSO]) coupled with multistage permutation importance for feature selection and Bayesian hyperparameter optimization were used for the prediction of solubility based on chemical structural information. Our results show that LASSO yielded the best predictive ability on an external test set with a root mean square error (RMSE) (test) of 0.70 log points, an R²(test) of 0.80, and 105 features. Taking into account the number of descriptors as well, an RF model achieves the best balance between complexity and predictive ability with an RMSE(test) of 0.72 log points, an R²(test) of 0.78, and with only 17 features. On a more aggressive test set (principal component analysis [PCA]‐based split), better generalization was observed for the RF model. We propose a ranking score for choosing the best model, as test set performance is only one of the factors in creating an applicable model. The ranking score is a weighted combination of generalization, number of features, and test performance. Out of the two best learners, a consensus model was built exhibiting the best predictive ability and generalization with an RMSE(test) of 0.67 log points and an R²(test) of 0.81.
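The two test-set metrics quoted in this abstract can be computed as in the sketch below. R² is taken here as the coefficient of determination, and the consensus is formed by averaging two models' predictions; both choices are assumptions, since the abstract does not state how either was defined. All names and values are hypothetical and are not data from the paper.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus(pred_a, pred_b):
    """Average two models' predictions (an ASSUMED way to build a consensus)."""
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]


# Made-up log-solubility values (NOT data from the paper)
y_test = [-1.0, -2.0, -3.0, -4.0]
model_a = [-1.5, -2.5, -3.5, -4.5]  # predicts too low by 0.5
model_b = [-0.5, -1.5, -2.5, -3.5]  # predicts too high by 0.5
combined = consensus(model_a, model_b)

print(f"model A:   RMSE = {rmse(y_test, model_a):.2f}, R2 = {r_squared(y_test, model_a):.2f}")
print(f"consensus: RMSE = {rmse(y_test, combined):.2f}, R2 = {r_squared(y_test, combined):.2f}")
```

The toy data are constructed so that the two models' errors cancel, which exaggerates the benefit of averaging; real models with correlated errors gain much less.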

Crossref

Full-text Institutional Repository of the Ruđer Bošković Institute

ScholarBank@NUS