18 research outputs found

    Improving Chemical Autoencoder Latent Space and Molecular De novo Generation Diversity with Heteroencoders

    Chemical autoencoders are attractive models because they combine chemical space navigation with the possibility of de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for use early in a drug discovery project. Here it is shown that the choice of chemical representation, such as SMILES strings, has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the similarity of the latent space to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or the decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory (LSTM) cells to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the bottleneck in QSAR modelling of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate of decoding to a molecule different from the one encoded, a tendency that can be counteracted with more complex network architectures.

    QSPR Studies on Aqueous Solubilities of Drug-Like Compounds

    A rapidly growing area of modern pharmaceutical research is the prediction of the aqueous solubility of drug-sized compounds from their molecular structures. There are many reasons for considering this physico-chemical property a key parameter: the design of novel entities with adequate aqueous solubility brings many advantages to preclinical and clinical research and development, allowing improvement of the Absorption, Distribution, Metabolism, and Elimination/Toxicity profile and the “screenability” of drug candidates in High Throughput Screening techniques. This work compiles recent linear QSPR models established by our research group for the quantification of aqueous solubilities and compares them with previous research on the topic.

    Računarski modeli za predviđanje rastvorljivosti lekova (Computational models for predicting drug solubility)

    The aqueous solubility of a drug is a factor that can significantly influence its oral bioavailability and can also affect the distribution of the drug in the body. Considering aqueous solubility in the early stages of drug discovery and development is vital for reducing the incidence of late-stage development failures. Computational models for solubility prediction make it possible to screen combinatorial libraries, to single out potentially problematic compounds, and to eliminate those with inadequate solubility. In addition to predicting solubility from chemical structure, the interpretation of such models can give insight into structure-solubility relationships and can guide the optimization of structures to improve solubility while retaining the activity of the investigated drugs. Developing such models is a complex process that requires consideration of numerous factors that can affect the performance of the final model. Different approaches to solubility modeling are discussed in this article. Despite intensive research on model development over the past decade, reliable prediction of the solubility of structurally diverse drugs remains a challenging task. The quality of the available experimental data used for solubility modeling is increasingly recognized as one of the main causes of the limited reliability of many of the proposed models. The full potential of the developed modeling methods will therefore be achieved only through greater availability of reliable solubility data obtained with the same experimental methodology.

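    Molekulsko modeliranje odnosa strukturnih svojstava i aktivnosti molekula s pomoću programskog jezika Python (prvi dio) (Molecular modelling of the relationship between structural properties and activity of molecules using the Python programming language, part one)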

    The amount of data is growing considerably, and ever greater value is attached both to the data themselves and to knowing how to manipulate them and extract valuable information. A well-known example of information extraction is the search of known chemical compounds and the design of new compounds, based on knowledge gained from models, for the purpose of investigating potential drugs. It is therefore important for a chemistry student to be well prepared for the current digital age, in which it is no longer enough to be skilled in the laboratory; one also needs to know how to model and work with data. This paper covers the basics of molecular modelling and QSAR and the basics of data handling using the free programming language Python and its molecular modelling library RDKit. The other Python libraries used in the paper are Pandas, for handling and processing all kinds of data; statsmodels, Numpy, Scipy and SKLearn, for mathematical and statistical operations and linear algebra; and Matplotlib and Seaborn, for plotting. Python and the listed libraries are integrated into the Anaconda distribution. Anaconda makes it easy to use and manage libraries and provides the Jupyter Notebook interface for programming and for displaying plots and analysis results. This first part of the paper analyses the problem of predicting aqueous solubility for a set of organic compounds using univariate linear regression. The aim of the paper is to familiarise chemists with programming in Python, the use of its libraries, and practical problem solving in molecular modelling.
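
A minimal sketch of univariate linear regression of the kind this abstract describes, written in plain Python rather than with the libraries the paper names, so that it runs without dependencies. The abstract does not say which descriptor is used; logP is chosen here purely for illustration, and the values are synthetic toy numbers, not measurements. The function name `fit_univariate` is hypothetical.

```python
def fit_univariate(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and the x-y cross deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic toy data (assumption): one descriptor value and one
# log-solubility value per compound. These are NOT experimental values.
toy_descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_log_solubility = [0.5, -0.5, -1.5, -2.5, -3.5]

slope, intercept = fit_univariate(toy_descriptor, toy_log_solubility)
predicted = [intercept + slope * xi for xi in toy_descriptor]
```

In practice the descriptor column would be computed from the structures (for example with RDKit) and the fit would normally be done with statsmodels or scikit-learn, which also report confidence intervals and goodness-of-fit statistics.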

    Can small drugs predict the intrinsic aqueous solubility of ‘beyond Rule of 5’ big drugs?

    The aim of the study was to explore to what extent small molecules (mostly from the Rule of 5 chemical space) can be used to predict the intrinsic aqueous solubility, S0, of big molecules from beyond the Rule of 5 (bRo5) space. It was demonstrated that the General Solubility Equation (GSE) and the Abraham Solvation Equation (ABSOLV) underpredict solubility in systematic but slightly different ways. The Random Forest regression (RFR) method predicts solubility more accurately, albeit in the manner of a ‘black box.’ It was discovered that the GSE improves considerably in the case of big molecules when the coefficient of the log P term (octanol-water partition coefficient) in the equation is set to -0.4 instead of the traditional -1 value. The traditional GSE underpredicts solubility for molecules with experimental S0 < 50 µM. In contrast, the ABSOLV equation (trained with small molecules) underpredicts the solubility of big molecules in all cases tested. It was found that the errors in the ABSOLV-predicted solubilities of big molecules correlate linearly with the number of rotatable bonds, which suggests that flexibility may be an important factor in differentiating the solubility of small molecules from that of big molecules. Notably, most of the 31 big molecules considered have a negative enthalpy of solution: these big molecules become less soluble with increasing temperature, which is compatible with ‘molecular chameleon’ behavior associated with intramolecular hydrogen bonding. The X‑ray structures of many of these molecules reveal void spaces in their crystal lattices large enough to accommodate many water molecules when such solids are in contact with aqueous media. The water sorbed into crystals suspended in aqueous solution may enhance solubility by way of intra-lattice solute-water interactions involving the numerous H‑bond acceptors in the big molecules studied. A ‘Solubility Enhancement–Big Molecules’ index was defined, which embodies many of the above findings.
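
The effect of the log P coefficient mentioned in this abstract can be sketched with the classical form of the General Solubility Equation, log S0 = 0.5 - 0.01 (mp - 25) - log P, with S0 in mol/L and the melting point mp in °C. The sketch below assumes that only the log P coefficient changes (from -1 to -0.4) and that the other terms keep their classical values; the abstract does not confirm this. The function name and the example inputs are hypothetical.

```python
def gse_log_s0(log_p, melting_point_c, log_p_coeff=-1.0):
    """Estimate log10 intrinsic solubility (mol/L) with the GSE.

    log_p_coeff = -1.0 gives the classical equation; -0.4 is the value
    the abstract reports as working better for big molecules.
    Keeping the other terms unchanged is an assumption of this sketch.
    """
    # The melting-point term is taken as zero for compounds that are
    # liquid at 25 degrees C.
    mp_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - mp_term + log_p_coeff * log_p


# Hypothetical compound: logP = 2.0, melting point = 125 degrees C.
classical = gse_log_s0(2.0, 125.0)                    # coefficient -1
adjusted = gse_log_s0(2.0, 125.0, log_p_coeff=-0.4)   # coefficient -0.4
```

For any compound with a positive log P, the smaller coefficient raises the predicted solubility, which is consistent with the abstract's observation that the traditional equation underpredicts.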

    Machine learning in prediction of intrinsic aqueous solubility of drug‐like compounds: Generalization, complexity, or predictive ability?

    We present a collection of publicly available intrinsic aqueous solubility data for 829 drug‐like compounds. Four different machine learning algorithms (random forests [RF], LightGBM, partial least squares, and least absolute shrinkage and selection operator [LASSO]), coupled with multistage permutation importance for feature selection and Bayesian hyperparameter optimization, were used to predict solubility from chemical structural information. Our results show that LASSO yielded the best predictive ability on an external test set, with a root mean square error (RMSE) (test) of 0.70 log points, an R²(test) of 0.80, and 105 features. When the number of descriptors is also taken into account, an RF model achieves the best balance between complexity and predictive ability, with an RMSE(test) of 0.72 log points, an R²(test) of 0.78, and only 17 features. On a more aggressive test set (principal component analysis [PCA]‐based split), better generalization was observed for the RF model. We propose a ranking score for choosing the best model, as test set performance is only one of the factors in creating an applicable model. The ranking score is a weighted combination of generalization, number of features, and test performance. From the two best learners, a consensus model was built that exhibits the best predictive ability and generalization, with an RMSE(test) of 0.67 log points and an R²(test) of 0.81.
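
The two test-set metrics quoted in this abstract, and the idea of a consensus of two learners, can be illustrated as follows. This is a sketch under assumptions: the abstract does not say how the consensus is formed, so a plain average of the two models' predictions is assumed, and R² is computed as the usual coefficient of determination. All numbers are synthetic toy values and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus(pred_a, pred_b):
    """Average two models' predictions (an assumed consensus rule)."""
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]


# Synthetic toy values (assumption), standing in for log S0 data.
y_obs = [-1.0, -2.0, -3.0, -4.0]
model_a = [-0.5, -1.5, -2.5, -3.5]   # overpredicts by 0.5 everywhere
model_b = [-1.5, -2.5, -3.5, -4.5]   # underpredicts by 0.5 everywhere
combined = consensus(model_a, model_b)

rmse_a = rmse(y_obs, model_a)
rmse_combined = rmse(y_obs, combined)
r2_a = r_squared(y_obs, model_a)
```

In this contrived case the two models' errors cancel exactly; with real models the gain from averaging depends on how correlated their errors are.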