11 research outputs found
Capturing the Crystal: Prediction of Enthalpy of Sublimation, Crystal Lattice Energy, and Melting Points of Organic Compounds
Accurate computational prediction of melting points and
aqueous
solubilities of organic compounds would be very useful but is notoriously
difficult. Predicting the lattice energies of compounds is key to
understanding and predicting their melting behavior and ultimately
their solubility behavior. We report robust, predictive, quantitative
structure–property relationship (QSPR) models for enthalpies
of sublimation, crystal lattice energies, and melting points for a
very large and structurally diverse set of small organic compounds.
Sparse Bayesian feature selection and machine learning methods were
employed to select the most relevant molecular descriptors for the
model and to generate parsimonious quantitative models. The final
enthalpy of sublimation model is a four-parameter multilinear equation
that has an r<sup>2</sup> value of 0.96 and an average absolute error
of 7.9 ± 0.3 kJ.mol<sup>–1</sup>. The melting point model
can predict this property with a standard error of 45 ±
1 K and r<sup>2</sup> value of 0.79. Given the size and diversity
of the training data, these conceptually transparent and accurate
models can be used to predict sublimation enthalpy, lattice energy,
and melting points of organic compounds in general.
Aqueous Solubility Prediction: Do Crystal Lattice Interactions Help?
Aqueous
solubility is a very important physical property of small
molecule drugs and drug candidates but also one of the most difficult
to predict accurately. Aqueous solubility plays a major role in drug
delivery and pharmacokinetics. It is believed that crystal lattice
interactions are important in solubility and that including them in
solubility models should improve the accuracy of the models. We used
calculated values for lattice energy and sublimation enthalpy of organic
molecules as descriptors to determine whether these would improve
the accuracy of the aqueous solubility models. Multiple linear regression
employing an expectation maximization algorithm and a sparse prior
(MLREM) method and a nonlinear Bayesian regularized artificial neural
network with a Laplacian prior (BRANNLP) were used to derive optimal
predictive models of aqueous solubility of a large and highly diverse
data set of 4558 organic compounds over a normal ambient temperature
range of 20–30 °C (293–303 K). A randomly selected
test set and compounds from a solubility challenge were used to estimate
the predictive ability of the models. The BRANNLP method showed the
best statistical results with squared correlation coefficients of
0.90 and standard errors of 0.645–0.665 log(<i>S</i>) for training and test sets. Surprisingly, including descriptors
that captured crystal lattice interactions did not significantly improve
the quality of these aqueous solubility models.
Beware of <i>R</i><sup>2</sup>: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models
The statistical metrics
used to characterize the external predictivity
of a model, i.e., how well it predicts the properties of an independent
test set, have proliferated over the past decade. This paper clarifies
some apparent confusion over the use of the coefficient of determination, <i>R</i><sup>2</sup>, as a measure of model fit and predictive
power in QSAR and QSPR modeling. <i>R</i><sup>2</sup> (or <i>r</i><sup>2</sup>) has been used in various contexts in the
literature in conjunction with training and test data for both ordinary
linear regression and regression through the origin as well as with
linear and nonlinear regression models. We analyze the widely adopted
model fit criteria suggested by Golbraikh and Tropsha (J. Mol. Graphics Modell. 2002, 20, 269−276) in a strict statistical manner. Shortcomings
in these criteria are identified, and a clearer and simpler alternative
method to characterize model predictivity is provided. The intent
is not to repeat the well-documented arguments for model validation
using test data but rather to guide the application of <i>R</i><sup>2</sup> as a model fit statistic. Examples are used to illustrate
both correct and incorrect uses of <i>R</i><sup>2</sup>.
Reporting the root-mean-square error or equivalent measures of dispersion,
which are typically of more practical importance than <i>R</i><sup>2</sup>, is also encouraged, and important challenges in addressing
the needs of different categories of users such as computational chemists,
experimental scientists, and regulatory decision support specialists
are outlined.
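As an illustration of the distinction this abstract draws, the following minimal Python sketch computes three test-set statistics side by side. It assumes the common definition R² = 1 − SS_res/SS_tot and contrasts it with the squared Pearson correlation; the abstract itself does not give formulas, so the function names, the definitions chosen, and the toy data are all assumptions made here for illustration.

```python
import math


def r_squared(y_obs, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def pearson_r2(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_obs)
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mean_o) ** 2 for o in y_obs)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_o * var_p)


def rmse(y_obs, y_pred):
    """Root-mean-square error, in the units of the modeled property."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


# Hypothetical test set: every prediction is offset by +0.5 from experiment.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]

print("R^2 (1 - SS_res/SS_tot):", r_squared(y_obs, y_pred))
print("squared Pearson r:      ", pearson_r2(y_obs, y_pred))
print("RMSE:                   ", rmse(y_obs, y_pred))
```

On this toy data the squared correlation is perfect even though every prediction is biased, while R² and the RMSE both register the systematic error. This is one reason a dispersion measure in the property's own units can be more informative than a correlation-based statistic alone.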
Modeling the Influence of Fatty Acid Incorporation on Mesophase Formation in Amphiphilic Therapeutic Delivery Systems
Dispersed
amphiphile-fatty acid systems are of great interest in
drug delivery and gene therapies because of their potential for triggered
release of their payload. The mesophase behavior of these systems
is extremely complex and is affected by environmental factors such
as drug loading, percentage and nature of incorporated fatty acids,
temperature, pH, and so forth. It is important to study phase behavior
of amphiphilic materials as the mesophases directly influence the
release rate of the incorporated drugs. We describe a robust machine
learning method for predicting the phase behavior of these systems.
We have developed models for each mesophase that simultaneously and
reliably model the effects of amphiphile and fatty acid structure,
concentration, and temperature and that make accurate predictions
of these mesophases for conditions not used to train the models.
Quantitative Structure–Property Relationship Modeling of Diverse Materials Properties
Predicting the Complex Phase Behavior of Self-Assembling Drug Delivery Nanoparticles
Amphiphilic lyotropic
liquid crystalline self-assembled nanomaterials
have important applications in the delivery of therapeutic and imaging
agents. However, little is known about the effect of the incorporated
drug on the structure of nanoparticles. Predicting these properties
is widely considered intractable. We present computational models
for three drug delivery carriers, loaded with 10 drugs at six concentrations
and two temperatures. These models predicted phase behavior for 11
new drugs. Subsequent synchrotron small-angle X-ray scattering experiments
validated the predictions.
Competitive Inhibition Mechanism of Acetylcholinesterase without Catalytic Active Site Interaction: Study on Functionalized C<sub>60</sub> Nanoparticles via in Vitro and in Silico Assays
Acetylcholinesterase
(AChE) activity regulation by chemical agents or, potentially, nanomaterials
is important for both toxicology and pharmacology. Competitive inhibition
via direct catalytic active sites (CAS) binding or noncompetitive
inhibition through interference with substrate and product entering
and exiting has been recognized previously as an AChE-inhibition mechanism
for bespoke nanomaterials. The competitive inhibition by peripheral
anionic site (PAS) interaction without CAS binding remains unexplored.
Here, we proposed and verified the occurrence of a presumed competitive
inhibition of AChE without CAS binding for hydrophobically functionalized
C<sub>60</sub> nanoparticles (NPs) by employing both experimental
and computational methods. The kinetic inhibition analysis distinguished
six competitive inhibitors, probably targeting the PAS, from the pristine
and hydrophilically modified C<sub>60</sub> NPs. A simple quantitative
nanostructure–activity relationship (QNAR) model relating the
pocket accessible length of substituent to inhibition capacity was
then established to reveal how the geometry of the surface group decides
the NP difference in AChE inhibition. Molecular docking identified
the PAS as the potential binding site interacting with the NPs via
a T-shaped plug-in mode. Specifically, the fullerene core covered
the enzyme gorge as a lid through π–π stacking
with Tyr72 and Trp286 in the PAS, while the hydrophobic ligands on
the fullerene surface inserted into the AChE active site to provide
further stability for the complexes. The modeling predicted that inhibition
would be severely compromised by Tyr72 and Trp286 deletions, and the
subsequent site-directed mutagenesis experiments proved this prediction.
Our results demonstrate competitive inhibition of AChE by NPs without
CAS participation, furthering the understanding of both the neurotoxicity
and the curative effect of NPs.
Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches
The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure–activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, interpretation of the models they generate is often very difficult. New computational modelling tools or new ways of using existing tools are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure–property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree) to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and <i>interpretable</i> nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent to or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generation of accurate nanoSAR models with important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models.
Predicting the Effect of Lipid Structure on Mesophase Formation during in Meso Crystallization
Bicontinuous cubic lipidic materials
are increasingly used as crystallization
media for in meso crystallization of membrane proteins (MPs). Varying
the lipid architecture may assist with encapsulation of larger proteins
and promote crystal growth. However, not all lipids are compatible
with the components of typical crystallization screens, and compatibility
must therefore be checked prior to crystallization trials. The method
currently used, high-throughput small-angle X-ray scattering (HT SAXS),
may be time-consuming and is costly in valuable MP. We have therefore
employed a modeling approach using Bayesian regularized neural networks
to accurately predict the complex phase behavior of lipid materials
under the influence of the PACT crystallization screen and determine
the lipid characteristics that allow a lipid to retain a cubic phase
under the multiple components required during an in meso crystallization
trial. This information will be used to select robust lipids for use
in crystallization trials and may allow for the rational design of
new lipids, specifically for in meso crystallization.