Modeling of the Acute Toxicity of Benzene Derivatives by Complementary QSAR Methods
A data set containing acute toxicity values (96-h LC50) of 69 substituted benzenes for
fathead minnow (Pimephales promelas) was investigated with two Quantitative
Structure-Activity Relationship (QSAR) models, one using molecular descriptors and one
not. Recursive Neural Networks (RNN) derive a QSAR by direct treatment of the
molecular structure, described through an appropriate graphical tool (variable-size labeled
rooted ordered trees) by defining suitable representation rules. The input trees are encoded by
an adaptive process able to learn, by tuning its free parameters, from a given set of
structure-activity training examples. Owing to the use of a flexible encoding approach, the model is
target invariant and does not need a priori definition of molecular descriptors. The results
obtained in this study were analyzed together with those of a model based on molecular
descriptors, i.e. a Multiple Linear Regression (MLR) model using CROatian MultiRegression
selection of descriptors (CROMRsel). The comparison revealed interesting similarities that
could lead to the development of a combined approach, exploiting the complementary
characteristics of the two methods.
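The descriptor-free encoding described in this abstract can be illustrated with a minimal sketch. It assumes a toy recursive encoder in which each node's state is a tanh of its label embedding plus position-specific contributions from its ordered children. The fragment labels, weights, state size, and the names `Node`, `encode`, and `predict` are hypothetical and fixed by hand here; in the abstract's model the free parameters are learned from training examples, and its actual representation rules are not given in this listing.

```python
import math
from dataclasses import dataclass, field
from typing import List

STATE_DIM = 2  # size of the encoded state (toy value)

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

# Hypothetical, hand-picked parameters; a real model would learn these.
LABEL_EMBED = {"C6H5": [0.5, -0.2], "Cl": [0.1, 0.4], "OH": [-0.3, 0.2]}
CHILD_WEIGHTS = [  # one STATE_DIM x STATE_DIM matrix per child position
    [[0.3, 0.0], [0.0, 0.3]],
    [[0.1, 0.2], [-0.2, 0.1]],
]
READOUT = [1.0, -1.0]

def encode(node: Node) -> List[float]:
    """Encode a rooted ordered tree bottom-up into a fixed-size state."""
    pre = list(LABEL_EMBED[node.label])
    for pos, child in enumerate(node.children):
        child_state = encode(child)
        w = CHILD_WEIGHTS[pos]  # child order matters: weights depend on position
        for i in range(STATE_DIM):
            pre[i] += sum(w[i][j] * child_state[j] for j in range(STATE_DIM))
    return [math.tanh(v) for v in pre]

def predict(root: Node) -> float:
    """Linear readout of the root state, standing in for a toxicity value."""
    return sum(r * s for r, s in zip(READOUT, encode(root)))

tree_a = Node("C6H5", [Node("Cl"), Node("OH")])
tree_b = Node("C6H5", [Node("OH"), Node("Cl")])
print(predict(tree_a), predict(tree_b))
```

Because the child weights depend on position, the two trees above receive different predictions even though they contain the same labels, which is the sense in which the trees are ordered.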
Resolving transition metal chemical space: feature selection for machine learning and structure-property relationships
Machine learning (ML) of quantum mechanical properties shows promise for
accelerating chemical discovery. For transition metal chemistry where accurate
calculations are computationally costly and available training data sets are
small, the molecular representation becomes a critical ingredient in ML model
predictive accuracy. We introduce a series of revised autocorrelation functions
(RACs) that encode relationships between the heuristic atomic properties (e.g.,
size, connectivity, and electronegativity) on a molecular graph. We alter the
starting point, scope, and nature of the quantities evaluated in standard ACs
to make these RACs amenable to inorganic chemistry. On an organic molecule set,
we first demonstrate superior standard AC performance to other
presently-available topological descriptors for ML model training, with mean
unsigned errors (MUEs) for atomization energies on set-aside test molecules as
low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs
on set-aside test molecules in spin-state splitting in comparison to 15-20x
higher errors from feature sets that encode whole-molecule structural
information. Systematic feature selection methods including univariate
filtering, recursive feature elimination, and direct optimization (e.g., random
forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5x
smaller than RAC-155 produce sub- to 1-kcal/mol spin-splitting MUEs, with good
transferability to metal-ligand bond length prediction (0.004-0.005 Å MUE) and
redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature
selection results across property sets reveals the relative importance of
local, electronic descriptors (e.g., electronegativity, atomic number) in
spin-splitting and distal, steric effects in redox potential and bond lengths.
Comment: 43 double-spaced pages, 11 figures, 4 tables
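As a rough illustration of the standard autocorrelation functions (ACs) that this abstract takes as its starting point, the sketch below computes AC_d as the sum of P_i * P_j over all ordered atom pairs (i, j) separated by exactly d bonds on the molecular graph. The ordered-pair convention, the function names, and the water example with atomic number as the property are assumptions made for illustration; the revised variants (RACs) introduced in the paper are not reproduced here.

```python
from collections import deque

def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-path distance from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def autocorrelation(properties, bonds, max_depth):
    """Return [AC_0, ..., AC_max_depth] for one atomic property on a molecular graph."""
    n = len(properties)
    adjacency = {i: [] for i in range(n)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    ac = [0.0] * (max_depth + 1)
    for i in range(n):
        for j, d in shortest_path_lengths(adjacency, i).items():
            if d <= max_depth:
                ac[d] += properties[i] * properties[j]  # ordered pairs (i, j)
    return ac

# Water, with atomic number as the property: atom 0 is O, atoms 1 and 2 are H.
print(autocorrelation([8, 1, 1], [(0, 1), (0, 2)], max_depth=2))
```

For water this gives 66 at depth 0 (64 + 1 + 1), 32 at depth 1 (four ordered O-H pairs), and 2 at depth 2 (two ordered H-H pairs). Repeating the calculation for several properties and depths yields a fixed-length feature vector regardless of molecule size.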
Stereo-Aware Extension of HOSE Codes
Descriptions of molecular environments have many applications in chemoinformatics, including chemical shift prediction. Hierarchically ordered spherical environment (HOSE) codes are the most popular such descriptions. We developed a method to extend these with stereochemistry information. It enables distinguishing atoms which would be considered identical in traditional HOSE codes. The use of our method is demonstrated by chemical shift predictions for molecules in the nmrshiftdb2 database. We give a full specification and an implementation.
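The idea of a spherical environment can be sketched as follows: starting from a centre atom, atoms are grouped into successive spheres by bond distance. This is only a simplified illustration under stated assumptions. It does not produce the actual HOSE code string syntax, ignores bond orders and atom ranking, and carries no stereochemistry; the function name `spheres` and the propane and ethanol heavy-atom examples are hypothetical.

```python
def spheres(elements, bonds, center, max_spheres):
    """List the element symbols found in each successive sphere around `center`."""
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = {center}
    frontier = [center]
    result = []
    for _ in range(max_spheres):
        next_frontier = []
        for u in frontier:
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        if not next_frontier:
            break  # no atoms left to visit
        # Sort symbols so the description does not depend on atom numbering.
        result.append(sorted(elements[v] for v in next_frontier))
        frontier = next_frontier
    return result

# Propane heavy atoms C0-C1-C2: the two terminal carbons get the same description.
propane = (["C", "C", "C"], [(0, 1), (1, 2)])
print(spheres(*propane, center=0, max_spheres=3))
print(spheres(*propane, center=2, max_spheres=3))
```

Atoms with identical sphere descriptions are indistinguishable to a purely topological scheme of this kind, which is the limitation a stereo-aware extension targets.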
An adaptive model for learning molecular endpoints
I will describe a recursive neural network that deals with undirected graphs, and its application to predicting property labels or activity values of small molecules.
The model is entirely general, in that it can process any undirected graph with a finite number of nodes by factorising it into a number of directed graphs with the same skeleton.
The model's only input in the applications I will present is the graph representing the chemical structure of the molecule. In spite of its simplicity, the model outperforms or matches the state of the art in three of the four tasks, and in the fourth is outperformed only by a method resorting to a very problem-specific feature.
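The abstract does not say how the undirected graph is factorised, so the following is only one plausible sketch, not the author's construction. It assumes a connected graph and orients every edge away from a chosen root by breadth-first depth, breaking ties by node index, which gives one directed acyclic graph per root over the same skeleton. The names `orient_from_root` and `factorise` are hypothetical.

```python
from collections import deque

def orient_from_root(n, edges, root):
    """Orient each undirected edge from the node nearer the root to the farther one."""
    adjacency = {i: [] for i in range(n)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    directed = []
    for a, b in edges:
        # (depth, index) is a strict total order, so the result has no cycles.
        if (depth[a], a) < (depth[b], b):
            directed.append((a, b))
        else:
            directed.append((b, a))
    return directed

def factorise(n, edges):
    """One directed graph per root; all share the undirected skeleton."""
    return [orient_from_root(n, edges, root) for root in range(n)]

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a four-membered ring
for dag in factorise(4, ring):
    print(dag)
```

Each directed graph can then be processed by a recursive model that follows edge direction, and the per-graph outputs combined into a single prediction for the molecule.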