13 research outputs found
Managing Expectations and Imbalanced Training Data in Reactive Force Field Development: An Application to Water Adsorption on Alumina
ReaxFF is a computationally efficient model for reactive molecular
dynamics simulations that has been applied to a wide variety of chemical
systems. When ReaxFF parameters are not yet available for a chemistry
of interest, they must be (re)optimized, for which one defines a set
of training data that the new ReaxFF parameters should reproduce.
ReaxFF training sets typically contain diverse properties with different
units, some of which are more abundant (by orders of magnitude) than
others. To find the best parameters, one conventionally minimizes
a weighted sum of squared errors over all of the data in the training
set. One of the challenges in such numerical optimizations is to assign
weights so that the optimized parameters represent a good compromise
among all the requirements defined in the training set. This work
introduces a new loss function, called Balanced Loss, and a workflow
that replaces weight assignment with a more manageable procedure.
The training data are divided into categories with corresponding “tolerances”, i.e., acceptable root-mean-square errors for the categories,
which define the expectations for the optimized ReaxFF parameters.
Through the Log-Sum-Exp form of Balanced Loss, the parameter optimization
is also a validation of one’s expectations, providing meaningful
feedback that can be used to reconfigure the tolerances if needed.
The new methodology is demonstrated with a nontrivial parametrization
of ReaxFF for water adsorption on alumina. This results in a new force
field that reproduces both the rare and frequent properties of a validation
set not used for training. We also demonstrate the robustness of the
new force field with a molecular dynamics simulation of water desorption
from a γ-Al<sub>2</sub>O<sub>3</sub> slab model.
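The abstract contrasts the conventional weighted sum of squared errors with a category-based Log-Sum-Exp loss but does not give the formula. The following is a minimal sketch of that idea only, not the paper's definition of Balanced Loss: the function names, the ratio RMSE/tolerance, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def weighted_sse(errors, weights):
    """Conventional loss: weighted sum of squared errors over all data."""
    errors = np.asarray(errors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * errors**2))

def category_rmse(errors):
    """Root-mean-square error of one category of training data."""
    errors = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(errors**2)))

def lse_category_loss(errors_by_category, tolerances):
    """Hypothetical Log-Sum-Exp loss over per-category RMSE/tolerance ratios.

    Each category contributes through its own RMSE, so a category with
    thousands of items does not outweigh a category with only a few.
    """
    ratios = np.array([
        category_rmse(errors_by_category[name]) / tolerances[name]
        for name in tolerances
    ])
    shift = ratios.max()  # shift for numerical stability
    return float(shift + np.log(np.sum(np.exp(ratios - shift))))

# Hypothetical example: one abundant and one rare category,
# each exactly meeting its tolerance.
errors = {
    "energies": np.full(1000, 0.1),
    "geometries": np.array([1.0, -1.0]),
}
tolerances = {"energies": 0.1, "geometries": 1.0}
print(lse_category_loss(errors, tolerances))
```

In this sketch a category whose RMSE far exceeds its tolerance dominates the loss, which is one way an optimization could signal that an expectation is unrealistic.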
Potential Energy Surface-Based Descriptors for Nanoporous Materials and its Applications to Classification and CO<sub>2</sub> Gas Adsorption into Zeolites
The generalization of high-throughput synthesis has recently allowed
the discovery of thousands of new porous materials, generating a large
amount of information, with the development of specialized databases.
Widespread access to databases enabled an increase in algorithms and
models for property prediction and in silico design of materials.
The structural information on materials still needs to be rationalized
by the inclusion of descriptors to ease the characterization of solids.
This is essential for in silico screening for potential applications
based on machine learning (ML) approaches. Indeed, at the forefront
of a real revolution in the selection and design of porous materials
for many industrial applications, the use of appropriate descriptors
to encode solid material properties (topology, porosity, and surface
chemistry) is one of the fundamental aspects of the development of
ML-based models. Our analysis of the literature reveals a lack of
descriptors based on the potential energy surface (PES) of crystalline
materials embedding crucial information such as the porosity, the
topology, and the surface chemistry. In this work, we introduce new
PES-based descriptors including the surface probability distribution
of the local mean curvature (K<sub>H</sub>), the
electrostatic-PES distribution (σ<sub>e</sub>), as well as the
local electrostatic-potential gradient surface probability distribution
(∇σ<sub>e</sub>). Our descriptors allow the classification
of zeolites as well as their characterization, because the descriptors themselves contain standard
morphological and topological information (pore diameter, tortuosity,
surface chemistry, etc.). We illustrate their usage to generate accurate
ML-based models of the isosteric heat of adsorption of CO<sub>2</sub> on purely siliceous zeolites of the IZA database and ion-exchanged
zeolites as a function of the Si/Al ratio for the case of LTA topology.
GC-PPC-SAFT Equation of State for VLE and LLE of Hydrocarbons and Oxygenated Compounds. Sensitivity Analysis
Group-contribution polar versions of SAFT equations of state are very useful for predictive
calculations of mixtures containing diverse polar molecules. In this
work, we have evaluated the predictive performance of one such model,
the so-called polar perturbed-chain (PPC) SAFT model for phase-equilibrium
properties of 290 hydrocarbons and monofunctional oxygenated compounds.
Emphasis has been placed on carrying out an extensive evaluation considering
diverse types of phase behavior (vapor–liquid and liquid–liquid
equilibria) and properties/conditions (Henry’s law constant
for H<sub>2</sub>, N<sub>2</sub>, and CH<sub>4</sub>; infinite-dilution
activity coefficient in water; solubility in water; infinite-dilution <i>n</i>-octanol/water partition coefficient). In general, considering
the predictive nature of the calculations, encouraging results were
obtained. For pure-component vapor pressures and liquid molar volumes,
the deviations are very small, at 20% and 3%, respectively. The deviations
in the prediction of the Henry’s law constants are within a
factor of 2, with the best results found for the methane and nitrogen
solubilities. For solubilities in water and, consequently, for infinite-dilution <i>n</i>-octanol/water partition coefficients, deviations are within
a factor of 2 for hydrocarbons and within a factor of 4 for alcohols
and aldehydes, but they are large for the other oxygenated families.
To identify paths for improvement, a sensitivity analysis was performed,
indicating that all of the parameters make large contributions to
almost all properties. In addition, the sensitivity of the infinite-dilution
activity coefficient in water to the molecular size parameters was
extremely high. This suggests that a small change in these parameters
might improve the results significantly.
Simulations of Interfacial Tension of Liquid–Liquid Ternary Mixtures Using Optimized Parametrization for Coarse-Grained Models
In this work, liquid–liquid systems are studied by means
of coarse-grained Monte Carlo simulations (CG-MC) and Dissipative
Particle Dynamics (DPD). A methodology is proposed to reproduce liquid–liquid
equilibrium (LLE) and to provide variation of interfacial tension
(IFT), as a function of the solute concentration. A key step is the
parametrization method based on the use of the Flory–Huggins
parameter between DPD beads to calculate solute/solvent interactions.
Parameters are determined using a set of experimental compositional
data of LLE, following four different approaches. These approaches
are evaluated and compared, and the advantages and disadvantages
of each one are analyzed through their
application to six systems: water/benzene/1,4-dioxane, water/chloroform/acetone,
water/benzene/acetic acid, water/benzene/2-propanol, water/hexane/acetone,
and water/hexane/2-propanol. CG-MC simulations in the Gibbs (NVT)
ensemble have been used to check the validity of parametrization approaches
for LLE reproduction. Then, CG-MC simulations in the osmotic (μ<sub>solute</sub>N<sub>solvent</sub>P<sub><i>zz</i></sub>T)
ensemble were carried out considering the two liquid phases with an
explicit interface. This step allows one to work at the same bulk
concentrations as the experimental data by imposing the precise bulk
phase compositions and predicting the interface composition. Finally,
DPD simulations were used to predict IFT values for different solute
concentrations. Our results on variation of IFT with solute concentration
in bulk phases are in good agreement with experimental data, but some
deviations can be observed for systems containing hexane molecules.
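The abstract says the solute/solvent interactions are derived from the Flory–Huggins parameter between DPD beads, without stating the mapping used. As an illustration only, the sketch below assumes the linear relation often quoted in the DPD literature, in which the excess repulsion between unlike beads is proportional to χ. The like-bead repulsion of 25, the proportionality constant of 0.286 (commonly cited for a bead density of 3), and the χ values are assumptions, not values taken from this work.

```python
def dpd_repulsion(chi, a_like, chi_per_delta_a):
    """Unlike-bead DPD repulsion from a Flory-Huggins parameter.

    Assumes the linear mapping chi = chi_per_delta_a * (a_unlike - a_like),
    so a_unlike = a_like + chi / chi_per_delta_a.
    """
    return a_like + chi / chi_per_delta_a

# Assumed reduced-unit settings (hypothetical for this sketch).
A_LIKE = 25.0            # like-bead repulsion
CHI_PER_DELTA_A = 0.286  # proportionality constant

# Hypothetical chi values for the three bead pairs of a ternary system.
chi_values = {
    ("water", "solvent"): 3.0,
    ("water", "solute"): 0.5,
    ("solvent", "solute"): 0.2,
}

repulsions = {
    pair: dpd_repulsion(chi, A_LIKE, CHI_PER_DELTA_A)
    for pair, chi in chi_values.items()
}
for pair, a_ij in repulsions.items():
    print(pair, round(a_ij, 2))
```

A larger χ gives a stronger unlike-bead repulsion, which drives phase separation between the corresponding beads; χ = 0 recovers the like-bead value.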
Prediction of Flash Points for Fuel Mixtures Using Machine Learning and a Novel Equation
In this work, a set of computationally efficient, yet accurate,
methods to predict flash points of fuel mixtures based solely on their
chemical structures and mole fractions was developed. Two approaches
were tested using data obtained from the existing literature: (1)
machine learning directly applied to mixture flash point data (the
mixture QSPR approach) using additive descriptors and (2) machine
learning applied to pure compound properties (the QSPR approach) in
combination with Le Chatelier rule based calculations. It was found
that the second method performs better than the first with the available
databank and for the target application. We proposed a novel equation,
and we evaluated the performance of the resulting, fully predictive,
Le Chatelier rule based approach on new experimental data of surrogate
jet and diesel fuels, yielding excellent results. We predicted the
variation in flash point of diesel–gasoline blends with increasing
proportions of gasoline.
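The novel equation mentioned in the abstract is not reproduced here. For orientation, the sketch below shows only the classical Le Chatelier rule for an ideal liquid mixture, in which the mixture flash point T satisfies Σ x_i · P_i(T) / P_i(T_fp,i) = 1. The vapor-pressure ratio is approximated with a Clausius–Clapeyron expression, and all component values (flash points, enthalpies of vaporization, mole fractions) are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def le_chatelier_residual(T, x, t_fp, dh_vap):
    """Sum of x_i * P_i(T) / P_i(T_fp_i) minus 1 for an ideal mixture.

    The pressure ratio uses a Clausius-Clapeyron approximation with a
    constant enthalpy of vaporization per component (an assumption).
    """
    total = sum(
        xi * math.exp(-dh / R * (1.0 / T - 1.0 / tf))
        for xi, tf, dh in zip(x, t_fp, dh_vap)
    )
    return total - 1.0

def mixture_flash_point(x, t_fp, dh_vap, tol=1e-6):
    """Solve the Le Chatelier rule for T (in K) by bisection.

    The residual increases with T and changes sign between the lowest
    and highest pure-component flash points when the x_i sum to 1.
    """
    lo, hi = min(t_fp), max(t_fp)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if le_chatelier_residual(mid, x, t_fp, dh_vap) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical binary blend: mole fractions, pure flash points (K),
# and enthalpies of vaporization (J/mol).
x = [0.5, 0.5]
t_fp = [300.0, 350.0]
dh_vap = [40000.0, 45000.0]
print(mixture_flash_point(x, t_fp, dh_vap))
```

In the second approach described above, the pure-compound inputs would come from the QSPR models rather than from hand-picked numbers as in this sketch.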
Prediction of Density and Viscosity of Biofuel Compounds Using Machine Learning Methods
In the present work, temperature dependent models for the prediction
of densities and dynamic viscosities of pure compounds within the
range of possible alternative fuel mixture components are presented.
The proposed models have been derived using machine learning methods
including Artificial Neural Networks and Support Vector Machines.
Experimental data used to train and validate the models were extracted
from the DIPPR database. A comparison between models using an ample
range of molecular descriptors and models using only functional group
count descriptors as inputs was performed, and consensus models were
created by testing different combinations of the individual models.
The resulting consensus models’ predictions were in agreement
with the available experimental data. Comparisons were also made between
predictions of our models and correlations validated by the DIPPR
staff. Our models were used to predict densities and dynamic viscosities
of compounds for which no experimental data exists. Our models were
also used to estimate other properties such as kinematic viscosities,
critical temperatures, and critical pressures for compounds in the
database. Finally, predictions were used to study the main trends
of density and viscosity at the aforementioned temperatures as a function
of the number of carbon atoms for chemical families of interest.
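Kinematic viscosity follows from the two predicted properties as the ratio of dynamic viscosity to density. The sketch below shows that conversion together with a simple averaging consensus. The abstract does not say how the individual models were combined, so the plain mean is an assumption, and the prediction values are hypothetical.

```python
def consensus_prediction(predictions):
    """Combine individual model predictions by a plain average.

    The averaging rule is an assumption made for this sketch.
    """
    return sum(predictions) / len(predictions)

def kinematic_viscosity_mm2_s(dynamic_viscosity_mpa_s, density_kg_m3):
    """Kinematic viscosity in mm^2/s from mPa*s and kg/m^3.

    nu = mu / rho; the factor 1e3 converts (mPa*s)/(kg/m^3) to mm^2/s.
    """
    return 1.0e3 * dynamic_viscosity_mpa_s / density_kg_m3

# Hypothetical predictions from two individual models at one temperature.
density = consensus_prediction([801.0, 799.0])   # kg/m^3
viscosity = consensus_prediction([1.9, 2.1])     # mPa*s
print(kinematic_viscosity_mm2_s(viscosity, density))
```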