13 research outputs found

    Managing Expectations and Imbalanced Training Data in Reactive Force Field Development: An Application to Water Adsorption on Alumina

    ReaxFF is a computationally efficient model for reactive molecular dynamics simulations that has been applied to a wide variety of chemical systems. When ReaxFF parameters are not yet available for a chemistry of interest, they must be (re)optimized, for which one defines a set of training data that the new ReaxFF parameters should reproduce. ReaxFF training sets typically contain diverse properties with different units, some of which are more abundant (by orders of magnitude) than others. To find the best parameters, one conventionally minimizes a weighted sum of squared errors over all of the data in the training set. One of the challenges in such numerical optimizations is to assign weights so that the optimized parameters represent a good compromise among all the requirements defined in the training set. This work introduces a new loss function, called Balanced Loss, and a workflow that replaces weight assignment with a more manageable procedure. The training data are divided into categories with corresponding “tolerances”, i.e., acceptable root-mean-square errors for the categories, which define the expectations for the optimized ReaxFF parameters. Through the Log-Sum-Exp form of Balanced Loss, the parameter optimization is also a validation of one’s expectations, providing meaningful feedback that can be used to reconfigure the tolerances if needed. The new methodology is demonstrated with a nontrivial parametrization of ReaxFF for water adsorption on alumina. This results in a new force field that reproduces both the rare and frequent properties of a validation set not used for training. We also demonstrate the robustness of the new force field with a molecular dynamics simulation of water desorption from a γ-Al2O3 slab model

    Potential Energy Surface-Based Descriptors for Nanoporous Materials and its Applications to Classification and CO2 Gas Adsorption into Zeolites

    The generalization of high-throughput synthesis has recently allowed the discovery of thousands of new porous materials, generating a large amount of information, with the development of specialized databases. Widespread access to databases enabled an increase in algorithms and models for property prediction and in silico design of materials. The structural information on materials still needs to be rationalized by the inclusion of descriptors to ease the characterization of solids. This is essential for in silico screening for potential applications based on machine learning (ML) approaches. Indeed, at the forefront of a real revolution in the selection and design of porous materials for many industrial applications, the use of appropriate descriptors to encode solid material properties (topology, porosity, and surface chemistry) is one of the fundamental aspects of the development of ML-based models. Our analysis of the literature reveals a lack of descriptors based on the potential energy surface (PES) of crystalline materials embedding crucial information such as the porosity, the topology, and the surface chemistry. In this work, we introduce new PES-based descriptors including the surface probability distribution of the local mean curvature (KH), the electrostatic-PES distribution (σe), as well as the local electrostatic-potential gradient surface probability distribution (∇σe). Our descriptors allow the classification of zeolites as well as their characterization by self-containing standard morphological and topological information (pore diameter, tortuosity, surface chemistry, etc.). We illustrate their usage to generate accurate ML-based models of the isosteric heat of adsorption of CO2 on purely siliceous zeolites of the IZA database and ion-exchanged zeolites as a function of the Si/Al ratio for the case of LTA topology

    GC-PPC-SAFT Equation of State for VLE and LLE of Hydrocarbons and Oxygenated Compounds. Sensitivity Analysis

    Group-contribution polar versions of SAFT equations of state are very useful for predictive calculations of mixtures containing diverse polar molecules. In this work, we have evaluated the predictive performance of one such model, the so-called polar perturbed-chain (PPC) SAFT model for phase-equilibrium properties of 290 hydrocarbons and monofunctional oxygenated compounds. Emphasis has been placed on carrying out an extensive evaluation considering diverse types of phase behavior (vapor–liquid and liquid–liquid equilibria) and properties/conditions (Henry’s law constant for H2, N2, and CH4; infinite-dilution activity coefficient in water; solubility in water; infinite-dilution n-octanol/water partition coefficient). In general, considering the predictive nature of the calculations, encouraging results were obtained. For pure-component vapor pressures and liquid molar volumes, the deviations are very small, at 20% and 3%, respectively. The deviations in the prediction of the Henry’s law constants are within a factor of 2, with the best results found for the methane and nitrogen solubilities. For solubilities in water and, consequently, for infinite-dilution n-octanol/water partition coefficients, deviations are within a factor of 2 for hydrocarbons and within a factor of 4 for alcohols and aldehydes, but they are large for the other oxygenated families. To identify paths for improvement, a sensitivity analysis was performed, indicating that all of the parameters make large contributions to almost all properties. In addition, the sensitivity of the infinite-dilution activity coefficient in water to the molecular size parameters was extremely high. This suggests that a small change in these parameters might improve the results significantly

    Simulations of Interfacial Tension of Liquid–Liquid Ternary Mixtures Using Optimized Parametrization for Coarse-Grained Models

    In this work, liquid–liquid systems are studied by means of coarse-grained Monte Carlo simulations (CG-MC) and Dissipative Particle Dynamics (DPD). A methodology is proposed to reproduce liquid–liquid equilibrium (LLE) and to provide variation of interfacial tension (IFT), as a function of the solute concentration. A key step is the parametrization method based on the use of the Flory–Huggins parameter between DPD beads to calculate solute/solvent interactions. Parameters are determined using a set of experimental compositional data of LLE, following four different approaches. These approaches are evaluated, and the results obtained are compared to analyze advantages/disadvantages of each one. These methodologies have been compared through their application on six systems: water/benzene/1,4-dioxane, water/chloroform/acetone, water/benzene/acetic acid, water/benzene/2-propanol, water/hexane/acetone, and water/hexane/2-propanol. CG-MC simulations in the Gibbs (NVT) ensemble have been used to check the validity of parametrization approaches for LLE reproduction. Then, CG-MC simulations in the osmotic (μ_solute N_solvent P_zz T) ensemble were carried out considering the two liquid phases with an explicit interface. This step allows one to work at the same bulk concentrations as the experimental data by imposing the precise bulk phase compositions and predicting the interface composition. Finally, DPD simulations were used to predict IFT values for different solute concentrations. Our results on variation of IFT with solute concentration in bulk phases are in good agreement with experimental data, but some deviations can be observed for systems containing hexane molecules

    Prediction of Flash Points for Fuel Mixtures Using Machine Learning and a Novel Equation

    In this work, a set of computationally efficient, yet accurate, methods to predict flash points of fuel mixtures based solely on their chemical structures and mole fractions was developed. Two approaches were tested using data obtained from the existing literature: (1) machine learning directly applied to mixture flash point data (the mixture QSPR approach) using additive descriptors and (2) machine learning applied to pure compound properties (the QSPR approach) in combination with Le Chatelier rule based calculations. It was found that the second method performs better than the first with the available databank and for the target application. We proposed a novel equation, and we evaluated the performance of the resulting, fully predictive, Le Chatelier rule based approach on new experimental data of surrogate jet and diesel fuels, yielding excellent results. We predicted the variation in flash point of diesel–gasoline blends with increasing proportions of gasoline

    Prediction of Density and Viscosity of Biofuel Compounds Using Machine Learning Methods

    In the present work, temperature dependent models for the prediction of densities and dynamic viscosities of pure compounds within the range of possible alternative fuel mixture components are presented. The proposed models have been derived using machine learning methods including Artificial Neural Networks and Support Vector Machines. Experimental data used to train and validate the models was extracted from the DIPPR database. A comparison between models using an ample range of molecular descriptors and models using only functional group count descriptors as inputs was performed, and consensus models were created by testing different combinations of the individual models. The resulting consensus models’ predictions were in agreement with the available experimental data. Comparisons were also made between predictions of our models and correlations validated by the DIPPR staff. Our models were used to predict densities and dynamic viscosities of compounds for which no experimental data exists. Our models were also used to estimate other properties such as kinematic viscosities, critical temperatures, and critical pressures for compounds in the database. Finally, predictions were used to study the main trends of density and viscosity at the aforementioned temperatures as a function of the number of carbon atoms for chemical families of interest