Equivalent Alkane Carbon Number of Live Crude Oil: A Predictive Model Based on Thermodynamics
We took advantage of recently published works and new experimental data to propose a model for the prediction of the Equivalent Alkane Carbon Number of live crude oil (EACNlo) for EOR processes. The model requires a priori knowledge of the reservoir pressure and temperature conditions as well as the initial gas-to-oil ratio. The required volumetric properties of the hydrocarbons were predicted using an equation of state. The model has been validated both on our own experimental data and on data from the literature. These case studies cover broad ranges of API gravity, gas-to-oil ratio, reservoir pressure and temperature, and composition of the representative gas. The predicted EACNlo values agree reasonably well with experimental EACN values, i.e. values determined by comparison with salinity scans for a series of n-alkanes from nC8 to nC18. The model has been used to generate high pressure high temperature data, showing the competing effects of gas-to-oil ratio, pressure, and temperature. The proposed model strongly narrows down the spectrum of possible EACNlo values, and thus allows a more rational use of equipment.
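As a rough illustration of the mixing idea, the sketch below implements a linear molar-average EACN rule, a common approximation in the surfactant-EOR literature. It is a hypothetical helper, not the article's full thermodynamic model, which additionally relies on an equation of state for volumetric properties.

def eacn_live_oil(x_gas, eacn_gas, eacn_dead_oil):
    # Mole-fraction-weighted EACN of a live oil:
    #   x_gas         mole fraction of dissolved gas (derived from the GOR)
    #   eacn_gas      EACN attributed to the representative gas
    #   eacn_dead_oil EACN of the dead (gas-free) crude oil
    return x_gas * eacn_gas + (1.0 - x_gas) * eacn_dead_oil

# Example: a dead oil of EACN 10 with 30 mol% dissolved methane (EACN taken as 1)
print(eacn_live_oil(0.30, 1.0, 10.0))  # -> 7.3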
Equivalent alkane carbon number of crude oils: A predictive model based on machine learning
In this work, we present the development of models for the prediction of the Equivalent Alkane Carbon Number of a dead oil (EACNdo) usable in the context of Enhanced Oil Recovery (EOR) processes. Models were constructed by means of data mining tools. To that end, we collected 29 crude oil samples originating from around the world. Each of these crude oils was experimentally analysed, and we measured properties such as EACNdo, American Petroleum Institute (API) gravity, and saturate, aromatic, resin, and asphaltene fractions. All this information was compiled into a database. Evolutionary Algorithms (EA) were applied to the database to derive models able to predict the Equivalent Alkane Carbon Number (EACN) of a crude oil. The developed correlations returned EACNdo values in agreement with reference experimental data. These models were then used to feed a thermodynamics-based model that estimates the EACN of a live oil. Applying this strategy to case studies demonstrated that combining the two models is a relevant tool for fast and accurate estimates of live crude oil EACNs.
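To make the evolutionary step concrete, here is a minimal sketch of an evolution-strategy fit of correlation coefficients. The functional form (EACNdo as a linear function of API gravity and asphaltene fraction) is a placeholder for illustration, not one of the correlations derived in the article.

import random

def fitness(coeffs, samples):
    # Sum of squared errors of the candidate correlation over the database;
    # samples holds (api, asphaltene_fraction, eacn_do) tuples.
    a, b, c = coeffs
    return sum((a * api + b * asph + c - eacn) ** 2
               for api, asph, eacn in samples)

def evolve(samples, generations=200, pop_size=50, sigma=0.5):
    pop = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, samples))
        parents = pop[:pop_size // 4]                     # keep the fittest quarter
        pop = [[g + random.gauss(0.0, sigma) for g in random.choice(parents)]
               for _ in range(pop_size)]                  # mutate random parents
        pop[:len(parents)] = parents                      # elitism
    return min(pop, key=lambda ind: fitness(ind, samples))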
Prediction of thermodynamic properties of adsorbed gases in zeolitic imidazolate frameworks
In this work, we propose an original methodology to predict the isosteric heat of adsorption of polar and non-polar gases in different Zeolitic Imidazolate Framework (ZIF) materials, combining molecular simulation results with a quantitative structure-property relationship (QSPR) approach. The main contribution of our study is the development of a series of structural and molecular descriptors that are useful to describe the adsorption capability of adsorbents. A linear relationship is established to correlate the characteristics of gases and ZIF structures with the isosteric heat of adsorption. Based on the analysis of our simulation results, we also propose a simple tool to estimate the hydrophilic/hydrophobic nature of the solids studied. This promising approach should be useful for the selection of organic linkers in the development of new hybrid organic-inorganic materials.
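The linear-correlation step can be sketched with ordinary least squares; the descriptor matrix below is synthetic and stands in for the structural and molecular descriptors developed in the article.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: one row per (gas, ZIF) pair, one column per descriptor.
X = rng.normal(size=(20, 3))
q_st = X @ np.array([2.0, -1.0, 0.5]) + 15.0   # isosteric heats (synthetic, kJ/mol)

# Fit q_st ~ X.w + b by ordinary least squares (constant column for the intercept).
A = np.hstack([X, np.ones((X.shape[0], 1))])
coeffs, *_ = np.linalg.lstsq(A, q_st, rcond=None)
print(coeffs)                                  # last entry is the intercept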
Probabilistic Mean Quantitative Structure–Property Relationship Modeling of Jet Fuel Properties
We present a novel probabilistic mean quantitative structure–property relationship (M-QSPR) method for the prediction of jet fuel properties from two-dimensional gas chromatography measurements. Fuels are represented as one mean pseudo-structure that is inferred by a weighted average over the structures of 1866 molecules that could be present in the individual fuel. The method allows models to be trained on data of both pure components and fuels and does not require mixing rules for the calculation of the bulk property. This drastically increases the number of available training data and allows the direct learning of the mixing behavior. For the modeling, we use a Monte-Carlo dropout neural network, a probabilistic machine learning algorithm that estimates prediction uncertainties due to possible unidentified isomers and dissimilarity of training and test data. Models are developed to predict the freezing point, flash point, net heat of combustion, and temperature-dependent properties such as density, viscosity, and surface tension. We investigate the effect of the presence of fuels in the training data on the predictions for up to 82 conventional fuels and 50 synthetic fuels. The predictions are compared on three metrics that quantify accuracy, precision, and reliability, allowing a comprehensive estimation of the predictive capability of the models. For the prediction of density, surface tension, and net heat of combustion, the M-QSPR method yields highly accurate results even without fuels in the training data. For properties with nonlinear behavior over temperature and complex fuel component interactions, like viscosity and freezing point, the presence of fuels in the training data was found to be essential.
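Monte-Carlo dropout itself is a standard technique and can be sketched as below; the network architecture and the descriptor inputs here are hypothetical placeholders, not the article's model of mean pseudo-structures.

import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    def __init__(self, n_features, n_hidden=64, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def predict_with_uncertainty(model, x, n_samples=100):
    model.train()  # keep dropout active at prediction time
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    # Predictive mean and spread over the stochastic forward passes
    return draws.mean(dim=0), draws.std(dim=0)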
ReaxFF Alumina Parametrization Data Set
This dataset contains all the data needed to reproduce the Alumina Parametrization of ReaxFF; see the bibliographic reference in the metadata.
All AMS calculations are performed using AMS2023.101. Whenever possible, Python scripts are written so that they do not require AMS, making a large portion of the scripts reproducible with open-source software.
The instructions below assume that the archive is unpacked on a Linux system, as follows:
unzip dataset-parametrization.zip
This will preserve file permissions upon extraction. Do not unpack the archive on Windows and then transfer the files to a Linux system, as this removes the file permission flags.
Data descriptor
*** How the data were generated ***
The following steps summarize how to reproduce the results in this dataset.
It is assumed that you have a Linux system with AMS 2023.101 installed. Copy the file amsenv.sh.example to amsenv.sh and change the variables in this copy to match the location of your AMS installation.
For all non-AMS scripts, a Micromamba environment is used, which can be created with ./setup-env-micromamba.sh. After installation, you either manually activate the environment with
source env/bin/activate
or use direnv.
The VASP calculations for the training and validation sets are converted into files for ParAMS by running the following scripts:
(cd training-set/conversion; ./job.sh)
(cd validation-set/conversion; ./job.sh)
The job scripts can also be submitted with sbatch on a cluster. (You may need to modify them to work on your system.)
This will produce several files for each set, of which the following are relevant:
chemformula.json: names and chemical formulas of all structures
counts.json: counts of data set items in each category, per structure
energies.json: electronic energies of all structures and chemical equations
ics_phase1.json: internal coordinates in phase 1, see article for details
ics_phase2.json: internal coordinates in phase 2, see article for details
job_collection_{name}.yaml: job collections for ParAMS
{name}_set.yaml: data set entries for ParAMS
These output files are already included in this archive. Only the .yaml files are used by ParAMS. The JSON files are used by some of the scripts in this archive, and were also used to generate tables and figures in the article.
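Since the JSON schemas are project-specific (see "File content details" below), a quick way to get oriented is to print the top-level structure of a file, for example (the path is assumed relative to the directory where the conversion scripts ran):

import json

with open("counts.json") as fh:
    data = json.load(fh)

if isinstance(data, dict):
    for key, value in data.items():
        print(key, "->", type(value).__name__)
else:
    print(type(data).__name__, "with", len(data), "entries")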
The parameter selection can be reproduced as follows:
(cd parameter-selection; ./parameter_selection_ams.py)
This will produce a parameter_interface.yaml file that can be used as input to ParAMS. It contains the selected parameters, their bounds, and the historical values taken from Joshi et al. The parameter_interface.yaml file is already included in the archive.
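The file is plain YAML, so it can be inspected without ParAMS; the detailed layout of the entries is defined by ParAMS and is not assumed here.

import yaml  # PyYAML

with open("parameter-selection/parameter_interface.yaml") as fh:
    interface = yaml.safe_load(fh)

print(type(interface).__name__)
if isinstance(interface, (list, dict)):
    print(len(interface), "top-level entries")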
At this stage, all inputs for the parametrization are available. The actual parametrization workflow is implemented in the opt-*-p28 directories. Note that the inputs to ParAMS and some configuration files for the workflow are stored in opt-*-p28/results/given. To repeat the parametrization workflow, remove the existing outputs, i.e. all directories under opt-*-p28/results/ except given (a cleanup sketch is given below); steps whose output directories still exist will not be rerun. After removing the existing outputs, enter one of the opt-*-p28 directories and run
../opt/workflow.py
This will coordinate the submission of various jobs to Slurm, including (a schematic sketch follows the list):
40 CMA optimizations,
the recomputation of the loss for a range of geometryoptimization.MaxIterations values, for all 40 optimized force fields, and
the evaluation of the data sets, using the best result from the 40 CMA runs.
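The sketch below shows schematically how such a chain of Slurm jobs can be coordinated; the real driver is opt/workflow.py, and the job-script names here are placeholders.

import subprocess

def submit(script, *deps):
    # Submit a job script with sbatch and return its job id; optional
    # dependencies make the job wait for earlier jobs to finish successfully.
    cmd = ["sbatch", "--parsable"]
    if deps:
        cmd.append("--dependency=afterok:" + ":".join(deps))
    cmd.append(script)
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout.strip()

cma_jobs = [submit("cma_opt.job") for _ in range(40)]   # 40 CMA optimizations
loss_job = submit("recompute_loss.job", *cma_jobs)      # loss vs. MaxIterations
submit("evaluate_datasets.job", loss_job)               # final data set evaluation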
Again, you may need to modify the job scripts in opt/templates/*/ to make them work on your system.
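For completeness, the cleanup step mentioned above (removing all directories under results/ except given) can be scripted as follows; run it inside one of the opt-*-p28 directories.

import shutil
from pathlib import Path

# Remove previously generated outputs so the workflow reruns every step,
# while keeping the provided inputs under results/given.
for path in Path("results").iterdir():
    if path.is_dir() and path.name != "given":
        shutil.rmtree(path)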
*** Software that was used ***
AMS 2023.101.
Python 3.11 and all packages listed in environment.yaml. (These are installed with the command ./setup-env-micromamba.sh.)
The custom ParAMS extractors defined in ./extractors/. These extractors are a workaround for efficiency issues in ParAMS: instead of listing each angle or distance as a separate dataset item, they group such quantities into arrays, which speeds up the training and increases parallel efficiency (see the sketch at the end of this section). At the time of writing, AMS 2023.101 still has a bug that requires manually editing singleton arrays that lack square brackets in a dataset.yaml file.
The Python scripts under scripts/ are used to generate the training and validation sets.
The Balanced Loss function is implemented in a module site-packages/balanced_loss_ams.py.
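The grouping idea behind the custom extractors can be illustrated as follows: many scalar items (one per distance or angle) are replaced by a single array-valued item without changing the sum of squared errors. The numbers below are synthetic, and the real extractors follow the ParAMS extractor API, which is not reproduced here.

import numpy as np

ref = np.array([1.61, 1.75, 1.93])    # reference distances (synthetic values)
pred = np.array([1.58, 1.80, 1.90])   # force-field predictions (synthetic)

per_item = sum((p - r) ** 2 for p, r in zip(pred, ref))  # many scalar items
grouped = float(np.sum((pred - ref) ** 2))               # one array-valued item
assert abs(per_item - grouped) < 1e-12                   # identical loss contribution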
*** Directory and file organization ***
Most directories have already been defined in the previous two sections. This section only discusses some points not mentioned above.
The parametrization workflow consists of three different types of jobs, whose implementation can be found in opt/templates.
Running ./setup-env-micromamba.sh installs the Python environment in a subdirectory env.
The VASP outputs can be found in training-set/vasp-calculations and validation-set/vasp-calculations. Note that POTCAR files are not included due to restrictions imposed by the VASP license.
Python scripts ending with _ams.py should be executed (or imported) with amspython. This distinction is necessary because AMS 2023.101 ships Python 3.8, while the rest of the Python scripts may benefit from newer features in Python 3.11. The filename convention lets us apply pyupgrade selectively for the two Python versions.
The MANIFEST.sha256 file can be used to check the archive for corrupted files, e.g. due to bit rot. The following command will verify all files after unpacking the archive:
cut -c 17- MANIFEST.sha256 | sha256sum -c
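An equivalent check in Python, assuming each manifest line carries a 16-character prefix (stripped by cut -c 17-) followed by standard sha256sum output ("<hash>  <path>"):

import hashlib
from pathlib import Path

for line in Path("MANIFEST.sha256").read_text().splitlines():
    expected, path = line[16:].split(maxsplit=1)   # drop the 16-char prefix
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    print(path, "OK" if digest == expected else "FAILED")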
*** File content details ***
All .json files in this archive contain custom data structures specific to this project. To understand their contents, please refer to the source code of the scripts that generate and use these files.
All other file formats are defined in the context of external software packages (VASP, AMS, …) and will not be explained here.
Managing expectations and imbalanced training data in reactive force field development: an application to water adsorption on alumina
ReaxFF is a computationally efficient model for reactive molecular dynamics simulations that has been applied to a wide variety of chemical systems. When ReaxFF parameters are not yet available for a chemistry of interest, they must be (re)optimized, for which one defines a set of training data that the new ReaxFF parameters should reproduce. ReaxFF training sets typically contain diverse properties with different units, some of which are more abundant (by orders of magnitude) than others. To find the best parameters, one conventionally minimizes a weighted sum of squared errors over all of the data in the training set. One of the challenges in such numerical optimizations is to assign weights so that the optimized parameters represent a good compromise among all the requirements defined in the training set. This work introduces a new loss function, called Balanced Loss, and a workflow that replaces weight assignment with a more manageable procedure. The training data are divided into categories with corresponding "tolerances", i.e., acceptable root-mean-square errors for the categories, which define the expectations for the optimized ReaxFF parameters. Through the Log-Sum-Exp form of Balanced Loss, the parameter optimization is also a validation of one's expectations, providing meaningful feedback that can be used to reconfigure the tolerances if needed. The new methodology is demonstrated with a nontrivial parametrization of ReaxFF for water adsorption on alumina. This results in a new force field that reproduces both the rare and frequent properties of a validation set not used for training. We also demonstrate the robustness of the new force field with a molecular dynamics simulation of water desorption from a gamma-Al2O3 slab model.
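A schematic sketch of the Log-Sum-Exp idea, assuming the loss combines tolerance-scaled per-category errors into a soft maximum; consult the article (and site-packages/balanced_loss_ams.py in the dataset) for the exact functional form of Balanced Loss.

import numpy as np

def balanced_loss_sketch(rmse, tol):
    # rmse, tol: per-category root-mean-square errors and their tolerances.
    ratios = np.asarray(rmse) / np.asarray(tol)
    # Log-Sum-Exp acts as a smooth maximum: the worst category dominates the
    # loss, so the optimizer works on whichever expectation is violated most.
    return float(np.log(np.sum(np.exp(ratios))))

print(balanced_loss_sketch([0.02, 0.5], [0.01, 1.0]))   # first category dominates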