Critical role of quantum dynamical effects in the Raman spectroscopy of liquid water
Understanding Raman spectroscopy at the atomistic level is important for
elucidating dynamical processes in liquid water. Because the
polarizability (or its time derivative) is often a highly nonlinear function of
coordinates and/or momenta, we employ the linearized semiclassical initial
value representation for quantum dynamical simulations of liquid water (and
heavy water) under ambient conditions, based on an ab initio-based, flexible,
polarizable model (the POLI2VS force field). It is shown that quantum dynamical
effects play a critical role in reproducing the peaks in the intermediate
region between the librational and bending bands, those between the bending and
stretching bands, and the double peak in the stretching band of the
experimental isotropic Raman spectrum. In contrast, quantum dynamical effects
are important but less decisive in the anisotropic Raman spectrum. By
selectively freezing either the intramolecular O-H stretching or the H-O-H bending
mode, we demonstrate that the peak in the intermediate region (2000-2400 cm⁻¹)
of the isotropic Raman spectrum arises from the interplay of the stretching and
bending motions, while a substantial part of the peak in the same intermediate
region of the anisotropic Raman spectrum may be attributed to the combined
motion of the bending and librational modes.
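For orientation, the sketch below shows, in the classical limit and with heavy simplification, how an isotropic Raman line shape can be obtained from a time series of the mean polarizability along a trajectory. The function name, input shapes, and the Hann window are illustrative choices; this is not the LSC-IVR machinery or the POLI2VS model used in the paper, and a real calculation would also need a quantum correction factor and the anisotropic counterpart built from the traceless polarizability tensor.

```python
import numpy as np

def isotropic_raman_spectrum(alpha_iso, dt):
    """Illustrative sketch: isotropic Raman line shape as the Fourier transform
    of the autocorrelation of the mean polarizability.

    alpha_iso : (n_steps,) time series of (1/3) Tr[alpha(t)] (hypothetical input)
    dt        : timestep in ps
    Returns (wavenumbers in cm^-1, intensity in arbitrary units).
    """
    a = alpha_iso - alpha_iso.mean()                 # remove the static contribution
    n = len(a)
    # autocorrelation via FFT (Wiener-Khinchin), keeping positive lags only
    f = np.fft.rfft(a, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n] / np.arange(n, 0, -1)
    window = np.hanning(2 * n)[n:]                   # damp the tail before transforming
    spectrum = np.abs(np.fft.rfft(acf * window))
    freq_cm = np.fft.rfftfreq(n, d=dt) / 2.99792458e-2   # ps^-1 -> cm^-1 (c in cm/ps)
    return freq_cm, spectrum
```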
Pretraining of attention-based deep learning potential model for molecular simulation
Machine learning-assisted modeling of the inter-atomic potential energy surface (PES) is revolutionizing the field of molecular simulation. With the accumulation of high-quality electronic structure data, a model that can be pretrained on all available data and finetuned on downstream tasks with a small additional effort would bring the field to a new stage. Here we propose DPA-1, a Deep Potential model with a gated attention mechanism, which is highly effective for representing the conformation and chemical spaces of atomic systems and learning the PES. We tested DPA-1 on a number of systems and observed superior performance compared with existing benchmarks. When pretrained on large-scale datasets containing 56 elements, DPA-1 can be successfully applied to various downstream tasks with a great improvement of sample efficiency. Surprisingly, for different elements, the learned type embedding parameters form a spiral in the latent space and have a natural correspondence with their positions on the periodic table, showing interesting interpretability of the pretrained DPA-1 model.
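As a rough illustration of the gating idea described above, the following PyTorch snippet applies a self-attention layer with a sigmoid gate to per-neighbor embeddings. The layer sizes, the placement of the gate, and the input shapes are assumptions made for illustration and do not reproduce the actual DPA-1 attention block.

```python
import torch
import torch.nn as nn

class GatedNeighborAttention(nn.Module):
    """Conceptual sketch of gated self-attention over the embedded features
    of an atom's neighbors (dimensions are illustrative, not DPA-1's)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.q = nn.Linear(feat_dim, feat_dim)
        self.k = nn.Linear(feat_dim, feat_dim)
        self.v = nn.Linear(feat_dim, feat_dim)
        self.gate = nn.Linear(feat_dim, feat_dim)   # per-feature gate on the attended output
        self.scale = feat_dim ** -0.5

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        # g: (n_atoms, n_neighbors, feat_dim) neighbor embeddings
        q, k, v = self.q(g), self.k(g), self.v(g)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = attn @ v
        return torch.sigmoid(self.gate(g)) * out    # gating modulates the attended features

# toy usage: 10 atoms, 20 neighbors each, 64-dimensional embeddings
layer = GatedNeighborAttention(64)
updated = layer(torch.randn(10, 20, 64))
```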
DPA-2: Towards a universal large atomic model for molecular and material simulation
Data:
- The complete collection of datasets used in this work is contained in the archive data-v1.3.tgz. It includes both the upstream datasets for pre-training and the downstream datasets for fine-tuning, all in DeePMD format (https://github.com/deepmodeling/deepmd-kit/blob/master/doc/data/system.md). We recommend creating a new directory and running 'tar -xzvf data-v1.3.tgz' there to extract the data files.
- Inside each dataset subdirectory (e.g., Domains, Metals, H2O, and Others) you will find (a minimal loading sketch is given after this list):
  - A README file
  - A 'train' directory (present if the dataset is used in upstream pre-training)
    - train.json -- a list of file paths for training systems
    - test.json -- a list of file paths for testing systems
  - A 'downstream' directory (present if the dataset is used in downstream fine-tuning)
    - train.json -- a list of file paths for training systems
    - test.json -- a list of file paths for testing systems
  - The main data files comprising the various structures
  - Additional processing scripts
- The root directory contains train.json and downstream.json files that merge the respective upstream and downstream splits listed above.
- The datasets used in this study are described in detail in Section S1 of the Supplementary Materials and are also available on AIS Square (https://www.aissquare.com/).
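As one way to inspect the extracted data, the snippet below reads one of the split files and loads the referenced systems with dpdata, the DeepModeling data toolkit. The example path and the assumption that the JSON file is a plain list of DeePMD-format system directories are illustrative and should be checked against the actual files in the archive.

```python
import json
import dpdata  # DeepModeling data toolkit; assumed installed (pip install dpdata)

# Hypothetical path inside the extracted archive; adjust to the dataset of interest.
with open("H2O/train/train.json") as f:
    system_paths = json.load(f)   # assumed: a plain list of system directories

for path in system_paths[:3]:     # peek at the first few systems
    sys = dpdata.LabeledSystem(path, fmt="deepmd/npy")
    print(path, "->", sys.get_nframes(), "frames,", sys.get_natoms(), "atoms")
```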
Code:
- The 'code' directory, extracted from the archive Code_model_script.tgz, contains the DeePMD-kit source code based on PyTorch 2.0. Installation and usage instructions can be found in the README file inside deepmd-pytorch-devel.zip.
- UPDATE: deepmd-pytorch-devel-0110.zip supports unsupervised learning through denoising; see its README for more details (a conceptual sketch of the idea follows this list).
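As a rough sketch of what coordinate-denoising pretraining looks like, the snippet below perturbs atomic positions with Gaussian noise and trains a model to recover the injected displacement. The model interface, noise scale, and loss are assumptions made for illustration and are not the DeePMD-kit implementation; consult the README mentioned above for the real usage.

```python
import torch
import torch.nn as nn

def denoising_step(model: nn.Module, coords: torch.Tensor,
                   optimizer: torch.optim.Optimizer, sigma: float = 0.1) -> float:
    """One coordinate-denoising step: perturb a structure with Gaussian noise
    and train the model to predict the noise it was given (conceptual only)."""
    noise = sigma * torch.randn_like(coords)   # random displacement of every atom
    pred = model(coords + noise)               # the model only sees the noisy geometry
    loss = ((pred - noise) ** 2).mean()        # learn to recover the injected displacement
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage: a per-atom MLP standing in for a real denoising head
net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = denoising_step(net, torch.randn(32, 3), opt)
```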
Model:
- The 'model' directory, also found in the extracted Code_model_script.tgz, contains the multi-task pre-trained DPA-2 model used in this work. It is accompanied by its configuration file, input.json, which specifies the simultaneous pre-training of this model across 18 upstream datasets with shared descriptor parameters for 1 million steps. A conceptual sketch of the shared-descriptor multi-task setup follows.
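The PyTorch sketch below illustrates the shared-descriptor multi-task idea: a single descriptor network is updated through every dataset's own fitting head, so each training step on any dataset refines the common representation. The toy descriptor, head sizes, and dummy loss are placeholders and do not reflect the DPA-2 architecture or the settings in input.json.

```python
import torch
import torch.nn as nn

class MultiTaskPotential(nn.Module):
    """Toy multi-task model: one shared descriptor network feeds a separate
    fitting head per upstream dataset (placeholders, not the DPA-2 blocks)."""

    def __init__(self, n_tasks: int, desc_dim: int = 128):
        super().__init__()
        self.descriptor = nn.Sequential(            # shared across all tasks
            nn.Linear(3, desc_dim), nn.SiLU(), nn.Linear(desc_dim, desc_dim))
        self.heads = nn.ModuleList(                 # one energy head per dataset
            [nn.Linear(desc_dim, 1) for _ in range(n_tasks)])

    def forward(self, coords: torch.Tensor, task: int) -> torch.Tensor:
        # coords: (n_atoms, 3); total energy as a sum of per-atom contributions
        return self.heads[task](self.descriptor(coords)).sum()

model = MultiTaskPotential(n_tasks=18)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# each step draws a batch from one of the 18 datasets, so the shared descriptor
# accumulates gradient signal from every task over the course of training
task = torch.randint(0, 18, (1,)).item()
energy = model(torch.randn(32, 3), task)
loss = (energy - torch.tensor(0.0)) ** 2            # dummy reference energy, illustration only
opt.zero_grad(); loss.backward(); opt.step()
```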
Scripts:
- The 'scripts' directory, part of the uncompressed Code_model_script.tgz, contains all scripts used for training, fine-tuning (learning-curve analysis), and distillation in this work:
  1. Upstream_single_task_training: individual training scripts for DPA-2, Gemnet-OC, Equiformer-V2, Nequip, and Allegro on the 18 upstream datasets.
  2. Downstream_lcurve_workflow: code and input files for evaluating learning curves, including tests of DPA-2 fine-tuning transferability across 15 downstream datasets, as shown in Figure 3 of the manuscript.
  3. Distillation_workflow: input files for distilling the fine-tuned DPA-2 models on datasets such as H2O-PBE0TS-MD, SSE-PBE-D, and FerroEle-D, as shown in Figure 4 of the manuscript (a conceptual distillation sketch is given after this list).
- Note that the scripts in 'Upstream_single_task_training' require installing deepmd-pytorch and the other models from their respective repositories (Gemnet-OC and Equiformer-V2: https://github.com/Open-Catalyst-Project/ocp [commit hash: 9bc9373]; Nequip: https://github.com/mir-group/nequip [commit hash: dceaf49, tag: v0.5.6]; Allegro: https://github.com/mir-group/allegro [commit hash: 22f673c]).
- The scripts in 'Downstream_lcurve_workflow' and 'Distillation_workflow' rely on Dflow (https://github.com/deepmodeling/dflow), a Python framework for constructing scientific computing workflows, and dpgen2 (https://github.com/deepmodeling/dpgen2), the second-generation Deep Potential GENerator, both hosted in the Deep Modeling Community (https://deepmodeling.com/).