80 research outputs found

    Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

    Full text link
    We propose fine-tuning large language models for the generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of the inherent flexibility of text prompting, our models can simultaneously be used for unconditional generation of stable materials, infilling of partial structures, and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well suited to atomistic data. Comment: ICLR 2024. Code available at: https://github.com/facebookresearch/crystal-ll
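    A minimal sketch of what a text-encoded atomistic fine-tuning sample can look like; the exact prompt format is defined in the linked repository, so the `encode_structure` helper and the field layout below are illustrative assumptions rather than the authors' format.

```python
# Hypothetical sketch: serialize a crystal structure (lattice + fractional
# coordinates) into a plain-text string suitable for LLM fine-tuning.
# The field layout is an assumption for illustration, not the paper's format.

def encode_structure(lattice_abc, lattice_angles, species, frac_coords):
    """Return a text encoding of one crystal structure."""
    lines = []
    lines.append(" ".join(f"{x:.2f}" for x in lattice_abc))     # a b c in angstroms
    lines.append(" ".join(f"{x:.1f}" for x in lattice_angles))  # alpha beta gamma in degrees
    for element, (x, y, z) in zip(species, frac_coords):
        lines.append(f"{element} {x:.3f} {y:.3f} {z:.3f}")      # element + fractional coords
    return "\n".join(lines)


if __name__ == "__main__":
    # Rock-salt NaCl, heavily simplified to two sites for illustration.
    text = encode_structure(
        lattice_abc=(5.64, 5.64, 5.64),
        lattice_angles=(90.0, 90.0, 90.0),
        species=["Na", "Cl"],
        frac_coords=[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
    )
    print(text)  # this string would become one fine-tuning sample
```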

    GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets

    Full text link
    Recent years have seen the advent of molecular simulation datasets that are orders of magnitude larger and more diverse. These new datasets differ substantially in four aspects of complexity: 1. chemical diversity (number of different elements), 2. system size (number of atoms per sample), 3. dataset size (number of data samples), and 4. domain shift (similarity of the training and test set). Despite these large differences, benchmarks on small and narrow datasets remain the predominant method of demonstrating progress in graph neural networks (GNNs) for molecular simulation, likely due to cheaper training compute requirements. This raises the question: does GNN progress on small and narrow datasets translate to these more complex datasets? This work investigates this question by first developing the GemNet-OC model based on the large Open Catalyst 2020 (OC20) dataset. GemNet-OC outperforms the previous state of the art on OC20 by 16% while reducing training time by a factor of 10. We then compare the impact of 18 model components and hyperparameter choices on performance across multiple datasets. We find that the resulting model would be drastically different depending on the dataset used for making model choices. To isolate the source of this discrepancy, we study six subsets of the OC20 dataset that individually test each of the four dataset aspects mentioned above. We find that results on the OC-2M subset correlate well with the full OC20 dataset while being substantially cheaper to train on. Our findings challenge the common practice of developing GNNs solely on small datasets and highlight ways of achieving fast development cycles and generalizable results via moderately sized, representative datasets such as OC-2M and efficient models such as GemNet-OC. Our code and pretrained model weights are open-sourced.
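    One way to make the subset-versus-full-dataset claim concrete is to check whether ablation rankings on a small proxy dataset track rankings on the full dataset. The rank-correlation check and the error values below are illustrative assumptions, not the paper's analysis.

```python
# Illustrative only: does a small proxy dataset rank model variants the same
# way as the full dataset? Numbers below are made up for the example.
from scipy.stats import spearmanr

# Hypothetical validation errors (lower is better) for the same five model
# variants, measured on a small subset and on the full dataset.
errors_small_subset = [0.42, 0.38, 0.45, 0.36, 0.40]
errors_full_dataset = [0.31, 0.28, 0.34, 0.27, 0.30]

rho, pvalue = spearmanr(errors_small_subset, errors_full_dataset)
print(f"Spearman rank correlation: {rho:.2f} (p={pvalue:.3f})")
# A rank correlation near 1 suggests the subset is a usable proxy for
# model-selection decisions; a low value suggests it is not.
```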

    From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction

    Full text link
    Foundation models have been transformational in machine learning fields such as natural language processing and computer vision. Similar success in atomic property prediction has been limited due to the challenges of training effective models across multiple chemical domains. To address this, we introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that simultaneously trains on multiple datasets from different chemical domains, treating each dataset as a unique pre-training task within a multi-task framework. Our combined training dataset consists of ~120M systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and generalization by fine-tuning over a diverse set of downstream tasks and datasets including QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP demonstrates an average improvement of 59% over training from scratch, and matches or sets the state of the art on 34 out of 40 tasks. Our work highlights the potential of pre-training strategies that utilize diverse data to advance property prediction across chemical domains, especially for low-data tasks.
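    A minimal sketch of what joint multi-domain pre-training can look like: a shared backbone with one head per pre-training dataset, sampling one task per step. The tiny MLP backbone, uniform task sampling, and random placeholder batches are illustrative assumptions, not the JMP architecture or sampling scheme.

```python
# Sketch of multi-task pre-training over several chemical datasets:
# a shared trunk plus one regression head per dataset.
import random
import torch
import torch.nn as nn

DATASETS = ["OC20", "OC22", "ANI-1x", "Transition-1x"]

class SharedBackbone(nn.Module):
    def __init__(self, in_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden), nn.SiLU())
    def forward(self, x):
        return self.net(x)

backbone = SharedBackbone()
heads = nn.ModuleDict({name: nn.Linear(64, 1) for name in DATASETS})
opt = torch.optim.Adam(list(backbone.parameters()) + list(heads.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    name = random.choice(DATASETS)   # pick one pre-training task this step
    x = torch.randn(32, 16)          # placeholder features for a batch
    y = torch.randn(32, 1)           # placeholder energy targets
    pred = heads[name](backbone(x))  # shared trunk, per-dataset head
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```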

    AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials

    Full text link
    Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the adsorption energy for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low-energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate that machine learning potentials can be leveraged to identify low-energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration 87.36% of the time while achieving a 2000x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 100,000 unique configurations. Comment: 26 pages, 7 figures. Submitted to npj Computational Materials
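    A hedged sketch of the screening idea described above: rank many candidate adsorbate placements with a cheap ML potential, then verify only the k most stable with DFT. The `relax_with_ml_potential` and `dft_single_point` functions below are placeholders rather than the paper's actual calculators, and k stands in for the accuracy/efficiency trade-off knob.

```python
# Placeholder screening loop: ML potential for cheap ranking, DFT for the
# few best candidates. Energies here are random stand-ins.
import random

def relax_with_ml_potential(config):
    # Placeholder: a real run would relax the structure with a trained
    # interatomic potential and return the relaxed adsorption energy [eV].
    return random.uniform(-2.0, 0.0)

def dft_single_point(config, ml_energy):
    # Placeholder for the (much more expensive) DFT verification step.
    return ml_energy + random.uniform(-0.05, 0.05)

candidates = [f"placement_{i}" for i in range(100)]          # enumerated configurations
ml_energy = {c: relax_with_ml_potential(c) for c in candidates}
ranked = sorted(candidates, key=ml_energy.get)               # most stable first

k = 5                                                        # how many to verify with DFT
dft_energy = {c: dft_single_point(c, ml_energy[c]) for c in ranked[:k]}
best = min(dft_energy, key=dft_energy.get)
print("lowest-energy configuration:", best, f"{dft_energy[best]:.3f} eV")
```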

    Control of self-assembly in micro- and nano-scale systems

    Full text link
    Control of self-assembling systems at the micro- and nano-scale provides new opportunities for engineering novel materials in a bottom-up fashion. These systems pose several control challenges, including high-dimensional and stochastic nonlinear dynamics, limited sensors for real-time measurements, limited actuation for control, and kinetic trapping of the system in undesirable configurations. Three main strategies for addressing these challenges are described: particle design (active self-assembly), open-loop control, and closed-loop (feedback) control. The strategies are illustrated with a variety of examples, such as the design of patchy and Janus particles, the toggling of magnetic fields to induce crystallization of paramagnetic colloids, and high-throughput crystallization of organic compounds in nanoliter droplets. An outlook on future research directions and the technological advances needed for control of micro- and nano-scale self-assembly is provided.
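    A minimal sketch of the closed-loop (feedback) idea in the colloidal-crystallization example: toggle an external field on or off based on a measured order parameter. The threshold rule, the toy dynamics, and the `measure_order_parameter` helper are illustrative assumptions, not the controller used in the cited work.

```python
# Toy bang-bang feedback loop: measure an order parameter, decide whether
# the field should be on, advance simple stochastic dynamics.
import random

def measure_order_parameter(state):
    return state                                 # stand-in for a real-time sensor

def step_dynamics(state, field_on):
    drift = 0.02 if field_on else -0.01          # toy effect of the toggled field
    noise = random.gauss(0.0, 0.005)             # stochasticity of the assembly
    return min(max(state + drift + noise, 0.0), 1.0)

state, field_on = 0.2, False
for t in range(200):
    psi = measure_order_parameter(state)
    field_on = psi < 0.8                         # toggle rule: push toward ordered state
    state = step_dynamics(state, field_on)

print(f"final order parameter: {state:.2f}")
```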

    Development of an anti-Asaia monoclonal antibody

    No full text
    The present invention concerns the development of a monoclonal antibody raised against bacteria of the genus Asaia. The antibody is characterized by high specificity and was obtained by fusing murine myeloma cells with splenocytes from Balb/c mice immunized with the bacterium of interest, Asaia, following the technique developed by Kohler and Milstein in 1975 (Nature 256: 495-497)

    Application of RELAP5/Mod3.3 - Fluent coupling codes to CIRCE-HERO

    No full text
    This paper presents work ongoing at the DICI (Dipartimento di Ingegneria Civile e Industriale) of the University of Pisa on the application of a coupled methodology between the Fluent CFD code and the RELAP5/Mod3.3 system code. In particular, this methodology was applied to the LBE-water heat exchanger HERO, with the aim of analysing the performance of this component. The test section studied here is installed inside the S100 vessel of the CIRCE facility, built at the ENEA Brasimone Research Centre. In the proposed methodology, the CFD code simulates the LBE side of the HERO heat exchanger, whereas the secondary side (two-phase water-vapour flow) is simulated by the STH code. The variables exchanged at the boundary between the two codes are: the bulk temperature and heat transfer coefficient of the ascending water (in two-phase flow), obtained from RELAP5 and transferred to Fluent; and the wall temperature on the water side of the pipes, calculated by Fluent and passed to RELAP5. The coupling procedure was verified by comparing the results with those obtained from a RELAP5 stand-alone calculation, showing that the developed coupling methodology is reliable. Furthermore, the coupled simulation provides more accurate information on the LBE side
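    A hedged sketch of one explicit coupling exchange using the variables named above; the `run_relap5_step` and `run_fluent_step` functions are hypothetical stand-ins for the actual RELAP5 and Fluent interfaces, and the numerical values are placeholders.

```python
# Placeholder coupling iteration between a system thermal-hydraulics (STH)
# code and a CFD code, exchanging boundary data in both directions.

def run_relap5_step(wall_temperature):
    # Placeholder: RELAP5 advances the two-phase water side and returns the
    # bulk temperature [K] and heat transfer coefficient [W/m^2K] of the
    # ascending water at the coupling boundary.
    bulk_temperature = 580.0
    heat_transfer_coeff = 12_000.0
    return bulk_temperature, heat_transfer_coeff

def run_fluent_step(bulk_temperature, heat_transfer_coeff):
    # Placeholder: Fluent advances the LBE side with a convective boundary
    # condition built from the water-side data, and returns the wall
    # temperature [K] on the water side of the tubes.
    return bulk_temperature + 25.0

wall_temperature = 600.0                  # initial guess at the interface
for iteration in range(10):               # fixed number of exchanges per time step
    t_bulk, htc = run_relap5_step(wall_temperature)
    wall_temperature = run_fluent_step(t_bulk, htc)

print(f"wall temperature estimate after coupling: {wall_temperature:.1f} K")
```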
