35 research outputs found

What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks

Author: Chawla Nitesh V.
Guo Kehan
Guo Taicheng
Guo Zhichun
Liang Zhengwen
Nan Bozhao
Wiest Olaf
Zhang Xiangliang
Publication date: 27/05/2023

Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged and have been rapidly applied in areas such as science, finance and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, we establish a comprehensive benchmark containing 8 practical chemistry tasks, including 1) name prediction, 2) property prediction, 3) yield prediction, 4) reaction prediction, 5) retrosynthesis (prediction of reactants from products), 6) text-based molecule design, 7) molecule captioning, and 8) reagent selection. Our analysis draws on widely recognized datasets including BBBP, Tox21, PubChem, USPTO, and ChEBI, facilitating a broad exploration of the capacities of LLMs within the context of practical chemistry. Three GPT models (GPT-4, GPT-3.5, and Davinci-003) are evaluated for each chemistry task in zero-shot and few-shot in-context learning settings with carefully selected demonstration examples and specially crafted prompts. The key results of our investigation are 1) GPT-4 outperforms the other two models among the three evaluated; 2) GPT models exhibit less competitive performance in tasks demanding precise understanding of molecular SMILES representation, such as reaction prediction and retrosynthesis; 3) GPT models demonstrate strong capabilities in text-related explanation tasks such as molecule captioning; and 4) GPT models exhibit performance comparable to or better than that of classical machine learning models when applied to chemical problems that can be transformed into classification or ranking tasks, such as property prediction and yield prediction.
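The zero-shot and few-shot in-context learning settings mentioned in the abstract can be illustrated with a minimal sketch of prompt assembly. This is not the benchmark's actual prompt format: the function name `build_few_shot_prompt`, the `SMILES:`/`Answer:` layout, the instruction text, and the demonstration labels are all hypothetical placeholders chosen for illustration.

```python
def build_few_shot_prompt(task_instruction, demonstrations, query):
    """Assemble an in-context learning prompt.

    demonstrations is a list of (smiles, answer) pairs; an empty list
    yields a zero-shot prompt containing only the instruction and query.
    """
    lines = [task_instruction, ""]
    for smiles, answer in demonstrations:
        lines.append(f"SMILES: {smiles}")
        lines.append(f"Answer: {answer}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"SMILES: {query}")
    lines.append("Answer:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical demonstrations: the labels are placeholders, not benchmark data.
demos = [("CCO", "Yes"), ("c1ccccc1", "No")]
instruction = "Decide whether the molecule has the property of interest. Reply Yes or No."

few_shot_prompt = build_few_shot_prompt(instruction, demos, "CC(=O)O")
zero_shot_prompt = build_few_shot_prompt(instruction, [], "CC(=O)O")
```

The only difference between the two settings in this sketch is whether worked demonstrations precede the query.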

arXiv.org e-Print Archive

Data Interpreter: An LLM Agent For Data Science

Author: Chen Jiaqi
Cheng Yuheng
Fei Yaying
Guo Taicheng
Hong Sirui
Li Danyang
Liang Xinbing
Lin Yizhang
Liu Bang
Liu Bangbang
Lu Xiangtao
Tang Xiangru
Tao Wei
Wang Jinlin
Wang Wenyi
Wu Binhao
Wu Chenglin
Xu Zongze
Yang Min
Zhang Jiayi
Zhang Li
Zhang Lingyao
Zheng Xiawu
Zhou Tuo
Zhuge Mingchen
Publication date: 12/03/2024

Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a code-based solution that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability; 2) dynamic tool integration to enhance code proficiency during execution, enriching the requisite expertise; 3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase on the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at https://github.com/geekan/MetaGPT
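The idea of planning over a task graph can be sketched as dependency-ordered execution of a small pipeline. This is only an illustration under stated assumptions, not the Data Interpreter's implementation: the helper `run_plan`, the task names, and the toy actions are hypothetical, and the sketch covers static ordering only, without the dynamic replanning the abstract describes. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def run_plan(dependencies, actions):
    """Run each task once all of its prerequisites have finished.

    dependencies maps a task name to the set of tasks it depends on;
    actions maps a task name to a callable that receives the results
    of its prerequisites (in sorted name order) as positional arguments.
    """
    order = []
    results = {}
    for task in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in sorted(dependencies.get(task, ()))]
        results[task] = actions[task](*inputs)
        order.append(task)
    return order, results


# Hypothetical four-step data science pipeline.
deps = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "report": {"train"},
}
actions = {
    "load": lambda: [3, 1, None, 2],
    "clean": lambda rows: [r for r in rows if r is not None],
    "train": lambda rows: sum(rows) / len(rows),  # stand-in for model fitting
    "report": lambda mean: f"mean={mean:.1f}",
}

order, results = run_plan(deps, actions)
```

A dynamic planner would additionally revise `deps` when a task fails or the data changes; that part is deliberately left out here.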

arXiv.org e-Print Archive

Formation and optical properties of brown carbon from small α-dicarbonyls and amines

Author: An Taicheng
Du Zhuofei
Gomez-Hermandez Mario
Guo Song
Hu Min
Ji Yuemeng
Li Yixin
Lin Yun
Marrero-Ortiz Wilmarie
Peng Jianfei
Secrest Jeremiah
Wang Yuan
Wang Yujue
Zamora Misti L.
Zhang Renyi
Publication venue: 'American Chemical Society (ACS)'
Publication date: 02/01/2019

Brown Carbon (BrC) aerosols scatter and absorb solar radiation, directly affecting the Earth’s radiative budget. However, considerable uncertainty exists concerning the chemical mechanism leading to BrC formation and their optical properties. In this work, BrC particles were prepared from mixtures of small α-dicarbonyls (glyoxal and methylglyoxal) and amines (methylamine, dimethylamine, and trimethylamine). The absorption and scattering of BrC particles were measured using a photoacoustic extinctometer (405 and 532 nm), and the chemical composition of the α-dicarbonyl-amine mixtures was analyzed using orbitrap-mass spectrometry and thermal desorption-ion drift-chemical ionization mass spectrometry. The single scattering albedo for methylglyoxal-amine mixtures is smaller than that of glyoxal-amine mixtures and increases with the methyl substitution of amines. The mass absorption cross-section for methylglyoxal-amine mixtures is two times higher at 405 nm wavelength than that at 532 nm wavelength. The derived refractive indexes at the 405 nm wavelength are 1.40–1.64 for the real part and 0.002–0.195 for the imaginary part. Composition analysis in the α-dicarbonyl-amine mixtures reveals N-heterocycles as the dominant products, which are formed via multiple steps involving nucleophilic attack, steric hindrance, and dipole–dipole interaction between α-dicarbonyls and amines. BrC aerosols, if formed from the particle-phase reaction of methylglyoxal with methylamine, likely contribute to atmospheric warming.
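For readers unfamiliar with the two optical quantities named in the abstract, the standard definitions can be sketched as follows: single scattering albedo is scattering divided by extinction (scattering plus absorption), and mass absorption cross-section is the absorption coefficient divided by particle mass concentration. The function names and the numerical inputs below are hypothetical and are not measurements from the paper.

```python
def single_scattering_albedo(b_sca, b_abs):
    """Return SSA = b_sca / (b_sca + b_abs); both inputs share the same units."""
    extinction = b_sca + b_abs
    if extinction <= 0:
        raise ValueError("extinction must be positive")
    return b_sca / extinction


def mass_absorption_cross_section(b_abs_per_Mm, mass_conc_ug_m3):
    """Return MAC in m^2/g.

    With absorption in Mm^-1 (1e-6 m^-1) and mass concentration in
    ug/m^3 (1e-6 g/m^3), the two factors of 1e-6 cancel.
    """
    if mass_conc_ug_m3 <= 0:
        raise ValueError("mass concentration must be positive")
    return b_abs_per_Mm / mass_conc_ug_m3


# Hypothetical values for illustration only.
ssa = single_scattering_albedo(b_sca=80.0, b_abs=20.0)
mac = mass_absorption_cross_section(b_abs_per_Mm=20.0, mass_conc_ug_m3=10.0)
```

A lower SSA means a larger share of extinction is due to absorption, which is why the abstract links the smaller SSA of the methylglyoxal-amine mixtures to a possible warming contribution.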

Caltech Authors

The Francis Crick Institute