Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Chen, Huajun; Chen, Zhuo; Fan, Xiaohui; Fang, Yin; Huang, Rui; Liang, Xiaozhuan; Liu, Kangwei; Zhang, Ningyu

Publication date: 29 August 2023

Abstract

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on a representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability.

Comment: Project homepage: https://github.com/zjunlp/Mol-Instructions. Add quantitative evaluation

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2306.08018

Last time updated on 10/09/2023