Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Large Language Models (LLMs), with their remarkable task-handling
capabilities and innovative outputs, have catalyzed significant advancements
across a spectrum of fields. However, their proficiency within specialized
domains such as biomolecular studies remains limited. To address this
challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive
instruction dataset expressly designed for the biomolecular realm.
Mol-Instructions is composed of three pivotal components: molecule-oriented
instructions, protein-oriented instructions, and biomolecular text
instructions, each curated to enhance the understanding and prediction
capabilities of LLMs concerning biomolecular features and behaviors. Through
extensive instruction-tuning experiments on a representative LLM, we
underscore the potency of Mol-Instructions to enhance the adaptability and
cognitive acuity of large models within the complex sphere of biomolecular
studies, thereby promoting advancements in the biomolecular research community.
Mol-Instructions is made publicly accessible for future research endeavors and
will be subjected to continual updates for enhanced applicability.
Comment: Project homepage: https://github.com/zjunlp/Mol-Instructions. Add
quantitative evaluation