Recently, the impressive performance of large language models (LLMs) on a
wide range of tasks has attracted an increasing number of attempts to apply
LLMs in drug discovery. However, molecule optimization, a critical task in the
drug discovery pipeline, has so far seen little involvement
from LLMs. Most existing approaches focus solely on capturing the underlying
patterns in chemical structures provided by the data, without taking advantage
of expert feedback. These non-interactive approaches overlook the fact that
drug discovery requires the integration of expert
experience and iterative refinement. To address this gap, we propose
DrugAssist, an interactive molecule optimization model that performs
optimization through human-machine dialogue by leveraging LLMs' strong
interactivity and generalizability. DrugAssist achieves leading results in
both single- and multi-property optimization, while also showing
strong potential in transferability and iterative optimization. In addition,
we publicly release a large instruction-based dataset called
MolOpt-Instructions for fine-tuning language models on molecule optimization
tasks. Our code and data are publicly available at
https://github.com/blazerye/DrugAssist, which we hope will pave the way for
future research on applying LLMs to drug discovery.

Comment: Geyan Ye and Xibao Cai are equal contributors; Longyue Wang is
corresponding author.