OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue
Large multimodal models (LMMs) have achieved significant success in
general domains. However, because medical images and text differ
substantially from general web content, the performance of LMMs in medical
scenarios is limited. In ophthalmology, clinical diagnosis relies on multiple
modalities of medical images, yet multimodal ophthalmic large
language models have not been explored to date. In this paper, we study and
construct an ophthalmic large multimodal model. Firstly, using fundus images
as an entry point, we build a disease assessment and diagnosis pipeline for
common ophthalmic disease diagnosis and lesion segmentation. Then, we
establish a new ophthalmic multimodal instruction-following and dialogue
fine-tuning dataset based on disease-related knowledge data and publicly
available real-world medical dialogues. Finally, we introduce visual
capability into the large language model to complete the ophthalmic large
language-and-vision assistant (OphGLM). Our experimental results demonstrate
that OphGLM performs exceptionally well and has the potential to revolutionize
clinical applications in ophthalmology. The dataset, code, and models will be made
publicly available at https://github.com/ML-AILab/OphGLM.

Comment: OphGLM: The first ophthalmology large language-and-vision assistant based on instructions and dialogue
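The instruction-following and dialogue records described in the abstract can be sketched as follows. This is a minimal illustration assuming a LLaVA-style JSON layout that pairs a fundus image with pipeline outputs and conversation turns; all field names, paths, and values here are hypothetical and are not OphGLM's actual schema.

```python
# Hypothetical sketch of one ophthalmic multimodal instruction-tuning record.
# Assumes a LLaVA-style layout; field names are illustrative only.
import json


def make_instruction_record(image_path, diagnosis, conversations):
    """Bundle a fundus image with disease knowledge and dialogue turns."""
    return {
        "image": image_path,           # path to a fundus photograph
        "diagnosis": diagnosis,        # assumed pipeline output: label + lesions
        "conversations": conversations,  # instruction-following dialogue turns
    }


record = make_instruction_record(
    image_path="fundus/0001.jpg",
    diagnosis={"disease": "diabetic retinopathy", "lesions": ["hemorrhage"]},
    conversations=[
        {"role": "user",
         "content": "<image>\nWhat does this fundus image show?"},
        {"role": "assistant",
         "content": "Signs consistent with diabetic retinopathy."},
    ],
)

# Serialize for a fine-tuning corpus (one JSON object per record).
serialized = json.dumps(record)
```

A dataset of such records, mixing disease-knowledge prompts with real-world medical dialogues, is the kind of corpus the abstract describes for fine-tuning.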