Objective: To evaluate the efficiency of large language models (LLMs) such as
ChatGPT to assist in diagnosing neuro-ophthalmic diseases based on detailed
case descriptions. Methods: We selected 22 different case reports of
neuro-ophthalmic diseases from a publicly available online database. These
cases included a wide range of chronic and acute diseases that are commonly
seen by neuro-ophthalmic sub-specialists. We inserted the text from each case
as a new prompt into both ChatGPT v3.5 and ChatGPT Plus v4.0 and asked for the
most probable diagnosis. We then presented the exact information to two
neuro-ophthalmologists and recorded their diagnoses followed by comparison to
responses from both versions of ChatGPT. Results: ChatGPT v3.5, ChatGPT Plus
v4.0, and the two neuro-ophthalmologists were correct in 13 (59%), 18 (82%), 19
(86%), and 19 (86%) out of 22 cases, respectively. The agreement between the
various diagnostic sources were as follows: ChatGPT v3.5 and ChatGPT Plus v4.0,
13 (59%); ChatGPT v3.5 and the first neuro-ophthalmologist, 12 (55%); ChatGPT
v3.5 and the second neuro-ophthalmologist, 12 (55%); ChatGPT Plus v4.0 and the
first neuro-ophthalmologist, 17 (77%); ChatGPT Plus v4.0 and the second
neuro-ophthalmologist, 16 (73%); and first and second neuro-ophthalmologists 17
(17%). Conclusions: The accuracy of ChatGPT v3.5 and ChatGPT Plus v4.0 in
diagnosing patients with neuro-ophthalmic diseases was 59% and 82%,
respectively. With further development, ChatGPT Plus v4.0 may have potential to
be used in clinical care settings to assist clinicians in providing quick,
accurate diagnoses of patients in neuro-ophthalmology. The applicability of
using LLMs like ChatGPT in clinical settings that lack access to subspeciality
trained neuro-ophthalmologists deserves further research