3 research outputs found

Evaluating the Performance of ChatGPT in Ophthalmology

Author: Daniel Milad MD
Fares Antaki MD, CM
Jonathan El-Khoury MD
Renaud Duval MD
Samir Touma MD, CM
Publication venue: Elsevier
Publication date: 01/12/2023

Purpose: Foundation models are a novel type of artificial intelligence algorithm, in which models are pretrained at scale on unannotated data and fine-tuned for a myriad of downstream tasks, such as generating text. This study assessed the accuracy of ChatGPT, a large language model (LLM), in the ophthalmology question-answering space.
Design: Evaluation of diagnostic test or technology.
Participants: ChatGPT is a publicly available LLM.
Methods: We tested 2 versions of ChatGPT (January 9 “legacy” and ChatGPT Plus) on 2 popular multiple-choice question banks commonly used to prepare for the high-stakes Ophthalmic Knowledge Assessment Program (OKAP) examination. We generated two 260-question simulated exams from the Basic and Clinical Science Course (BCSC) Self-Assessment Program and the OphthoQuestions online question bank. We carried out logistic regression to determine the effect of the examination section, cognitive level, and difficulty index on answer accuracy. We also performed a post hoc analysis using Tukey’s test to determine whether there were meaningful differences between the tested subspecialties.
Main Outcome Measures: We reported the accuracy of ChatGPT for each examination section in percentage correct by comparing ChatGPT’s outputs with the answer key provided by the question banks. We presented logistic regression results with a likelihood ratio (LR) chi-square. We considered differences between examination sections statistically significant at a P value of < 0.05.
Results: The legacy model achieved 55.8% accuracy on the BCSC set and 42.7% on the OphthoQuestions set. With ChatGPT Plus, accuracy increased to 59.4% ± 0.6% and 49.2% ± 1.0%, respectively. Accuracy improved with easier questions when controlling for the examination section and cognitive level. Logistic regression analysis of the legacy model showed that the examination section (LR, 27.57; P = 0.006) followed by question difficulty (LR, 24.05; P < 0.001) were most predictive of ChatGPT’s answer accuracy. Although the legacy model performed best in general medicine and worst in neuro-ophthalmology (P < 0.001) and ocular pathology (P = 0.029), similar post hoc findings were not seen with ChatGPT Plus, suggesting more consistent results across examination sections.
Conclusion: ChatGPT has encouraging performance on a simulated OKAP examination. Specializing LLMs through domain-specific pretraining may be necessary to improve their performance in ophthalmic subspecialties.
Financial Disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
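The outcome measures described in the abstract (percentage correct against an answer key, and a likelihood ratio chi-square) can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes a simplified model with examination section as the only predictor, in which case the fitted probabilities of a logistic regression equal the per-section proportions correct, so the LR statistic can be computed without an iterative fit. The study's model also included cognitive level and difficulty index, which this sketch omits. The record layout and the function names `accuracy_by_section` and `lr_chi_square` are hypothetical.

```python
import math
from collections import defaultdict


def xlogy(k, p):
    """Return k * ln(p), treating 0 * ln(0) as 0."""
    return 0.0 if k == 0 else k * math.log(p)


def log_likelihood(correct, total):
    """Binomial log-likelihood at the maximum-likelihood proportion."""
    p = correct / total
    return xlogy(correct, p) + xlogy(total - correct, 1 - p)


def accuracy_by_section(records):
    """Count [correct, total] per section.

    Each record is (section, model_answer, key_answer); a response is
    correct when it matches the answer key exactly.
    """
    counts = defaultdict(lambda: [0, 0])
    for section, model_answer, key_answer in records:
        counts[section][1] += 1
        if model_answer == key_answer:
            counts[section][0] += 1
    return dict(counts)


def lr_chi_square(counts):
    """Likelihood ratio chi-square for section as the sole predictor.

    Compares a model with one accuracy per section against a model with
    a single overall accuracy. Returns (statistic, degrees_of_freedom).
    """
    total_correct = sum(c for c, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    ll_null = log_likelihood(total_correct, total_n)
    ll_full = sum(log_likelihood(c, n) for c, n in counts.values())
    return 2 * (ll_full - ll_null), len(counts) - 1


# Hypothetical toy data, not taken from the study.
records = [
    ("General Medicine", "A", "A"),
    ("General Medicine", "C", "C"),
    ("Neuro-Ophthalmology", "B", "A"),
    ("Neuro-Ophthalmology", "D", "D"),
]
counts = accuracy_by_section(records)
for section, (correct, total) in counts.items():
    print(f"{section}: {100 * correct / total:.1f}% correct")
stat, df = lr_chi_square(counts)
print(f"LR chi-square = {stat:.2f} on {df} degree(s) of freedom")
```

A P value would come from comparing the statistic with a chi-square distribution on the reported degrees of freedom; that step is left out here to keep the sketch within the standard library.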

Directory of Open Access Journals

Ocular, Auricular, and Oral Manifestations of Inflammatory Bowel Disease

Author: A Dupuy
A Galor
A Grasland
A Greco
A Ragam
AA Banares
AM Asuncion
AP Zbar
B Bodaghi
B Mijandrusic-Sincic
B Thrash
BF Haynes
Brian Duff
C Lunardi
C Prendiville
CA Durno
CM Healy
CN Bernstein
CS Karmody
CS Manolakis
DA Jabs
DA Margileth
DB Cury
DE Kardon
DJ Touma
E Michailidou
EA Petrelli
EB Suhler
EW St Clair
F Alawi
F Femiano
Francis A. Farraye
FT Veloso
G Clare
G Ficarra
G Litsas
GJ Jaffe
ID O’Neill
J Lysy
J Mate-Jimenez
J Mathews
JH Cho
JN Martel
Judy Nee
K Durrani
KB Lankarani
KM Das
L Renfro
L Riente
M Cordero-Coma
M Plauth
M Rowland
M Salmi
MC Santos Sousa Dos
MH Tan
MJ McCullough
MJ Ruckenstein
MK Elias
ML Martinez Martinez
MR Ally
MS Sapienza
N Akbayir
NB Allen
P Doctor
Pranjal Thakuria
QD Nguyen
R Ghadban
R Mady
R Mintz
RC Tripathi
RS Vollertsen
S Pittock
S Verma
Samir A. Shah
Sean Fine
SR Vavricka
SS Broughton
SV Lourenco
T Bozkurt
T Felekis
TR Orchard
TS Hansen
V Cloche
Z Touma
Publication venue: Springer Science and Business Media LLC

Crossref