Large Language Models (LLMs) have become a pivotal research area, with the
potential to make beneficial contributions in fields such as healthcare, where
they can streamline automated billing and decision support. However, the
frequent use of specialized coded languages such as ICD-10, which are regularly
updated and deviate from natural language formats, poses challenges for LLMs in
forming accurate and meaningful latent representations. This raises concerns
among healthcare professionals about inaccuracies or ``hallucinations'' that
could directly impact patient care.
Therefore, this study evaluates whether LLMs are aware of medical code
ontologies and can accurately generate names from these codes.
We assess the capabilities and limitations of both general and
biomedical-specific generative models, such as GPT, LLaMA-2, and Meditron,
focusing on their proficiency with domain-specific terminologies. While the
results indicate that LLMs struggle with coded language, we offer insights on
how to adapt these models to reason more effectively.