749 research outputs found
Towards Mitigating Hallucination in Large Language Models via Self-Reflection
Large language models (LLMs) have shown promise for generative and
knowledge-intensive tasks including question-answering (QA) tasks. However, the
practical deployment still faces challenges, notably the issue of
"hallucination", where models generate plausible-sounding but unfaithful or
nonsensical information. This issue becomes particularly critical in the
medical domain due to the uncommon professional concepts and potential social
risks involved. This paper analyses the phenomenon of hallucination in medical
generative QA systems using widely adopted LLMs and datasets. Our investigation
centers on the identification and comprehension of common problematic answers,
with a specific emphasis on hallucination. To tackle this challenge, we present
an interactive self-reflection methodology that incorporates knowledge
acquisition and answer generation. Through this feedback process, our approach
steadily enhances the factuality, consistency, and entailment of the generated
answers. Consequently, we harness the interactivity and multitasking ability of
LLMs and produce progressively more precise and accurate answers. Experimental
results on both automatic and human evaluation demonstrate the superiority of
our approach in hallucination reduction compared to baselines.Comment: Accepted by the findings of EMNLP 202
Biomedical Question Answering: A Survey of Approaches and Challenges
Automatic Question Answering (QA) has been successfully applied in various
domains such as search engines and chatbots. Biomedical QA (BQA), as an
emerging QA task, enables innovative applications to effectively perceive,
access and understand complex biomedical knowledge. There have been tremendous
developments of BQA in the past two decades, which we classify into 5
distinctive approaches: classic, information retrieval, machine reading
comprehension, knowledge base and question entailment approaches. In this
survey, we introduce available datasets and representative methods of each BQA
approach in detail. Despite the developments, BQA systems are still immature
and rarely used in real-life settings. We identify and characterize several key
challenges in BQA that might lead to this issue, and discuss some potential
future directions to explore.Comment: In submission to ACM Computing Survey
- …