Assessing Answer Accuracy, Hallucination, and Document Relevance in a RAG-Based Chatbot at Osnabrück University

Abstract

This study evaluates a retrieval-augmented generation (RAG) chatbot tailored for Osnabrück University, focusing on answer accuracy, hallucination, and the relevance of retrieved documents. Using human and automated evaluations in both German and English, it examines the chatbot's ability to deliver coherent, accurate, and contextually grounded responses to real user inquiries. The results highlight the chatbot's strengths in linguistic fluency and its low hallucination rates, but indicate variability in accuracy and context relevance. The study underscores the importance of hybrid evaluation methods that combine automated metrics with targeted human assessments, and offers insights for the future refinement of domain-specific chatbots.

This paper was published in osnaDocs (Universität Osnabrück).

Licence: http://creativecommons.org/licenses/by/3.0/de/