This study evaluates a retrieval-augmented generation chatbot tailored for Osnabrück University, focusing on answer accuracy, hallucination, and the relevance of retrieved documents. Through human and automated evaluations conducted bilingually (German and English), it examines the chatbot's ability to deliver coherent, accurate, and contextually grounded responses to real user inquiries. Results highlight the chatbot's strengths in linguistic fluency and low hallucination rates but indicate variability in accuracy and context relevance. The study underscores the importance of hybrid evaluation methods that combine automated metrics with targeted human assessments, offering insights for the future refinement of domain-specific chatbots.