Conversational recommender systems (CRS) are interactive agents that support
their users in achieving recommendation-related goals through multi-turn conversations.
Generally, a CRS can be evaluated along various dimensions. Today's CRS mainly
rely on offline (computational) measures to assess the performance of their
algorithms in comparison to different baselines. However, offline measures can
have limitations, for example, when the metrics used to compare a newly generated
response with a ground truth do not correlate with human perceptions, because
various alternative responses might be equally suitable in a given dialog
situation. Current research on machine learning-based CRS models therefore
acknowledges the importance of humans in the evaluation process, knowing that
pure offline measures may not be sufficient for evaluating a highly interactive
system like a CRS.