How should conversational agents respond to verbal abuse from the user? To
answer this question, we conduct a large-scale crowd-sourced evaluation of
abuse response strategies employed by current state-of-the-art systems. Our
results show that some strategies, such as "polite refusal", score highly across
the board, while for other strategies, demographic factors such as age, as well
as the severity of the preceding abuse, influence the user's perception of which
response is appropriate. In addition, we find that most data-driven models lag
behind rule-based or commercial systems in terms of their perceived
appropriateness.