MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Human commonsense understanding of the physical and social world is organized
around intuitive theories. These theories support making causal and moral
judgments. When something bad happens, we naturally ask: who did what, and why?
A rich literature in cognitive science has studied people's causal and moral
intuitions. This work has revealed a number of factors that systematically
influence people's judgments, such as the violation of norms and whether the
harm is avoidable or inevitable. We collected a dataset of stories from 24
cognitive science papers and developed a system to annotate each story with the
factors the original studies investigated. Using this dataset, we test whether large language
models (LLMs) make causal and moral judgments about text-based scenarios that
align with those of human participants. On the aggregate level, alignment has
improved with more recent LLMs. However, using statistical analyses, we find
that LLMs weigh the different factors quite differently from human
participants. These results show how curated, challenge datasets combined with
insights from cognitive science can help us go beyond comparisons based merely
on aggregate metrics: we uncover LLMs' implicit tendencies and show to what
extent these align with human intuitions.

Comment: 34 pages, 7 figures. NeurIPS 2023
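To make the kind of comparison the abstract describes concrete, here is a minimal, hypothetical sketch. It is not the paper's actual pipeline: the toy data, the factor names, and the choice of AuROC for aggregate alignment and logistic regression for factor weights are all illustrative assumptions.

```python
# Hedged sketch (not the paper's method): given per-story factor annotations,
# human judgments, and model judgments, compare (1) aggregate alignment and
# (2) per-factor weights. All data below is randomly generated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy stand-ins: 200 stories, each annotated with two binary factors,
# e.g. "norm_violation" and "harm_avoidable" (placeholder names for the
# cognitive-science factors the dataset annotates).
X = rng.integers(0, 2, size=(200, 2))

# Hypothetical judgments: the fraction of human participants answering "yes"
# per story, and the model's probability of answering "yes".
human_yes_rate = rng.uniform(0, 1, size=200)
model_yes_prob = rng.uniform(0, 1, size=200)

# (1) Aggregate alignment: how well the model's probabilities rank the
# stories that a majority of human participants judged "yes".
human_majority = (human_yes_rate > 0.5).astype(int)
print("aggregate AuROC:", roc_auc_score(human_majority, model_yes_prob))

# (2) Factor weights: fit one logistic regression per judge on the annotated
# factors; diverging coefficients would indicate the model weighs a factor
# differently from human participants.
human_fit = LogisticRegression().fit(X, human_majority)
model_fit = LogisticRegression().fit(X, (model_yes_prob > 0.5).astype(int))
print("human factor weights:", human_fit.coef_[0])
print("model factor weights:", model_fit.coef_[0])
```

The point of fitting separate regressions rather than reporting a single agreement score is exactly the abstract's argument: two judges can match on aggregate accuracy while arriving at their judgments by weighing the underlying factors very differently.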