Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
We introduce a new type of test, called a Turing Experiment (TE), for
evaluating how well a language model, such as GPT-3, can simulate different
aspects of human behavior. Unlike the Turing Test, which involves simulating a
single arbitrary individual, a TE requires simulating a representative sample
of participants in human subject research. We present TEs that attempt to
replicate well-established findings from prior studies. We design a methodology
for simulating TEs and illustrate its use to compare how well different
language models are able to reproduce classic economic, psycholinguistic, and
social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram
Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing
findings were replicated using recent models, while the last TE reveals a
"hyper-accuracy distortion" present in some language models.
Comment: Added Turing Experiment (TE) framing and Wisdom of Crowds T