The recent proliferation of research into transformer-based natural language
processing has led to a number of studies that attempt to detect the presence
of human-like cognitive behavior in these models. We contend that, as is true of
human psychology, the investigation of cognitive behavior in language models
must be conducted in an appropriate population of sufficient size for the
results to be meaningful. We leverage work in uncertainty estimation in a novel
approach to efficiently construct experimental populations. The resultant tool,
PopulationLM, has been made open source. We provide theoretical grounding in
the uncertainty estimation literature and motivation from current cognitive
work regarding language models. We discuss the methodological lessons from
other scientific communities and attempt to demonstrate their application to
two artificial population studies. Through population based experimentation we
find that language models exhibit behavior consistent with typicality effects
among categories highly represented in training. However, we find that language
models do not tend to exhibit structural priming effects. Generally, our results
show that single models tend to overestimate the presence of cognitive
behaviors in neural models.