There has been a steep recent increase in the number of large language model
(LLM) papers, producing a dramatic shift in the scientific landscape that
remains largely undocumented by bibliometric analysis. Here, we analyze
388K papers posted on the CS and Stat arXivs, focusing on changes in
publication patterns in 2023 vs. 2018-2022. We examine how the proportion of
LLM papers is increasing; the LLM-related topics receiving the most attention;
the authors writing LLM papers; how authors' research topics correlate with
their backgrounds; the factors distinguishing highly cited LLM papers; and the
patterns of international collaboration. We show that LLM research increasingly
focuses on societal impacts: there has been an 18x increase in the proportion
of LLM-related papers on the Computers and Society sub-arXiv, and authors newly
publishing on LLMs are more likely to focus on applications and societal
impacts than more experienced authors. LLM research is also shaped by social
dynamics: we document gender and academic/industry disparities in the topics
LLM authors focus on, and a US/China schism in the collaboration network.
Overall, our analysis documents the profound ways in which LLM research both
shapes and is shaped by society, attesting to the necessity of sociotechnical
lenses.

Comment: Working paper