Large language models shape and are shaped by society: A survey of arXiv publication patterns

Agostini, Gabriel; Balachandar, Sidhika; Garg, Nikhil; Movva, Rajiv; Peng, Kenny; Pierson, Emma

Authors: Gabriel Agostini, Sidhika Balachandar, Nikhil Garg, Rajiv Movva, Kenny Peng, Emma Pierson
Publication date: 20 July 2023

Abstract

There has been a steep recent increase in the number of large language model (LLM) papers, producing a dramatic shift in the scientific landscape which remains largely undocumented through bibliometric analysis. Here, we analyze 388K papers posted on the CS and Stat arXivs, focusing on changes in publication patterns in 2023 vs. 2018-2022. We analyze how the proportion of LLM papers is increasing; the LLM-related topics receiving the most attention; the authors writing LLM papers; how authors' research topics correlate with their backgrounds; the factors distinguishing highly cited LLM papers; and the patterns of international collaboration. We show that LLM research increasingly focuses on societal impacts: there has been an 18x increase in the proportion of LLM-related papers on the Computers and Society sub-arXiv, and authors newly publishing on LLMs are more likely to focus on applications and societal impacts than more experienced authors. LLM research is also shaped by social dynamics: we document gender and academic/industry disparities in the topics LLM authors focus on, and a US/China schism in the collaboration network. Overall, our analysis documents the profound ways in which LLM research both shapes and is shaped by society, attesting to the necessity of sociotechnical lenses.

Comment: Working paper

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2307.10700

Last updated on 26/07/2023