16 research outputs found

LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding

Author: Bollenbacher John
Chew Robert
Kim Annice
Speer Jessica
Wenger Michael
Publication date: 23/06/2023

Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that can perform a range of natural language processing and reasoning tasks. In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis. We outline the proposed approach, called LLM-assisted content analysis (LACA), along with an in-depth case study using GPT-3.5 for LACA on a publicly available deductive coding data set. Additionally, we conduct an empirical benchmark using LACA on 4 publicly available data sets to assess the broader question of how well GPT-3.5 performs across a range of deductive coding tasks. Overall, we find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders. Additionally, we demonstrate that LACA can help refine prompts for deductive coding, identify codes for which an LLM is randomly guessing, and help assess when to use LLMs vs. human coders for deductive coding. We conclude with several implications for future practice of deductive coding and related research methods.

arXiv.org e-Print Archive

Massive Multi-Agent Data-Driven Simulations of the GitHub Ecosystem

Author: Ahn Yong-Yeol
Blythe Jim
Bollenbacher John
Ferrara Emilio
Flammini Alessandro
Huang Di
Hui Pik-Mai
Krohn Rachel
Lerman Kristina
Menczer Filippo
Muric Goran
Pacheco Diogo
Sapienza Anna
Tregubov Alexey
Weninger Tim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/08/2019

Simulating and predicting planetary-scale techno-social systems poses heavy computational and modeling challenges. The DARPA SocialSim program set the challenge to model the evolution of GitHub, a large collaborative software-development ecosystem, using massive multi-agent simulations. We describe our best performing models and our agent-based simulation framework, which we are currently extending to allow simulating other planetary-scale techno-social systems. The challenge problem measured participants' ability, given 30 months of meta-data on user activity on GitHub, to predict the next months' activity as measured by a broad range of metrics applied to ground truth, using agent-based simulation. The challenge required scaling to a simulation of roughly 3 million agents producing a combined 30 million actions, acting on 6 million repositories with commodity hardware. It was also important to use the data optimally to predict the agents' next moves. We describe the agent framework and the data analysis employed by one of the winning teams in the challenge. Six different agent models were tested based on a variety of machine learning and statistical methods. While no single method proved the most accurate on every metric, the broadly most successful sampled from a stationary probability distribution of actions and repositories for each agent. Two reasons for the success of these agents were their use of a distinct characterization of each agent, and that GitHub users change their behavior relatively slowly.

arXiv.org e-Print Archive

Crossref

Stat5 Synergizes with T Cell Receptor/Antigen Stimulation in the Development of Lymphoblastic Lymphoma

Author: Bollenbacher Julie
Copeland Neal G.
Jenkins Nancy A.
Kelly John A.
Kovanen Panu E.
Lee Stephen
Leonard Warren J.
Morse Herbert C.
Pise-Masison Cynthia A.
Radonovich Michael F.
Spolski Rosanne
Suzuki Takeshi
Publication venue: The Rockefeller University Press
Publication date: 01/01/2003

Signal transducer and activator of transcription (STAT) proteins are latent transcription factors that mediate a wide range of actions induced by cytokines, interferons, and growth factors. We now report the development of thymic T cell lymphoblastic lymphomas in transgenic mice in which Stat5a or Stat5b is overexpressed within the lymphoid compartment. The rate of lymphoma induction was markedly enhanced by immunization or by the introduction of TCR transgenes. Remarkably, the Stat5 transgene potently induced development of CD8+ T cells, even in mice expressing a class II–restricted TCR transgene, with resulting CD8+ T cell lymphomas. These data demonstrate the oncogenic potential of dysregulated expression of a STAT protein that is not constitutively activated, and that TCR stimulation can contribute to this process.

CiteSeerX

Crossref

PubMed Central

Network Effects in an Agent-Based Model of Tax Evasion with Social Influence

Author: Di Huang
Diogo Pacheco
Fernando Garcia Alvarado
Jim Blythe
John Bollenbacher
Pik-Mai Hui
Rachel Krohn
Publication venue: Demazeau Y., Matson E., Corchado J., De la Prieta F.
Publication date: 01/01/2019

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

CoVaxxy Tweet IDs data set

Author: Axelrod David
Bollenbacher John
Bryden John
DeVerna Matthew
Loynes Niklas
Menczer Filippo
Pierri Francesco
Torres-Lugo Christopher
Truong Bao
Yang Kai-Cheng
Publication date: 09/02/2021

A collection of Tweet IDs related to COVID-19 vaccines, gathered from Twitter since Jan 4, 2021. Please see https://arxiv.org/abs/2101.07694 for more information.

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

CoVaxxy: A Collection of English-Language Twitter Posts About COVID-19 Vaccines

Author: Bao Tran Truong
Christopher Torres-Lugo
David Axelrod
Filippo Menczer
Francesco Pierri
John Bollenbacher
John Bryden
Kai-Cheng Yang
Matthew R. DeVerna
Niklas Loynes
Publication date: 01/01/2021

With a substantial proportion of the population currently hesitant to take the COVID-19 vaccine, it is important that people have access to accurate information. However, there is a large amount of low-credibility information about vaccines spreading on social media. In this paper, we present the CoVaxxy dataset, a growing collection of English-language Twitter posts about COVID-19 vaccines. Using one week of data, we provide statistics regarding the numbers of tweets over time, the hashtags used, and the websites shared. We also illustrate how these data might be utilized by performing an analysis of the prevalence over time of high- and low-credibility sources, topic groups of hashtags, and geographical distributions. Additionally, we develop and present the CoVaxxy dashboard, allowing people to visualize the relationship between COVID-19 vaccine adoption and U.S. geo-located posts in our dataset. This dataset can be used to study the impact of online information on COVID-19 health outcomes (e.g., vaccine uptake) and our dashboard can help with exploration of the data.

Archivio istituzionale della ricerca - Politecnico di Milano

Association for the Advancement of Artificial Intelligence: AAAI Publications