16 research outputs found

    LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding

    Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time-consuming, since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that can perform a range of natural language processing and reasoning tasks. In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis. We outline the proposed approach, called LLM-assisted content analysis (LACA), along with an in-depth case study using GPT-3.5 for LACA on a publicly available deductive coding data set. Additionally, we conduct an empirical benchmark using LACA on 4 publicly available data sets to assess the broader question of how well GPT-3.5 performs across a range of deductive coding tasks. Overall, we find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders. Additionally, we demonstrate that LACA can help refine prompts for deductive coding, identify codes for which an LLM is randomly guessing, and help assess when to use LLMs vs. human coders for deductive coding. We conclude with several implications for future practice of deductive coding and related research methods.

    Massive Multi-Agent Data-Driven Simulations of the GitHub Ecosystem

    Simulating and predicting planetary-scale techno-social systems poses heavy computational and modeling challenges. The DARPA SocialSim program set the challenge to model the evolution of GitHub, a large collaborative software-development ecosystem, using massive multi-agent simulations. We describe our best performing models and our agent-based simulation framework, which we are currently extending to allow simulating other planetary-scale techno-social systems. The challenge problem measured participants' ability, given 30 months of metadata on user activity on GitHub, to predict the next months' activity as measured by a broad range of metrics applied to ground truth, using agent-based simulation. The challenge required scaling to a simulation of roughly 3 million agents producing a combined 30 million actions, acting on 6 million repositories with commodity hardware. It was also important to use the data optimally to predict the agents' next moves. We describe the agent framework and the data analysis employed by one of the winning teams in the challenge. Six different agent models were tested based on a variety of machine learning and statistical methods. While no single method proved the most accurate on every metric, the most broadly successful agents sampled from a stationary probability distribution of actions and repositories for each agent. Two reasons for the success of these agents were their use of a distinct characterization of each agent and the fact that GitHub users change their behavior relatively slowly.

    Stat5 Synergizes with T Cell Receptor/Antigen Stimulation in the Development of Lymphoblastic Lymphoma

    Signal transducer and activator of transcription (STAT) proteins are latent transcription factors that mediate a wide range of actions induced by cytokines, interferons, and growth factors. We now report the development of thymic T cell lymphoblastic lymphomas in transgenic mice in which Stat5a or Stat5b is overexpressed within the lymphoid compartment. The rate of lymphoma induction was markedly enhanced by immunization or by the introduction of TCR transgenes. Remarkably, the Stat5 transgene potently induced development of CD8+ T cells, even in mice expressing a class II–restricted TCR transgene, with resulting CD8+ T cell lymphomas. These data demonstrate the oncogenic potential of dysregulated expression of a STAT protein that is not constitutively activated, and that TCR stimulation can contribute to this process.

    CoVaxxy Tweet IDs data set

    A collection of Tweet IDs related to COVID-19 vaccines, gathered from Twitter since Jan 4, 2021. Please see https://arxiv.org/abs/2101.07694 for more information.

    CoVaxxy: A Collection of English-Language Twitter Posts About COVID-19 Vaccines

    With a substantial proportion of the population currently hesitant to take the COVID-19 vaccine, it is important that people have access to accurate information. However, there is a large amount of low-credibility information about vaccines spreading on social media. In this paper, we present the CoVaxxy dataset, a growing collection of English-language Twitter posts about COVID-19 vaccines. Using one week of data, we provide statistics regarding the numbers of tweets over time, the hashtags used, and the websites shared. We also illustrate how these data might be utilized by performing an analysis of the prevalence over time of high- and low-credibility sources, topic groups of hashtags, and geographical distributions. Additionally, we develop and present the CoVaxxy dashboard, allowing people to visualize the relationship between COVID-19 vaccine adoption and U.S. geo-located posts in our dataset. This dataset can be used to study the impact of online information on COVID-19 health outcomes (e.g., vaccine uptake), and our dashboard can help with exploration of the data.