66 research outputs found

    Assessing public awareness of social justice documentary films based on news coverage versus social media

    Get PDF
    The comprehensive measurement of the impact that information products have on individuals, groups, and society is of practical relevance to many actors, including philanthropic funding organizations. In this paper we focus on assessing one dimension of impact, namely public awareness, which we conceptualize as the amount and substance of attention that information products gain from the press and social media. We look at one type of product that philanthropic organizations fund, namely social justice documentaries. Using topic modeling as a text summarization technique, we find that films from certain domains, such as “Politics and Government” and “Environment and Nature,” attract more attention than productions in others, such as “Gender and Ethnicity.” We also observe that film-related public discourse on social media (Facebook and non-expert reviews) overlaps more with the content of a film than press coverage does. This is partly because social media users focus more on the topics of a production, whereas the press pays strong attention to cinematographic and related features.
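
    The abstract does not include the authors' pipeline, but the core measurement it describes (fit one topic model over all texts, then compare a film's topic distribution with that of its press and social media coverage) can be sketched as follows. This is a minimal illustration assuming LDA via scikit-learn; the corpus snippets, topic count, and the total-variation-based overlap score are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: estimate topical overlap between a film's synopsis
# and the texts written about it (press articles vs. social media posts).
# All corpus contents and parameters are placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

film_synopsis = ["A documentary about voting rights and state legislatures."]
press_texts = ["The director's cinematography and editing are striking."]
social_texts = ["This film shows how voting rights are under attack."]

# Fit one topic model over all texts so the topic spaces are comparable.
docs = film_synopsis + press_texts + social_texts
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)
theta = lda.transform(X)  # per-document topic distributions

def overlap(p, q):
    # Similarity of two topic distributions: 1 - total variation distance.
    return 1.0 - 0.5 * np.abs(p - q).sum()

film = theta[0]
press = theta[1:1 + len(press_texts)].mean(axis=0)
social = theta[1 + len(press_texts):].mean(axis=0)
print("press overlap:", overlap(film, press))
print("social overlap:", overlap(film, social))
```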

    PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data

    Full text link
    Online data streams make training machine learning models hard because of distribution shift and new patterns emerging over time. For natural language processing (NLP) tasks that rely on features based on lexicons and rules, it is important to adapt these features to the changing data. To address this challenge we introduce PyTAIL, a Python library that supports a human-in-the-loop approach to actively training NLP models. PyTAIL extends generic active learning, which only suggests new instances to label, by also suggesting new features such as rules and lexicon entries to label. Furthermore, PyTAIL is flexible enough for users to accept, reject, or update rules and lexicons as the model is being trained. We simulate the performance of PyTAIL on existing social media benchmark datasets for text classification and compare various active learning strategies on these benchmarks; the model closes the gap with as little as 10% of the training data. Finally, we highlight the importance of tracking evaluation metrics on the remaining data (not yet merged into the active learning pool) alongside the test dataset. This highlights the effectiveness of the model in accurately annotating the remaining data, which is especially suitable for batch processing of large unlabelled corpora. PyTAIL will be available at https://github.com/socialmediaie/pytail.
    Comment: 9 pages, 3 figures, 2 tables
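
    PyTAIL's own API is not shown in the abstract; the sketch below only illustrates the general pattern it describes, namely uncertainty-based instance suggestion plus suggesting lexicon features for a user to accept or reject. All names and data are illustrative, and this is not the PyTAIL interface.

```python
# Hypothetical sketch of a human-in-the-loop active learning step:
# (1) suggest the most uncertain instance, (2) suggest lexicon candidates.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = [("great product", 1), ("terrible service", 0)]  # placeholder seed set
pool = ["love the update", "worst release ever", "meh, it runs"]

texts, y = zip(*labeled)
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), y)

# 1) Suggest the most uncertain unlabeled instance for annotation.
probs = clf.predict_proba(vec.transform(pool))
uncertainty = 1.0 - probs.max(axis=1)
candidate = pool[int(np.argmax(uncertainty))]
print("label this next:", candidate)

# 2) Suggest candidate lexicon entries: terms with the largest weights.
weights = clf.coef_[0]
terms = vec.get_feature_names_out()
suggested = [terms[i] for i in np.argsort(np.abs(weights))[::-1][:3]]
print("review these lexicon candidates:", suggested)
# A user would then accept/reject/update these terms before retraining.
```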

    Examining the Causal Effect of First Names on Language Models: The Case of Social Commonsense Reasoning

    Full text link
    As language models continue to be integrated into applications of personal and societal relevance, ensuring these models' trustworthiness is crucial, particularly with respect to producing consistent outputs regardless of sensitive attributes. Given that first names may serve as proxies for (intersectional) socio-demographic representations, it is imperative to examine their impact on commonsense reasoning capabilities. In this paper, we study whether a model's reasoning about a given input differs based on the first names it contains: the reasoning about Alice should not differ from the reasoning about James. We propose and implement a controlled experimental framework to measure the causal effect of first names on commonsense reasoning, enabling us to distinguish between model predictions due to chance and those caused by the actual factors of interest. Our results indicate that the frequency of first names has a direct effect on model prediction, with less frequent names yielding divergent predictions compared to more frequent names. To gain insight into the internal mechanisms that contribute to these behaviors, we also conduct an in-depth explainable analysis. Overall, our findings suggest that to ensure model robustness, it is essential to augment datasets with more diverse first names during the configuration stage.
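
    A minimal sketch of the name-substitution idea: hold the reasoning context fixed, vary only the first name, and check whether predictions agree. The `model_predict` placeholder, the template, and the name list are hypothetical stand-ins, not the paper's protocol.

```python
# Hypothetical sketch: measure sensitivity of a model's prediction to the
# first name appearing in an otherwise identical prompt.
from collections import Counter

TEMPLATE = "{name} forgot a friend's birthday. How would others feel about {name}?"
NAMES = ["Alice", "James", "Lakisha", "DaQuan"]  # vary in corpus frequency

def model_predict(prompt: str) -> str:
    # Placeholder: call your classifier/LLM here; returns a label string.
    raise NotImplementedError

def name_sensitivity(predict=model_predict):
    preds = {n: predict(TEMPLATE.format(name=n)) for n in NAMES}
    counts = Counter(preds.values())
    # If the model were insensitive to names, all predictions would agree.
    agreement = counts.most_common(1)[0][1] / len(NAMES)
    return preds, agreement

# Demo with a stub model that (undesirably) keys on the name:
preds, agreement = name_sensitivity(
    lambda p: "negative" if "Lakisha" in p else "neutral")
print(preds, f"agreement={agreement:.2f}")  # agreement < 1.0 flags a name effect
```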

    StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models

    Full text link
    Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in their training data. We propose a theoretically grounded framework called StereoMap to gain insights into how LLMs perceive the way demographic groups are viewed by society. The framework is grounded in the Stereotype Content Model (SCM), a well-established theory from psychology, according to which stereotypes are not all alike; instead, the dimensions of Warmth and Competence delineate their nature. Based on the SCM, StereoMap maps LLMs' perceptions of social groups (defined by socio-demographic features) along the dimensions of Warmth and Competence. The framework also enables investigating the keywords and verbalized reasoning behind LLMs' judgments to uncover the factors influencing their perceptions. Our results show that LLMs exhibit a diverse range of perceptions towards these groups, characterized by mixed evaluations along the dimensions of Warmth and Competence. Furthermore, analyzing the LLMs' reasoning, our findings indicate that LLMs demonstrate an awareness of social disparities, often citing statistical data and research findings to support their reasoning. This study contributes to the understanding of how LLMs perceive and represent social groups, shedding light on their potential biases and the perpetuation of harmful associations.
    Comment: Accepted to EMNLP 2023
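
    One way to picture the StereoMap measurement: elicit a Warmth and a Competence rating per social group and treat each group as a point on the SCM plane. The prompt wording, rating scale, group list, and `ask_llm` helper below are assumptions for illustration, not the paper's actual elicitation protocol.

```python
# Hypothetical sketch: place social groups on the Warmth/Competence plane
# by asking an LLM for per-dimension ratings.
GROUPS = ["elderly people", "software engineers", "immigrants"]  # illustrative
DIMENSIONS = ["warmth", "competence"]

def ask_llm(prompt: str) -> float:
    # Placeholder for an LLM call that returns a numeric 1-5 rating.
    raise NotImplementedError

def scm_map(ask=ask_llm):
    coords = {}
    for group in GROUPS:
        scores = []
        for dim in DIMENSIONS:
            prompt = (f"As viewed by society, how would you rate {group} "
                      f"on {dim}, from 1 (low) to 5 (high)? Answer with a number.")
            scores.append(ask(prompt))
        coords[group] = tuple(scores)  # (warmth, competence)
    return coords

# Each group becomes a point; "mixed" stereotypes show up as
# high-warmth/low-competence or low-warmth/high-competence quadrants.
```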

    Sentiment Analysis with Incremental Human-in-the-Loop Learning and Lexical Resource Customization

    Get PDF
    The adjustment of probabilistic models for sentiment analysis to changes in language use and in the perception of products can be realized via incremental learning techniques. We provide a free, open, GUI-based sentiment analysis tool that allows for (a) relabeling predictions and/or adding labeled instances to retrain the weights of a given model, and (b) customizing lexical resources to account for false positives and false negatives in sentiment lexicons. Our results show that incrementally updating a model with information from new labeled instances can substantially increase accuracy. The provided solution can be particularly helpful for gradually refining or enhancing models in an easily accessible fashion while avoiding (a) the cost of training a new model from scratch and (b) the deterioration of prediction accuracy over time.
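
    A minimal sketch of the incremental-update idea, assuming scikit-learn's `partial_fit` with a stateless hashing vectorizer; the tool's internals are not specified in the abstract, and the data, labels, and relabeling scenario here are illustrative.

```python
# Hypothetical sketch: fold user relabelings into an existing model
# incrementally instead of retraining from scratch.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)   # stateless: no refitting needed
clf = SGDClassifier(loss="log_loss")        # "log" on older scikit-learn

# Initial model from an existing labeled batch.
X0 = vec.transform(["love it", "hate it"])
clf.partial_fit(X0, [1, 0], classes=[0, 1])

# Later: a user corrects a false negative; fold it in incrementally.
X1 = vec.transform(["sick ride, awesome!"])  # lexicon said negative; user says positive
clf.partial_fit(X1, [1])

print(clf.predict(vec.transform(["awesome!"])))
```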