66 research outputs found

    Assessing public awareness of social justice documentary films based on news coverage versus social media

    Get PDF
    The comprehensive measurement of the impact that information products have on individuals, groups, and society is of practical relevance to many actors, including philanthropic funding organizations. In this paper we focus on assessing one dimension of impact, namely public awareness, which we conceptualize as the amount and substance of attention that information products gain from the press and social media. We look at one type of product that philanthropic organizations fund, namely social justice documentaries. Using topic modeling as a text summarization technique, we find that films from certain domains, such as “Politics and Government” and “Environment and Nature,” attract more attention than productions in others, such as “Gender and Ethnicity.” We also observe that film-related public discourse on social media (Facebook and non-expert reviews) overlaps more with the content of a film than press coverage does. This is partly because social media users focus more on the topics of a production, whereas the press pays strong attention to cinematographic and related features.
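
    The abstract does not include the authors' pipeline, but the core measurement it describes (fit one topic model over all texts, then compare a film's topic distribution with that of its press and social media coverage) can be sketched as follows. This is a minimal illustration assuming LDA via scikit-learn; the corpus snippets, topic count, and the total-variation-based overlap score are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: estimate topical overlap between a film's synopsis
# and the texts written about it (press articles vs. social media posts).
# All corpus contents and parameters are placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

film_synopsis = ["A documentary about voting rights and state legislatures."]
press_texts = ["The director's cinematography and editing are striking."]
social_texts = ["This film shows how voting rights are under attack."]

# Fit one topic model over all texts so the topic spaces are comparable.
docs = film_synopsis + press_texts + social_texts
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)
theta = lda.transform(X)  # per-document topic distributions

def overlap(p, q):
    # Similarity of two topic distributions: 1 - total variation distance.
    return 1.0 - 0.5 * np.abs(p - q).sum()

film = theta[0]
press = theta[1:1 + len(press_texts)].mean(axis=0)
social = theta[1 + len(press_texts):].mean(axis=0)
print("press overlap:", overlap(film, press))
print("social overlap:", overlap(film, social))
```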

    PyTAIL: Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data

    Full text link
    Online data streams make training machine learning models hard because of distribution shift and new patterns emerging over time. For natural language processing (NLP) tasks that rely on features based on lexicons and rules, it is important to adapt these features to the changing data. To address this challenge we introduce PyTAIL, a Python library that supports a human-in-the-loop approach to actively training NLP models. PyTAIL extends generic active learning, which only suggests new instances to label, by also suggesting new features such as rules and lexicon entries to label. Furthermore, PyTAIL is flexible enough for users to accept, reject, or update rules and lexicons as the model is being trained. We simulate the performance of PyTAIL on existing social media benchmark datasets for text classification and compare various active learning strategies on these benchmarks; the model closes the gap with as little as 10% of the training data. Finally, we highlight the importance of tracking evaluation metrics on the remaining data (not yet merged into the active learning pool) alongside the test dataset. This highlights the effectiveness of the model in accurately annotating the remaining data, which is especially suitable for batch processing of large unlabelled corpora. PyTAIL will be available at https://github.com/socialmediaie/pytail.
    Comment: 9 pages, 3 figures, 2 tables
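
    PyTAIL's own API is not shown in the abstract; the sketch below only illustrates the general pattern it describes, namely uncertainty-based instance suggestion plus suggesting lexicon features for a user to accept or reject. All names and data are illustrative, and this is not the PyTAIL interface.

```python
# Hypothetical sketch of a human-in-the-loop active learning step:
# (1) suggest the most uncertain instance, (2) suggest lexicon candidates.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = [("great product", 1), ("terrible service", 0)]  # placeholder seed set
pool = ["love the update", "worst release ever", "meh, it runs"]

texts, y = zip(*labeled)
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), y)

# 1) Suggest the most uncertain unlabeled instance for annotation.
probs = clf.predict_proba(vec.transform(pool))
uncertainty = 1.0 - probs.max(axis=1)
candidate = pool[int(np.argmax(uncertainty))]
print("label this next:", candidate)

# 2) Suggest candidate lexicon entries: terms with the largest weights.
weights = clf.coef_[0]
terms = vec.get_feature_names_out()
suggested = [terms[i] for i in np.argsort(np.abs(weights))[::-1][:3]]
print("review these lexicon candidates:", suggested)
# A user would then accept/reject/update these terms before retraining.
```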

    Examining the Causal Effect of First Names on Language Models: The Case of Social Commonsense Reasoning

    Full text link
    As language models continue to be integrated into applications of personal and societal relevance, ensuring these models' trustworthiness is crucial, particularly with respect to producing consistent outputs regardless of sensitive attributes. Given that first names may serve as proxies for (intersectional) socio-demographic representations, it is imperative to examine their impact on commonsense reasoning capabilities. In this paper, we study whether a model's reasoning about a given input differs based on the first names it contains: the reasoning about Alice should not differ from the reasoning about James. We propose and implement a controlled experimental framework to measure the causal effect of first names on commonsense reasoning, enabling us to distinguish between model predictions due to chance and those caused by the actual factors of interest. Our results indicate that the frequency of first names has a direct effect on model prediction, with less frequent names yielding divergent predictions compared to more frequent names. To gain insight into the internal mechanisms that contribute to these behaviors, we also conduct an in-depth explainable analysis. Overall, our findings suggest that to ensure model robustness, it is essential to augment datasets with more diverse first names during the configuration stage.
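
    A minimal sketch of the name-substitution idea: hold the reasoning context fixed, vary only the first name, and check whether predictions agree. The `model_predict` placeholder, the template, and the name list are hypothetical stand-ins, not the paper's protocol.

```python
# Hypothetical sketch: measure sensitivity of a model's prediction to the
# first name appearing in an otherwise identical prompt.
from collections import Counter

TEMPLATE = "{name} forgot a friend's birthday. How would others feel about {name}?"
NAMES = ["Alice", "James", "Lakisha", "DaQuan"]  # vary in corpus frequency

def model_predict(prompt: str) -> str:
    # Placeholder: call your classifier/LLM here; returns a label string.
    raise NotImplementedError

def name_sensitivity(predict=model_predict):
    preds = {n: predict(TEMPLATE.format(name=n)) for n in NAMES}
    counts = Counter(preds.values())
    # If the model were insensitive to names, all predictions would agree.
    agreement = counts.most_common(1)[0][1] / len(NAMES)
    return preds, agreement

# Demo with a stub model that (undesirably) keys on the name:
preds, agreement = name_sensitivity(
    lambda p: "negative" if "Lakisha" in p else "neutral")
print(preds, f"agreement={agreement:.2f}")  # agreement < 1.0 flags a name effect
```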

    StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models

    Full text link
    Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in their training data. We propose a theoretically grounded framework called StereoMap to gain insights into how LLMs perceive the way demographic groups are viewed by society. The framework is grounded in the Stereotype Content Model (SCM), a well-established theory from psychology, according to which stereotypes are not all alike; instead, the dimensions of Warmth and Competence delineate their nature. Based on the SCM, StereoMap maps LLMs' perceptions of social groups (defined by socio-demographic features) along the dimensions of Warmth and Competence. The framework also enables investigating the keywords and verbalized reasoning behind LLMs' judgments to uncover the factors influencing their perceptions. Our results show that LLMs exhibit a diverse range of perceptions towards these groups, characterized by mixed evaluations along the dimensions of Warmth and Competence. Furthermore, analyzing the LLMs' reasoning, our findings indicate that LLMs demonstrate an awareness of social disparities, often citing statistical data and research findings to support their reasoning. This study contributes to the understanding of how LLMs perceive and represent social groups, shedding light on their potential biases and the perpetuation of harmful associations.
    Comment: Accepted to EMNLP 2023
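
    One way to picture the StereoMap measurement: elicit a Warmth and a Competence rating per social group and treat each group as a point on the SCM plane. The prompt wording, rating scale, group list, and `ask_llm` helper below are assumptions for illustration, not the paper's actual elicitation protocol.

```python
# Hypothetical sketch: place social groups on the Warmth/Competence plane
# by asking an LLM for per-dimension ratings.
GROUPS = ["elderly people", "software engineers", "immigrants"]  # illustrative
DIMENSIONS = ["warmth", "competence"]

def ask_llm(prompt: str) -> float:
    # Placeholder for an LLM call that returns a numeric 1-5 rating.
    raise NotImplementedError

def scm_map(ask=ask_llm):
    coords = {}
    for group in GROUPS:
        scores = []
        for dim in DIMENSIONS:
            prompt = (f"As viewed by society, how would you rate {group} "
                      f"on {dim}, from 1 (low) to 5 (high)? Answer with a number.")
            scores.append(ask(prompt))
        coords[group] = tuple(scores)  # (warmth, competence)
    return coords

# Each group becomes a point; "mixed" stereotypes show up as
# high-warmth/low-competence or low-warmth/high-competence quadrants.
```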

    Sentiment Analysis with Incremental Human-in-the-Loop Learning and Lexical Resource Customization

    Get PDF
    The adjustment of probabilistic models for sentiment analysis to changes in language use and in the perception of products can be realized via incremental learning techniques. We provide a free, open, GUI-based sentiment analysis tool that allows for (a) relabeling predictions and/or adding labeled instances to retrain the weights of a given model, and (b) customizing lexical resources to account for false positives and false negatives in sentiment lexicons. Our results show that incrementally updating a model with information from new labeled instances can substantially increase accuracy. The provided solution can be particularly helpful for gradually refining or enhancing models in an easily accessible fashion while avoiding (a) the cost of training a new model from scratch and (b) the deterioration of prediction accuracy over time.
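
    A minimal sketch of the incremental-update idea, assuming scikit-learn's `partial_fit` with a stateless hashing vectorizer; the tool's internals are not specified in the abstract, and the data, labels, and relabeling scenario here are illustrative.

```python
# Hypothetical sketch: fold user relabelings into an existing model
# incrementally instead of retraining from scratch.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)   # stateless: no refitting needed
clf = SGDClassifier(loss="log_loss")        # "log" on older scikit-learn

# Initial model from an existing labeled batch.
X0 = vec.transform(["love it", "hate it"])
clf.partial_fit(X0, [1, 0], classes=[0, 1])

# Later: a user corrects a false negative; fold it in incrementally.
X1 = vec.transform(["sick ride, awesome!"])  # lexicon said negative; user says positive
clf.partial_fit(X1, [1])

print(clf.predict(vec.transform(["awesome!"])))
```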