4 research outputs found

    A Data Fusion Framework for Multi-Domain Morality Learning

    Full text link
    Language models can be trained to recognize the moral sentiment of text, creating new opportunities to study the role of morality in human life. As interest in language and morality has grown, several ground-truth datasets with moral annotations have been released. However, these datasets vary in data collection method, domain, topics, annotator instructions, and other respects. Simply aggregating such heterogeneous datasets during training can yield models that fail to generalize well. We describe a data fusion framework for training on multiple heterogeneous datasets that improves performance and generalizability. The model uses domain adversarial training to align the datasets in feature space and a weighted loss function to deal with label shift. We show that the proposed framework achieves state-of-the-art performance across datasets compared to prior work on morality inference.
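
    The abstract names two techniques: domain adversarial training to align heterogeneous datasets in feature space, and a weighted loss to compensate for label shift. The following is a minimal sketch of those two ideas, assuming a PyTorch setup; the encoder, head sizes, class weights, and lambda schedule are illustrative placeholders, not the authors' actual implementation.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; flips the gradient sign in the backward pass."""

        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class MoralityModel(nn.Module):
        def __init__(self, feat_dim=768, n_labels=10, n_domains=3):
            super().__init__()
            # Stand-in encoder; a pretrained language model would normally go here.
            self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
            self.label_head = nn.Linear(256, n_labels)    # moral sentiment classifier
            self.domain_head = nn.Linear(256, n_domains)  # dataset/domain discriminator

        def forward(self, x, lambd=1.0):
            z = self.encoder(x)
            y_logits = self.label_head(z)
            # Reversed gradients push the encoder toward domain-invariant features.
            d_logits = self.domain_head(GradReverse.apply(z, lambd))
            return y_logits, d_logits

    # Hypothetical per-class weights (e.g. inverse label frequency) to handle label shift.
    label_weights = torch.ones(10)
    label_loss_fn = nn.CrossEntropyLoss(weight=label_weights)
    domain_loss_fn = nn.CrossEntropyLoss()

    model = MoralityModel()
    x = torch.randn(8, 768)          # batch of encoded texts
    y = torch.randint(0, 10, (8,))   # moral labels
    d = torch.randint(0, 3, (8,))    # dataset-of-origin labels

    y_logits, d_logits = model(x, lambd=0.5)
    loss = label_loss_fn(y_logits, y) + domain_loss_fn(d_logits, d)
    loss.backward()

    In this kind of setup, the discriminator learns to predict which dataset an example came from, while the reversed gradient trains the encoder to make that prediction hard, encouraging features shared across datasets.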

    Non-Binary Gender Expression in Online Interactions

    Full text link
    Many openly non-binary individuals participate in social networks. However, the relationship between gender and online interactions is not well understood, which may result in disparate treatment by large language models. We investigate individual identity on Twitter, focusing on gender expression as represented by users' chosen pronouns. We find that non-binary groups tend to receive less attention in the form of likes and followers. We also find that non-binary users send and receive tweets with above-average toxicity. The study highlights the importance of considering gender as a spectrum, rather than a binary, in understanding online interactions and expression.
