7 research outputs found

    Gender bias in machine learning for sentiment analysis

    This is an accepted manuscript of an article published by Emerald Publishing Limited in Online Information Review on 01/01/2018, available online: https://doi.org/10.1108/OIR-05-2017-0153 The accepted version of the publication may differ from the final published version. Purpose: This paper investigates whether machine learning induces gender biases in the sense of results that are more accurate for male authors than for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis. Design/methodology/approach: This article uses ratings-balanced sets of reviews of restaurants and hotels (3 sets) to train algorithms with and without gender selection. Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed-gender datasets will over-represent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders. Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with. Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis.
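The comparison described in this abstract (accuracy on female-authored versus male-authored reviews) can be illustrated with a minimal sketch. The function name `accuracy_by_group` and the toy records below are assumptions made for illustration; they are not the paper's code or data.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute classification accuracy separately for each author group.

    records: iterable of (group, true_rating, predicted_rating) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy predictions, NOT results from the paper.
toy_records = [
    ("F", 5, 5), ("F", 1, 1), ("F", 3, 3), ("F", 4, 2),
    ("M", 5, 5), ("M", 1, 2), ("M", 3, 3), ("M", 4, 2),
]

scores = accuracy_by_group(toy_records)
# A positive gap means the classifier is more accurate on female-authored text.
gap = scores["F"] - scores["M"]
print(scores, gap)
```

A per-group breakdown of this kind is one way an end user could check whether a sentiment classifier's accuracy differs by author gender before deciding whether a correction factor is needed.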

    Monitoring bias and fairness in machine learning models: A review

    Introduction: Machine learning algorithms are quickly gaining traction in both the private and public sectors for their ability to automate both simple and complex decision-making processes. The vast majority of economic sectors, including transportation, retail, advertisement, and energy, are being disrupted by widespread data digitization and the emerging technologies that leverage it. Computerized systems are being introduced in government operations to improve accuracy and objectivity, and AI is having an impact on democracy and governance [1]

    SSentiA: A Self-Supervised Sentiment Analyzer for Classification From Unlabeled Data

    In recent years, supervised machine learning (ML) methods have realized remarkable performance gains for sentiment classification utilizing labeled data. However, labeled data are usually expensive to obtain and thus not always available. When annotated data are unavailable, unsupervised tools are used, which still lag behind the performance of supervised ML methods by a large margin. Therefore, in this work, we focus on improving the performance of sentiment classification from unlabeled data. We present a self-supervised hybrid methodology SSentiA (Self-supervised Sentiment Analyzer) that couples an ML classifier with a lexicon-based method for sentiment classification from unlabeled data. We first introduce LRSentiA (Lexical Rule-based Sentiment Analyzer), a lexicon-based method to predict the semantic orientation of a review along with the confidence score of prediction. Utilizing the confidence scores of LRSentiA, we generate highly accurate pseudo-labels for SSentiA, which incorporates a supervised ML algorithm to improve the performance of sentiment classification for less polarized and complex reviews. We compare the performances of LRSentiA and SSentiA with the existing unsupervised, lexicon-based and self-supervised methods in multiple datasets. LRSentiA performs similarly to the existing lexicon-based methods in both binary and 3-class sentiment analysis. By combining LRSentiA with an ML classifier, the hybrid approach SSentiA attains 10%–30% improvements in macro F1 score for both binary and 3-class sentiment analysis. The results suggest that in domains where annotated data are unavailable, SSentiA can significantly improve the performance of sentiment classification. Moreover, we demonstrate that with 30%–60% of the training data annotated, SSentiA performs similarly to training on the fully labeled dataset.
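The general idea in this abstract (a lexicon-based scorer produces confident pseudo-labels, and a classifier trained on them handles the remaining reviews) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' LRSentiA or SSentiA: the mini-lexicon, the confidence formula, the threshold, and the small Naive Bayes classifier are all hypothetical stand-ins.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon; the real lexicon and rules are not given in the abstract.
POSITIVE = {"great", "excellent", "lovely"}
NEGATIVE = {"terrible", "awful", "dirty"}


def lexicon_predict(text):
    """Return (label, confidence); label is 'pos', 'neg', or None if undecided."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:
        return None, 0.0
    label = "pos" if pos > neg else "neg"
    # Assumed confidence: share of matched lexicon words that agree with the label.
    return label, abs(pos - neg) / (pos + neg)


def train_naive_bayes(labelled):
    """Fit word counts per class from (text, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labelled:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def nb_predict(model, text):
    """Classify text with add-one smoothed multinomial Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in pseudo-labelled data
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def self_supervised_labels(reviews, threshold=0.8):
    """Keep confident lexicon labels; classify the rest with the trained model."""
    scored = [(text, *lexicon_predict(text)) for text in reviews]
    pseudo = [(t, lab) for t, lab, conf in scored
              if lab is not None and conf >= threshold]
    model = train_naive_bayes(pseudo)
    return [lab if lab is not None and conf >= threshold else nb_predict(model, t)
            for t, lab, conf in scored]


# Hypothetical toy reviews; the last two contain no lexicon words.
toy_reviews = [
    "great friendly staff",
    "excellent breakfast friendly staff",
    "terrible rude staff",
    "awful room rude manager",
    "friendly helpful people",
    "rude and slow",
]
labels = self_supervised_labels(toy_reviews)
print(labels)
```

In this toy run the last two reviews get no label from the lexicon, so they are classified by the model trained on the four pseudo-labelled reviews, which has picked up "friendly" and "rude" as class cues.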

    Digital footprints of Kashmiri pandit migration on Twitter

    © 2022 The Authors. Published by EPI. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.3145/epi.2022.nov.07 The paper investigates changing levels of online concern about the Kashmiri Pandit migration of the 1990s on Twitter. Although decades old, this movement of people is an ongoing issue in India, with no current resolution. Analysing changing reactions to it on social media may shed light on trends in public attitudes to the event. Tweets were downloaded from Twitter using the academic version of its application programming interface (API) with the aid of the free social media analytics software Mozdeh. A set of 1000 tweets was selected for content analysis with a random number generator in Mozdeh. The results show that the number of tweets about the issue has increased over time, mainly from India, and predominantly driven by the release of films like Shikara and The Kashmir Files. The tweets show apparent universal support for the Pandits but often express strong emotions or criticize the actions of politicians, showing that the migration is an ongoing source of anguish and frustration that needs resolution. The results also show that social media analysis can give insights even into primarily offline political issues that predate the popularity of the web, and can easily incorporate international perspectives necessary to understand complex migration issues.

    Responsible AI and Analytics for an Ethical and Inclusive Digitized Society

    TripAdvisor reviews of hotels and restaurants by gender

    No full text
    Datasets of TripAdvisor reviews by UK residents of UK hotels and restaurants, together with the user's rating of the hotel.
    Datasets are split by:
    Hotel star level (2, 3, 4 or all [mixed]) or Restaurant;
    Reviewer gender (M = male-authored reviews; F = female-authored reviews; MF = equal numbers of male and female authored reviews for each rating level);
    Number of texts (1k, 2k, 4k, 8k, 16k, or all available).

    Each dataset contains equal numbers of reviews at each rating level.
    The reviews were selected at random from TripAdvisor.

    This data is from this paper:
    Thelwall, M. (2018). Gender bias in machine learning for sentiment analysis (http://wlv.openrepository.com/wlv/handle/2436/620690). Online Information Review, 42(3), 343-354. doi: 10.1108/OIR-05-2017-0152
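The balancing described in this dataset entry (equal numbers of reviews at each rating level and, for the MF sets, equal numbers of male- and female-authored reviews per rating level) can be sketched as a stratified random sample. This is an illustrative sketch only: the record fields `rating` and `gender`, the function `balanced_sample`, and the toy pool are assumptions, not the procedure used to build the published datasets.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(reviews, per_cell, seed=0):
    """Draw per_cell reviews at random from every (rating, gender) cell.

    reviews: list of dicts with hypothetical keys 'rating' and 'gender'.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for review in reviews:
        cells[(review["rating"], review["gender"])].append(review)
    sample = []
    for key in sorted(cells):
        if len(cells[key]) < per_cell:
            raise ValueError(f"cell {key} has fewer than {per_cell} reviews")
        sample.extend(rng.sample(cells[key], per_cell))
    return sample


# Hypothetical unbalanced pool: more reviews at higher ratings and for one gender.
pool = [
    {"rating": rating, "gender": gender, "text": f"review {i}"}
    for rating in range(1, 6)
    for gender in ("M", "F")
    for i in range(3 + rating + (1 if gender == "F" else 0))
]

sample = balanced_sample(pool, per_cell=3)
cell_sizes = Counter((r["rating"], r["gender"]) for r in sample)
print(len(sample), set(cell_sizes.values()))
```

Sampling the same number from every cell caps the dataset size at the smallest cell, which is consistent with the listing offering fixed sizes (1k to 16k) alongside "all available".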