334 research outputs found

    Analyzing Disproportionate Reaction via Comparative Multilingual Targeted Sentiment in Twitter

    Global events such as terrorist attacks are commented upon in social media, such as Twitter, in different languages and from different parts of the world. Most prior studies have focused on monolingual sentiment analysis, and have therefore excluded a large proportion of the Twitter userbase. In this paper, we perform a multilingual comparative sentiment analysis study on the terrorist attack in Paris in November 2015. In particular, we look at targeted sentiment, investigating opinions on specific entities, not simply the general sentiment of each tweet. Given the potentially inflammatory and polarizing effect that these types of tweets may have on attitudes, we examine the sentiments expressed about different targets and explore whether disproportionate reaction was expressed about such targets across different languages. Specifically, we assess whether the sentiment of French-speaking Twitter users during the Paris attack differs from that of English-speaking ones. We identify disproportionately negative attitudes towards some entities in the English dataset compared with the French one and, via a crowdsourcing experiment, illustrate that this also extends to forming an annotator bias.

    Regional sentiment bias in social media reporting during crises

    Crisis events such as terrorist attacks are extensively commented upon on social media platforms such as Twitter. For this reason, social media content posted during emergency events is increasingly being used by news media and in social studies to characterize the public’s reaction to those events. This is typically achieved either by having journalists select ‘representative’ tweets to show, or by using a classifier trained on prior human-annotated tweets to provide a sentiment/emotion breakdown for the event. However, social media users, journalists and annotators do not exist in isolation; they each have their own context and world view. In this paper, we ask the question, ‘to what extent do local and international biases affect the sentiments expressed on social media and the way that social media content is interpreted by annotators?’ In particular, we perform a multilingual study spanning two events and three languages. We show that there are marked disparities between the emotions expressed by users in different languages for an event. For instance, during the 2016 Paris attack, there were 16% more negative comments written in English than in French, even though the event originated on French soil. Furthermore, we observed that sentiment biases also affect annotators from those regions, which can negatively impact the accuracy of social media labelling efforts. This highlights the need to consider the sentiment biases of users in different countries, both when analysing events through the lens of social media and when using social media as a data source for training automatic classification models.

    Translation and Social Media Communication in the Age of the Pandemic

    This collection of essays is the first of its kind to explore the conjunction of translation and social media communication, with a focus on how these practices intersect and transform each other against the backdrop of the cascading COVID-19 crisis. The contributions in the book offer empirical case studies as well as personal reflections on the topic, illuminating a broad range of themes such as knowledge translation, crisis communications, language policies, cyberpolitics and digital platformization. Together they demonstrate the vital role of translation in the trust-based construction of global public health discourses, while accounting for the new medialities that are reshaping the conception, experience and critique of translation in response to the cultural, political and ecological challenges in the post-pandemic world. Written by leading scholars in translation studies, media studies and literary studies, this volume sets out to open up new conversations among these fields in relation to the global pandemic and its aftermath.

    Advances in Social Media Research: Past, Present and Future

    Social media comprises communication websites that facilitate relationship forming between users from diverse backgrounds, resulting in a rich social structure. User-generated content encourages inquiry and decision-making. Given the relevance of social media to various stakeholders, it has received significant attention from researchers in various fields, including information systems. There exists no comprehensive review that integrates and synthesises the findings of the literature on social media. This study discusses the findings of 132 papers (in selected IS journals) on social media and social networking published between 1997 and 2017. Most papers reviewed here examine the behavioural side of social media, investigate the aspect of reviews and recommendations, and study its integration for organizational purposes. Furthermore, many studies have investigated the viability of online communities/social media as a marketing medium, while others have explored various aspects of social media, including the risks associated with its use, the value that it creates, and the negative stigma attached to it within workplaces. The use of social media for information sharing during critical events, as well as for seeking and/or rendering help, has also been investigated in prior research. Other contexts include political and public administration, and the comparison between traditional and social media. Overall, our study identifies multiple emergent themes in the existing corpus, thereby furthering our understanding of advances in social media research. The integrated view of the extant literature that our study presents can help future researchers avoid duplication, whilst offering fruitful lines of enquiry to help shape research in this emerging field.

    Overcoming Racial Harms to Democracy from Artificial Intelligence

    While the United States is becoming more racially diverse, generative artificial intelligence and related technologies threaten to undermine truly representative democracy. Left unchecked, AI will exacerbate already substantial existing challenges, such as racial polarization, cultural anxiety, antidemocratic attitudes, racial vote dilution, and voter suppression. Synthetic video and audio (“deepfakes”) receive the bulk of popular attention, but are just the tip of the iceberg. Microtargeting of racially tailored disinformation, racial bias in automated election administration, discriminatory voting restrictions, racially targeted cyberattacks, and AI-powered surveillance that chills racial justice claims are just a few examples of how AI is threatening democracy. Unfortunately, existing laws, including the Voting Rights Act, are unlikely to address the challenges. These problems, however, are not insurmountable if policymakers, activists, and technology companies act now. This Article asserts that AI should be regulated to facilitate a racially inclusive democracy, proposes novel principles that provide a framework to regulate AI, and offers specific policy interventions to illustrate the implementation of the principles. Even though race is the most significant demographic factor that shapes voting patterns in the United States, this is the first article to comprehensively identify the racial harms to democracy posed by AI and offer a way forward.

    On the Keyword Extraction and Bias Analysis, Graph-based Exploration and Data Augmentation for Abusive Language Detection in Low-Resource Settings

    Thesis by compendium of publications. Abusive language detection is a task that has become increasingly important in the modern digital age, where communication takes place via various online platforms. The increase in online interactions has led to an increase in the occurrence of abusive language. Addressing such content is crucial to maintaining a safe and inclusive online environment. However, this task faces several challenges that make it a complex and ongoing area of research and development. In particular, detecting abusive language in environments with sparse data poses an additional challenge, since the development of accurate automated systems often requires large annotated datasets. In this thesis we investigate different aspects of abusive language detection, paying particular attention to environments with limited data. First, we study the bias toward abusive keywords in models trained for abusive language detection. To this end, we propose two methods for extracting potentially abusive keywords from datasets. We then evaluate the bias toward the extracted keywords and how this bias can be modified in order to influence abusive language detection performance. The analysis and conclusions of this work reveal evidence that it is possible to mitigate the bias and that such a reduction can positively affect the performance of the models. However, we note that, with the bias mitigation techniques studied, it is not possible to establish a similar correspondence between bias mitigation and model performance in low-resource settings. Second, we investigate the use of models based on graph neural networks to detect abusive language. On the one hand, we propose a text representation framework designed to obtain a representation space in which abusive texts can be easily distinguished from other texts. On the other hand, we evaluate the ability of models based on convolutional graph neural networks to classify abusive texts. The next part of our research focuses on analyzing how data augmentation can influence the performance of abusive language detection. To this end, we investigate two well-known techniques based on the principle of vicinal risk minimization and propose a variant of one of them. In addition, we evaluate simple techniques based on the operations of synonym replacement, random insertion, random swap, and random deletion. The contributions of this thesis highlight the potential of models based on graph neural networks and data augmentation techniques to improve abusive language detection, especially in low-resource settings. These contributions have been published in several international conferences and journals.
    This research work was partially funded by the Spanish Ministry of Science and Innovation under the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31). The authors also thank the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025. This work was done in the framework of the FairTransNLP research project on Fairness and Transparency for equitable NLP applications in social media (PID2021-124361OB-C31), funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A way of making Europe. Part of the work presented in this article was performed during the first author’s research visit to the University of Mannheim, supported through a Contact Fellowship awarded by the DAAD scholarship program “STIBET Doktoranden”.
    Peña Sarracén, GLDL. (2024). On the Keyword Extraction and Bias Analysis, Graph-based Exploration and Data Augmentation for Abusive Language Detection in Low-Resource Settings [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/203266
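
    As an illustration of two of the simple augmentation operations named in the thesis abstract (random swap and random deletion), the following is a minimal Python sketch. It is not the thesis's implementation: the function names, the whitespace tokenization, the deletion probability and the example sentence are all assumptions made for this example.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps randomly chosen position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        # pick two distinct positions and exchange their tokens
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # fall back to a single random token if everything was dropped
    return kept if kept else [rng.choice(tokens)]


# Hypothetical example sentence; a seeded generator keeps runs repeatable.
rng = random.Random(0)
sentence = "this is a harmless example sentence".split()
swapped = random_swap(sentence, n_swaps=1, rng=rng)
shortened = random_deletion(sentence, p=0.2, rng=rng)
print(" ".join(swapped))
print(" ".join(shortened))
```

    Each call yields a perturbed copy of the input that can be added to a small training set under the original label; whether this helps in a given low-resource setting is an empirical question.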

    Challenges and perspectives of hate speech research

    This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicity, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research.

    The Palgrave Handbook of Digital Russia Studies

    This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions about handling Russian data and applying a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars and for advanced undergraduate and graduate students studying Russia today.