9 research outputs found

    Detecting Deception, Partisan, and Social Biases

    Full text link
    Tesis por compendio[ES] En la actualidad, el mundo político tiene tanto o más impacto en la sociedad que ésta en el mundo político. Los líderes o representantes de partidos políticos hacen uso de su poder en los medios de comunicación, para modificar posiciones ideológicas y llegar al pueblo con el objetivo de ganar popularidad en las elecciones gubernamentales.A través de un lenguaje engañoso, los textos políticos pueden contener sesgos partidistas y sociales que minan la percepción de la realidad. Como resultado, los seguidores de una ideología, o miembros de una categoría social, se sienten amenazados por otros grupos sociales o ideológicos, o los perciben como competencia, derivándose así una polarización política con agresiones físicas y verbales. La comunidad científica del Procesamiento del Lenguaje Natural (NLP, según sus siglas en inglés) contribuye cada día a detectar discursos de odio, insultos, mensajes ofensivos, e información falsa entre otras tareas computacionales que colindan con ciencias sociales. Sin embargo, para abordar tales tareas, es necesario hacer frente a diversos problemas entre los que se encuentran la dificultad de tener textos etiquetados, las limitaciones de no trabajar con un equipo interdisciplinario, y los desafíos que entraña la necesidad de soluciones interpretables por el ser humano. Esta tesis se enfoca en la detección de sesgos partidistas y sesgos sociales, tomando como casos de estudio el hiperpartidismo y los estereotipos sobre inmigrantes. Para ello, se propone un modelo basado en una técnica de enmascaramiento de textos capaz de detectar lenguaje engañoso incluso en temas controversiales, siendo capaz de capturar patrones del contenido y el estilo de escritura. Además, abordamos el problema usando modelos basados en BERT, conocidos por su efectividad al capturar patrones sintácticos y semánticos sobre las mismas representaciones de textos. Ambos enfoques, la técnica de enmascaramiento y los modelos basados en BERT, se comparan en términos de desempeño y explicabilidad en la detección de hiperpartidismo en noticias políticas y estereotipos sobre inmigrantes. Para la identificación de estos últimos, se propone una nueva taxonomía con fundamentos teóricos en sicología social, y con la que se etiquetan textos extraídos de intervenciones partidistas llevadas a cabo en el Parlamento español. Los resultados muestran que los enfoques propuestos contribuyen al estudio del hiperpartidismo, así como a identif i car cuándo los ciudadanos y políticos enmarcan a los inmigrantes en una imagen de víctima, recurso económico, o amenaza. Finalmente, en esta investigación interdisciplinaria se demuestra que los estereotipos sobre inmigrantes son usados como estrategia retórica en contextos políticos.[CA] Avui, el món polític té tant o més impacte en la societat que la societat en el món polític. Els líders polítics, o representants dels partits polítics, fan servir el seu poder als mitjans de comunicació per modif i car posicions ideològiques i arribar al poble per tal de guanyar popularitat a les eleccions governamentals. Mitjançant un llenguatge enganyós, els textos polítics poden contenir biaixos partidistes i socials que soscaven la percepció de la realitat. Com a resultat, augmenta la polarització política nociva perquè els seguidors d'una ideologia, o els membres d'una categoria social, veuen els altres grups com una amenaça o competència, que acaba en agressions verbals i físiques amb resultats desafortunats. La comunitat de Processament del llenguatge natural (PNL) té cada dia noves aportacions amb enfocaments que ajuden a detectar discursos d'odi, insults, missatges ofensius i informació falsa, entre altres tasques computacionals relacionades amb les ciències socials. No obstant això, molts obstacles impedeixen eradicar aquests problemes, com ara la dif i cultat de tenir textos anotats, les limitacions dels enfocaments no interdisciplinaris i el repte afegit per la necessitat de solucions interpretables. Aquesta tesi se centra en la detecció de biaixos partidistes i socials, prenent com a cas pràctic l'hiperpartidisme i els estereotips sobre els immigrants. Proposem un model basat en una tècnica d'emmascarament que permet detectar llenguatge enganyós en temes polèmics i no polèmics, capturant pa-trons relacionats amb l'estil i el contingut. A més, abordem el problema avaluant models basats en BERT, coneguts per ser efectius per capturar patrons semàntics i sintàctics en la mateixa representació. Comparem aquests dos enfocaments (la tècnica d'emmascarament i els models basats en BERT) en termes de rendiment i les seves solucions explicables en la detecció de l'hiperpartidisme en les notícies polítiques i els estereotips d'immigrants. Per tal d'identificar els estereotips dels immigrants, proposem una nova tax-onomia recolzada per la teoria de la psicologia social i anotem un conjunt de dades de les intervencions partidistes al Parlament espanyol. Els resultats mostren que els nostres models poden ajudar a estudiar l'hiperpartidisme i identif i car diferents marcs en què els ciutadans i els polítics perceben els immigrants com a víctimes, recursos econòmics o amenaces. Finalment, aquesta investigació interdisciplinària demostra que els estereotips dels immigrants s'utilitzen com a estratègia retòrica en contextos polítics.[EN] Today, the political world has as much or more impact on society than society has on the political world. Political leaders, or representatives of political parties, use their power in the media to modify ideological positions and reach the people in order to gain popularity in government elections. Through deceptive language, political texts may contain partisan and social biases that undermine the perception of reality. As a result, harmful political polarization increases because the followers of an ideology, or members of a social category, see other groups as a threat or competition, ending in verbal and physical aggression with unfortunate outcomes. The Natural Language Processing (NLP) community has new contri-butions every day with approaches that help detect hate speech, insults, of f ensive messages, and false information, among other computational tasks related to social sciences. However, many obstacles prevent eradicating these problems, such as the dif f i culty of having annotated texts, the limitations of non-interdisciplinary approaches, and the challenge added by the necessity of interpretable solutions. This thesis focuses on the detection of partisan and social biases, tak-ing hyperpartisanship and stereotypes about immigrants as case studies. We propose a model based on a masking technique that can detect deceptive language in controversial and non-controversial topics, capturing patterns related to style and content. Moreover, we address the problem by evalu-ating BERT-based models, known to be ef f ective at capturing semantic and syntactic patterns in the same representation. We compare these two approaches (the masking technique and the BERT-based models) in terms of their performance and the explainability of their decisions in the detection of hyperpartisanship in political news and immigrant stereotypes. In order to identify immigrant stereotypes, we propose a new taxonomy supported by social psychology theory and annotate a dataset from partisan interventions in the Spanish parliament. Results show that our models can help study hyperpartisanship and identify dif f erent frames in which citizens and politicians perceive immigrants as victims, economic resources, or threat. Finally, this interdisciplinary research proves that immigrant stereotypes are used as a rhetorical strategy in political contexts.This PhD thesis was funded by the MISMIS-FAKEnHATE research project (PGC2018-096212-B-C31) of the Spanish Ministry of Science and Innovation.Sánchez Junquera, JJ. (2022). Detecting Deception, Partisan, and Social Biases [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/185784Compendi

    Linguistic Threat Assessment: Understanding Targeted Violence through Computational Linguistics

    Get PDF
    Language alluding to possible violence is widespread online, and security professionals are increasingly faced with the issue of understanding and mitigating this phenomenon. The volume of extremist and violent online data presents a workload that is unmanageable for traditional, manual threat assessment. Computational linguistics may be of particular relevance to understanding threats of grievance-fuelled targeted violence on a large scale. This thesis seeks to advance knowledge on the possibilities and pitfalls of threat assessment through automated linguistic analysis. Based on in-depth interviews with expert threat assessment practitioners, three areas of language are identified which can be leveraged for automation of threat assessment, namely, linguistic content, style, and trajectories. Implementations of each area are demonstrated in three subsequent quantitative chapters. First, linguistic content is utilised to develop the Grievance Dictionary, a psycholinguistic dictionary aimed at measuring concepts related to grievance-fuelled violence in text. Thereafter, linguistic content is supplemented with measures of linguistic style in order to examine the feasibility of author profiling (determining gender, age, and personality) in abusive texts. Lastly, linguistic trajectories are measured over time in order to assess the effect of an external event on an extremist movement. Collectively, the chapters in this thesis demonstrate that linguistic automation of threat assessment is indeed possible. The concluding chapter describes the limitations of the proposed approaches and illustrates where future potential lies to improve automated linguistic threat assessment. Ideally, developers of computational implementations for threat assessment strive for explainability and transparency. Furthermore, it is argued that computational linguistics holds particular promise for large-scale measurement of grievance-fuelled language, but is perhaps less suited to prediction of actual violent behaviour. Lastly, researchers and practitioners involved in threat assessment are urged to collaboratively and critically evaluate novel computational tools which may emerge in the future

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Get PDF
    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Get PDF
    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)

    Linguistic variation across Twitter and Twitter trolling

    Get PDF
    Trolling is used to label a variety of behaviours, from the spread of misinformation and hyperbole to targeted abuse and malicious attacks. Despite this, little is known about how trolling varies linguistically and what its major linguistic repertoires and communicative functions are in comparison to general social media posts. Consequently, this dissertation collects two corpora of tweets – a general English Twitter corpus and a Twitter trolling corpus using other Twitter users’ accusations – and introduces and applies a new short-text version of Multi-Dimensional Analysis to each corpus, which is designed to identify aggregated dimensions of linguistic variation across them. The analysis finds that trolling tweets and general tweets only differ on the final dimension of linguistic variation, but share the following linguistic repertoires: “Informational versus Interactive”, “Personal versus Other Description”, and “Promotional versus Oppositional”. Moreover, the analysis compares trolling tweets to general Twitter’s dimensions and finds that trolling tweets and general tweets are remarkably more similar than they are different in their distribution along all dimensions. These findings counter various theories on trolling and problematise the notion that trolling can be detected automatically using grammatical variation. Overall, this dissertation provides empirical evidence on how trolling and general tweets vary linguistically

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
    corecore