
    Automatic Misogyny Detection in Social Media: a Survey

    This article presents a survey of automated misogyny identification techniques in social media, especially on Twitter. The problem is urgent because of the high speed at which messages on social platforms grow and the widespread use of offensive language (including misogynistic language) in them. In this article we survey approaches proposed in the literature to solve the problem of misogynistic message recognition. These include classical machine learning models such as Support Vector Machines, Naive Bayes, and Logistic Regression, ensembles of different classical machine learning models, and deep neural networks such as Long Short-Term Memory and Convolutional Neural Networks. We consider the results of experiments with these models in different languages: English, Spanish, and Italian tweets. The survey describes features that help to identify misogynistic tweets and shared tasks whose aim was the creation of misogyny-language classifiers. The survey covers not only models that identify misogynistic language, but also systems that recognize the target of an offence (an individual or a group of persons).
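    The classical baselines the survey covers are typically bag-of-words pipelines; a minimal sketch of such a baseline, assuming a TF-IDF representation and a linear SVM (the texts, labels, and hyperparameters below are placeholders, not taken from the surveyed systems):

```python
# Minimal TF-IDF + linear SVM baseline of the kind surveyed above.
# Texts and labels are placeholders; real experiments would use the shared-task corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

tweets = [
    "placeholder misogynistic tweet",
    "placeholder neutral tweet",
    "another misogynistic placeholder",
    "another neutral placeholder",
]
labels = [1, 0, 1, 0]  # 1 = misogynistic, 0 = not

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # word uni- and bigrams
    ("svm", LinearSVC(C=1.0)),
])
model.fit(tweets, labels)
print(model.predict(["yet another placeholder tweet"]))
```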

    Offensive Language Recognition in Social Media

    [EN] This article proposes an approach to the multi-class classification problem in the framework of aggressive language recognition on Twitter. At the preprocessing stage, external data is added to the existing dataset, based on information from the links contained in the dataset. This made it possible to enlarge the training dataset and thereby improve the quality of the classification. The model created is an ensemble of classical machine learning models, including Logistic Regression, Support Vector Machines, Naive Bayes, and a combination of Logistic Regression and Naive Bayes. The macro F1-score obtained in one of the experiments reached 0.61, which exceeds the published state-of-the-art value by 1 percentage point. This indicates the potential value of the proposed approach in the field of hate speech recognition in social media. The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMISFAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31). Shushkevich, E.; Cardiff, J.; Rosso, P.; Akhtyamova, L. (2020). Offensive Language Recognition in Social Media. Computación y Sistemas. 24(2):523-532. https://doi.org/10.13053/CyS-24-2-3376
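    A minimal sketch of an ensemble in the spirit of the one described above, combining Logistic Regression, Naive Bayes, and an SVM by majority vote over TF-IDF features and scoring with macro F1 (the data, vectoriser settings, and voting scheme are illustrative assumptions, not the published configuration):

```python
# Majority-vote ensemble of Logistic Regression, Naive Bayes, and a linear SVM
# over TF-IDF features, evaluated with macro F1 as the paper reports.
# Data and hyperparameters are illustrative assumptions.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = [
    "placeholder aggressive tweet",
    "placeholder neutral tweet",
    "another aggressive placeholder",
    "another neutral placeholder",
]
labels = [1, 0, 1, 0]

ensemble = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("vote", VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("svm", LinearSVC(C=1.0)),
        ],
        voting="hard",  # simple majority vote over the three base models
    )),
])
ensemble.fit(texts, labels)
preds = ensemble.predict(texts)
print("macro F1:", f1_score(labels, preds, average="macro"))
```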

    Online Misogyny and the Law: Are Human Rights Protected on the Net?

    This paper opens by analysing the complexity of misogyny, sexism, and toxic masculinity. It then examines online misogyny, dissecting the many acts and behaviours that comprise this kind of digital discrimination. It considers the Gamergate scandal and demonstrates how the video game industry reinforces gender stereotypes. It closes with an analysis of the efficiency and limits of legislative systems for combating online sexism.

    Automated Identification of Sexual Orientation and Gender Identity Discriminatory Texts from Issue Comments

    In an industry dominated by straight men, many developers representing other gender identities and sexual orientations often encounter hateful or discriminatory messages. Such communications pose barriers to participation for women and LGBTQ+ persons. Due to the sheer volume, manual inspection of all communications for discriminatory content is infeasible for a large-scale Free/Libre and Open-Source Software (FLOSS) community. To address this challenge, this study aims to develop an automated mechanism to identify Sexual orientation and Gender Identity Discriminatory (SGID) texts in software developers' communications. To this end, we trained and evaluated SGID4SE (Sexual orientation and Gender Identity Discriminatory text identification for Software Engineering texts) as a supervised-learning-based SGID detection tool. SGID4SE incorporates six preprocessing steps and ten state-of-the-art algorithms, and implements six different strategies to improve the performance of the minority class. We empirically evaluated each strategy and identified an optimum configuration for each algorithm. In our ten-fold cross-validation-based evaluations, a BERT-based model achieves the best performance, with 85.9% precision, 80.0% recall, and an 82.9% F1-score for the SGID class. This model achieves 95.7% accuracy and a Matthews Correlation Coefficient of 80.4%. Our dataset and tool establish a foundation for further research in this direction.
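    The abstract highlights strategies for the minority (SGID) class and reports precision, recall, F1, and the Matthews Correlation Coefficient; a minimal sketch of one common such strategy, class weighting, evaluated with those metrics (the data and the logistic-regression baseline are illustrative assumptions, not SGID4SE's actual configuration):

```python
# One common minority-class strategy: class weighting, evaluated with the
# metrics the abstract reports (precision, recall, F1, MCC).
# Data and configuration are illustrative assumptions, not SGID4SE itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, matthews_corrcoef
from sklearn.pipeline import Pipeline

comments = ["placeholder neutral comment"] * 8 + ["placeholder SGID comment"] * 2
labels = [0] * 8 + [1] * 2  # SGID texts are the minority class

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # class_weight="balanced" up-weights the rare SGID class during training
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(comments, labels)
preds = model.predict(comments)

print("precision:", precision_score(labels, preds, zero_division=0))
print("recall   :", recall_score(labels, preds, zero_division=0))
print("F1       :", f1_score(labels, preds, zero_division=0))
print("MCC      :", matthews_corrcoef(labels, preds))
```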

    Misogyny Detection in Social Media on the Twitter Platform

    The thesis is devoted to the problem of misogyny detection in social media. In this work we analyse the difference between offensive language in general and misogynistic language in social media, and review the best existing approaches to detecting offensive and misogynistic language, which are based on classical machine learning and neural networks. We also review recent shared tasks aimed at detecting misogyny in social media, several of which we have participated in. We propose an approach to the detection and classification of misogyny in texts based on an ensemble of classical machine learning models: Logistic Regression, Naive Bayes, and Support Vector Machines. At the preprocessing stage we also used linguistic features and novel approaches that allowed us to improve the quality of classification. We tested the model on real datasets, both English and multilingual corpora. The results we achieved with our model are highly competitive in this area and demonstrate the capability for future improvement.
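    A minimal sketch of combining bag-of-words features with simple hand-crafted linguistic features at the preprocessing stage, as described above (the specific features chosen here, text length and exclamation count, are illustrative assumptions rather than the thesis's feature set):

```python
# Combining TF-IDF with simple hand-crafted linguistic features,
# in the spirit of the preprocessing described above.
# The chosen features (text length, exclamation count) are illustrative assumptions.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

class LinguisticFeatures(BaseEstimator, TransformerMixin):
    """Extracts a few simple per-text statistics as extra features."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.array([[len(t.split()), t.count("!")] for t in X])

texts = [
    "placeholder misogynistic tweet!!",
    "placeholder neutral tweet",
    "another misogynistic placeholder!",
    "another neutral placeholder",
]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer()),
        ("linguistic", LinguisticFeatures()),
    ])),
    ("clf", MultinomialNB()),  # one member of the ensemble described above
])
model.fit(texts, labels)
print(model.predict(["yet another placeholder tweet!"]))
```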

    On the Detection of False Information: From Rumors to Fake News

    In recent years, the development of social media and online news agencies has brought several challenges and threats to the Web. These threats have attracted the attention of the Natural Language Processing (NLP) research community, as they are polluting online social media platforms. One example of these threats is false information, in which false, inaccurate, or deceptive information is spread and shared by online users. False information is not limited to verifiable information; it also involves information that is used for harmful purposes. One of the challenges researchers face is the massive number of users on social media platforms, where detecting false information spreaders is not an easy job. Previous work proposed for limiting or studying the detection of false information has focused on understanding the language of false information from a linguistic perspective. In the case of verifiable information, approaches have been proposed in a monolingual setting. Moreover, detecting the sources or the spreaders of false information in social media has not been investigated much. In this thesis we study false information from several aspects. First, since previous work focused on studying false information in a monolingual setting, in this thesis we study false information in a cross-lingual one. We propose different cross-lingual approaches and compare them to a set of monolingual baselines. We also provide systematic studies of the evaluation results of our approaches for a better understanding. Second, we noticed that the role of affective information had not been investigated in depth. Therefore, the second part of our research studies the role of affective information in false information and shows how the authors of false content use it to manipulate the reader. Here, we investigate several types of false information to understand the correlation between affective information and each type (Propaganda, Hoax, Clickbait, Rumor, and Satire). Last but not least, in an attempt to limit its spread, we also address the problem of detecting false information spreaders in social media. In this research direction, we focus on exploiting several text-based features extracted from the online profile messages of those spreaders. We study different feature sets that have the potential to help discriminate false information spreaders from fact checkers. Ghanem, BHH. (2020). On the Detection of False Information: From Rumors to Fake News [Doctoral thesis, by compendium of publications]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/158570
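    A minimal sketch of user-level classification from text-based features aggregated over a profile's messages, in the spirit of the spreader-detection part of the thesis (the feature set and the data below are illustrative assumptions, not the thesis's actual features):

```python
# Sketch of user-level classification from text-based features aggregated over
# a profile's messages, in the spirit of the spreader-detection work above.
# The feature set (message length, URL share, exclamation rate) and the data
# are illustrative assumptions, not the thesis's actual features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def profile_features(messages):
    """Aggregate simple per-message statistics into one user-level vector."""
    lengths = [len(m.split()) for m in messages]
    return np.array([
        np.mean(lengths),                          # average message length
        np.mean(["http" in m for m in messages]),  # share of messages containing a URL
        np.mean([m.count("!") for m in messages]), # exclamation marks per message
    ])

users = {
    "spreader_1": ["placeholder rumour!!", "shocking claim http://example.com"],
    "checker_1":  ["placeholder fact check", "sources: http://example.org"],
}
labels = [1, 0]  # 1 = false-information spreader, 0 = fact checker

X = np.vstack([profile_features(msgs) for msgs in users.values()])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```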