
    GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models

    We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach that exploits the typological relatedness between English and Danish. Our systems obtained good results across the three languages (.9036 for EN, .7619 for DA, and .7789 for TR).

    Smart detection of offensive words in social media using the soundex algorithm and permuterm index

    Offensive posts on social media that are inappropriate for a given age, level of maturity, or impression quite often reach underage rather than adult participants. Nowadays, the growing number of masked offensive words on social media is an ethically challenging problem. Thus, there has been growing interest in developing methods that can automatically detect posts containing such words. This study aimed to develop a method that detects masked offensive words, in which partial alteration of the word may trick conventional monitoring systems when the word is posted on social media. The proposed method progresses through a series of phases: a pre-processing phase, which includes filtering, tokenization, and stemming; an offensive word extraction phase, which relies on the soundex algorithm and a permuterm index; and a post-processing phase that classifies users’ posts in order to highlight offensive content. Accordingly, the method detects masked offensive words in written text, thus preventing certain types of offensive words from being published. Evaluation results indicate a 99% accuracy in detecting offensive words.
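    The two retrieval components named above are standard and can be sketched in a few lines. The versions below are generic textbook implementations (American Soundex and permuterm rotations), not the paper's actual code: Soundex groups similar-sounding spellings under one code, while permuterm rotations let a wildcard pattern over a masked word be answered by prefix search.

    ```python
    def soundex(word: str) -> str:
        """American Soundex: first letter plus three digits, so that
        similar-sounding spellings (e.g. masked variants) share a code."""
        codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4",
                 **dict.fromkeys("mn", "5"), "r": "6"}
        word = word.lower()
        out, prev = [], codes.get(word[0], "")
        for ch in word[1:]:
            if ch in "hw":               # h and w are transparent to duplicate checks
                continue
            code = codes.get(ch, "")     # vowels map to "" and reset prev
            if code and code != prev:
                out.append(code)
            prev = code
        return (word[0].upper() + "".join(out) + "000")[:4]

    def permuterm_keys(term: str) -> list[str]:
        """All rotations of term + '$'. A wildcard query like 'a*e' is
        rotated to 'e$a*' and answered by prefix search over these keys."""
        t = term + "$"
        return [t[i:] + t[:i] for i in range(len(t))]
    ```

    For instance, "Robert" and "Rupert" both encode to R163, which is why phonetic coding can match a deliberately re-spelled word against a lexicon entry.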

    A Hybrid Model for Monolingual and Multilingual Toxic Comment Detection

    Social media provides a public and convenient platform for people to communicate. However, it is also open to hateful behavior and toxic comments. Social networks, like Facebook, Twitter, and many others, have been working on developing effective toxic comment detection methods to provide better service. A monolingual language model focuses on a single language and provides high detection accuracy. A multilingual language model provides better generalization performance. In order to improve the effectiveness of detecting toxic comments in multiple languages, we propose a hybrid model that fuses a monolingual model and a multilingual model. We use labeled data to fine-tune the monolingual pre-trained model. We use masked language modeling to semi-supervise the fine-tuning of the multilingual pre-trained model on unlabeled data, and then use labeled data to fine-tune the model. In this way, we can fully utilize the large amount of unlabeled data, reduce dependence on labeled comment data, and improve detection effectiveness. We also design several comparative experiments. The results demonstrate the effectiveness and advantage of our proposed model, especially compared to the XLM-RoBERTa multilingual fine-tuning model.
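    The abstract does not spell out how the two models' outputs are fused. One common choice is late fusion: a weighted average of the two models' predicted toxicity probabilities. The sketch below shows that generic scheme with a hypothetical mixing weight alpha, not the paper's actual fusion mechanism.

    ```python
    def fuse_toxicity_scores(mono_probs: list[float],
                             multi_probs: list[float],
                             alpha: float = 0.6) -> list[float]:
        """Late fusion of per-comment toxicity probabilities from a
        monolingual and a multilingual model. alpha is a hypothetical
        mixing weight favoring the (typically more accurate) monolingual
        model; it would normally be tuned on held-out data."""
        assert len(mono_probs) == len(multi_probs)
        return [alpha * m + (1.0 - alpha) * x
                for m, x in zip(mono_probs, multi_probs)]

    # A comment both models rate as toxic stays toxic; disagreements are softened.
    scores = fuse_toxicity_scores([0.9, 0.2], [0.7, 0.8])
    ```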

    Twitter-based Polarised Embeddings for Abusive Language Detection

    We present a method to generate polarised word embeddings, using controversial topics as Twitter search terms to serve as proxies for interactions among social media communities that may be liable to use abusive language. Using simple linear classifiers, we investigate how models trained with these embeddings perform relative to generic embeddings across four abusive language data sets, both in domain and out of domain. Our results show that the polarised embeddings are competitive on the in-domain data sets and perform better on the out-of-domain ones.
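    The "simple linear classifier over embeddings" setup described above can be sketched as follows: average the word vectors of a post into one document vector, then score it with a single linear (logistic) unit. The tiny embedding table and weights below are illustrative stand-ins, not the paper's polarised Twitter-trained vectors.

    ```python
    import math

    # Toy 3-d "embeddings"; in the paper these would be polarised,
    # Twitter-trained word vectors.
    EMB = {"you": [0.1, 0.0, 0.2], "are": [0.0, 0.1, 0.1],
           "awful": [0.9, 0.8, 0.7], "nice": [-0.8, -0.7, -0.9]}

    def doc_vector(tokens, emb, dim=3):
        """Average the embeddings of known tokens into one document vector."""
        vecs = [emb[t] for t in tokens if t in emb]
        if not vecs:
            return [0.0] * dim
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def linear_score(vec, weights, bias=0.0):
        """Logistic score of a single linear (abusive vs. not) classifier."""
        z = sum(w * v for w, v in zip(weights, vec)) + bias
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical learned weights pointing toward the "abusive" direction.
    W = [1.0, 1.0, 1.0]
    abusive = linear_score(doc_vector("you are awful".split(), EMB), W)
    benign = linear_score(doc_vector("you are nice".split(), EMB), W)
    ```

    The design point is that all the modeling effort sits in the embeddings; the classifier on top is deliberately kept trivial so that any performance difference can be attributed to the embedding space.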

    Defining and Detecting Toxicity on Social Media: Context and Knowledge are Key

    As the role of online platforms has become increasingly prominent for communication, toxic behaviors, such as cyberbullying and harassment, have been rampant in the last decade. At the same time, online toxicity is multi-dimensional and sensitive in nature, which makes its detection challenging. As exposure to online toxicity can have serious implications for individuals and communities, reliable models and algorithms are required for detecting and understanding such communications. In this paper, we first define toxicity, drawing on social theories to provide a foundation. We then present an approach that identifies multiple dimensions of toxicity and incorporates explicit knowledge into a statistical learning algorithm to resolve ambiguity across these dimensions.