
6 research outputs found

Transformers to Fight the COVID-19 Infodemic

Authors: Hettiarachchi Hansi; Ranasinghe Tharindu; Uyangodage Lasitha
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/06/2021

The massive spread of false information on social media has become a global risk, especially during a global pandemic such as COVID-19. False information detection has thus become a surging research topic in recent months. The NLP4IF-2021 shared task on fighting the COVID-19 infodemic was organised to strengthen research in false information detection; participants are asked to predict seven different binary labels regarding false information in a tweet. The shared task was organised in three languages: Arabic, Bulgarian and English. In this paper, we present our approach to the task using transformers. Overall, our approach achieves a mean F1 score of 0.707 in Arabic, 0.578 in Bulgarian and 0.864 in English, ranking 4th in all three languages.

Birmingham City University Open Access Repository

BCU Open Access

Aston Publications Explorer

Lancaster E-Prints

Can Multilingual Transformers Fight the COVID-19 Infodemic?

Authors: Hettiarachchi Hansi; Ranasinghe Tharindu; Uyangodage Lasitha
Publication venue: INCOMA Ltd
Publication date: 01/09/2021

Lancaster E-Prints

SOLD: Sinhala offensive language dataset

Authors: Anuradha Isuri; Hettiarachchi Hansi; Premasiri Damith; Ranasinghe Tharindu; Silva Kanishka; Uyangodage Lasitha; Zampieri Marcos
Publication date: 06/03/2024

The widespread presence of offensive content online, such as hate speech and cyber-bullying, is a global phenomenon. This has sparked interest in the artificial intelligence (AI) and natural language processing (NLP) communities, motivating the development of various systems trained to detect potentially harmful content automatically. These systems require annotated datasets to train the machine learning (ML) models. However, with a few notable exceptions, most datasets on this topic have dealt with English and a few other high-resource languages. As a result, research in offensive language identification has been limited to these languages. This paper addresses this gap by tackling offensive language identification in Sinhala, a low-resource Indo-Aryan language spoken by over 17 million people in Sri Lanka. We introduce the Sinhala Offensive Language Dataset (SOLD) and present multiple experiments on this dataset. SOLD is a manually annotated dataset containing 10,000 posts from Twitter, annotated as offensive or not offensive at both the sentence level and the token level, which improves the explainability of the ML models. SOLD is the first large publicly available offensive language dataset compiled for Sinhala. We also introduce SemiSOLD, a larger dataset containing more than 145,000 Sinhala tweets, annotated following a semi-supervised approach.

Aston Publications Explorer

Transformers to Fight the COVID-19 Infodemic

Authors: Hettiarachchi Hansi; Ranasinghe Tharindu; Uyangodage Lasitha
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/06/2021

Lancaster E-Prints

Can Multilingual Transformers Fight the COVID-19 Infodemic?

Authors: Hettiarachchi Hansi; Ranasinghe Tharindu; Uyangodage Lasitha
Publication date: 01/09/2021

The massive spread of false information on social media has become a global risk, especially during a global pandemic such as COVID-19. False information detection has thus become a surging research topic in recent months. In recent years, supervised machine learning models have been used to automatically identify false information in social media. However, most of these models focus only on the language they were trained on. Because social media platforms are used in many different languages, managing a separate machine learning model for each language would be chaotic. In this research, we experiment with multilingual models to identify false information in social media, using two recently released multilingual false information detection datasets. We show that multilingual models perform on par with, and sometimes better than, monolingual models at detecting false information in social media, making them more useful in real-world scenarios.

Birmingham City University Open Access Repository

BCU Open Access

Lancaster E-Prints

SOLD: Sinhala offensive language dataset

Authors: Dola Mullage Damith; Hettiarachchi Hansi; Nanomi Arachchige Isuri; Ranasinghe Tharindu; Silva Kanishka; Uyangodage Lasitha; Zampieri Marcos
Publication date: 06/03/2024

Lancaster E-Prints