67 research outputs found
UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu
This study reports on the second shared task, UrduFake@FIRE2021, on fake news
detection in the Urdu language. This is a binary classification problem in
which the task is to classify a given news article into one of two classes:
(i) real news or (ii) fake news. In this shared task, 34 teams from 7
countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered
to participate, 18 teams submitted experimental results, and 11 teams
submitted technical reports. The proposed systems were based on various
count-based features and used different classifiers as well as neural network
architectures. The stochastic gradient descent (SGD) algorithm outperformed
the other classifiers and achieved an F-score of 0.679.
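The setup described above can be sketched as follows: count-based features are fed to a linear classifier trained with stochastic gradient descent. This is a minimal illustration under stated assumptions and is not the system of any participating team. The whitespace tokenizer, the logistic loss, the hyperparameters, and the English toy sentences are all hypothetical stand-ins for real Urdu news data.

```python
import math
import random
from collections import Counter


def count_features(text, vocab):
    """Map a text to raw token counts over a fixed vocabulary (bag of words)."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return [counts[w] for w in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(X, y, epochs=50, lr=0.1, seed=0):
    """Fit a logistic-regression classifier with per-example SGD.

    Labels follow an assumed convention: 1 = fake, 0 = real.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            g = sigmoid(z) - y[i]  # derivative of the log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return 1 (fake) if the decision score is positive, else 0 (real)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Hypothetical toy data, for illustration only (not from the UrduFake dataset).
docs = [
    "shocking miracle cure revealed",
    "miracle secret they hide",
    "government announces budget",
    "minister announces new policy",
]
labels = [1, 1, 0, 0]
vocab = sorted({tok for d in docs for tok in d.lower().split()})
X = [count_features(d, vocab) for d in docs]
w, b = train_sgd(X, labels)
print(predict(w, b, count_features("secret miracle cure", vocab)))
```

In practice, a library implementation of an SGD-trained linear classifier would normally replace the hand-written loop. The loop is spelled out here only to make the update rule visible.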
PoliTo at MULTI-Fake-DetectiVE: Improving FND-CLIP for Multimodal Italian Fake News Detection
The MULTI-Fake-DetectiVE challenge addresses the automatic detection of Italian fake news in a multimodal setting, where both textual and visual components are potential sources of fake content. This paper describes the PoliTO approach to the tasks of fake news detection and analysis of modality contributions. Our solution achieved the best performance on both tasks. It builds on the established FND-CLIP multimodal architecture and proposes ad hoc extensions, including sentiment-based text encoding, image transformation in the frequency domain, and data augmentation via back-translation. Because it combines visual and textual content effectively, our solution contributes to fighting the spread of disinformation in the Italian news flow.
UATTA-EB: Uncertainty-Aware Test-Time Augmented Ensemble of BERTs for Classifying Common Mental Illnesses on Social Media Posts
Given the current state of the world, millions of people suffering from
mental illnesses feel isolated and unable to receive help in person.
Psychological studies have shown that our state of mind can manifest itself in
the linguistic features we use to communicate. People have increasingly turned
to online platforms to express themselves and seek help with their conditions.
Deep learning methods have been commonly used to identify and analyze mental
health conditions from various sources of information, including social
media. Still, they face challenges, including a lack of reliability and
overconfidence in predictions, which result in poorly calibrated models. To
solve these issues, we propose UATTA-EB: Uncertainty-Aware Test-Time Augmented
Ensembling of BERTs, which produces reliable and well-calibrated predictions
that classify posts into six possible classes (None, Depression, Anxiety,
Bipolar Disorder, ADHD, and PTSD) by analyzing unstructured user data on
Reddit. Comment: Accepted at Tiny Papers @ ICLR 202
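One way to read "uncertainty-aware test-time augmented ensembling" is sketched below. Every ensemble member scores several augmented views of the same post. The class probabilities are then averaged, and the entropy of the averaged distribution serves as an uncertainty estimate. The stand-in models, the two augmentations (identity and lowercasing), uniform averaging, and the entropy measure are all assumptions. The paper's actual aggregation and uncertainty estimate may differ.

```python
import math

LABELS = ["None", "Depression", "Anxiety", "Bipolar Disorder", "ADHD", "PTSD"]


def tta_ensemble_predict(models, augmentations, text):
    """Average class probabilities over every (model, augmented view) pair.

    `models` are callables mapping a string to a probability list aligned with
    LABELS (hypothetical stand-ins for fine-tuned BERT classifiers);
    `augmentations` are callables mapping a string to a perturbed string.
    """
    views = [aug(text) for aug in augmentations]
    summed = [0.0] * len(LABELS)
    n = 0
    for model in models:
        for view in views:
            summed = [s + p for s, p in zip(summed, model(view))]
            n += 1
    mean = [s / n for s in summed]
    # Predictive entropy of the averaged distribution: 0 means fully confident,
    # log(len(LABELS)) means maximally uncertain.
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    label = LABELS[mean.index(max(mean))]
    return label, mean, entropy


# Hypothetical stand-in models and augmentations, for illustration only.
def confident_model(text):
    return [0.0, 0.9, 0.02, 0.02, 0.03, 0.03]


def unsure_model(text):
    return [1.0 / len(LABELS)] * len(LABELS)


augs = [lambda t: t, str.lower]
label, mean, entropy = tta_ensemble_predict(
    [confident_model, unsure_model], augs, "I can't sleep"
)
print(label, round(entropy, 3))
```

Averaging over both members and views tends to soften an overconfident single-model output, which is the calibration concern the abstract raises.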
A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends
Sentiment analysis (SA) is the process of understanding the emotion expressed in a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is widely used today because social media gives more and more people the chance to share their thoughts. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although most work in this field focuses on global languages such as English, the importance of SA in local languages has been widely recognized in recent years. This has led to considerable research on Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, it presents techniques, challenges, findings, recent research trends, and future directions for improving accuracy.
Transfer Learning for Low-Resource Sentiment Analysis
Sentiment analysis is the process of identifying and extracting subjective
information from text. Despite advances in automatic cross-lingual approaches,
implementing and evaluating sentiment analysis systems still requires
language-specific data that accounts for various sociocultural and linguistic
peculiarities. In this paper, we describe the collection and annotation of a
dataset for sentiment analysis of Central Kurdish. We explore a few classical
machine learning and neural network-based techniques for this task.
Additionally, we employ a transfer learning approach that leverages pretrained
models for data augmentation. We demonstrate that data augmentation achieves a
high F-score and accuracy despite the difficulty of the task. Comment: 14 pages - under review at ACM TALLI
Focused Crawling and Model Evaluation in the field of Conversational Agents and Motivational Interviewing
The exploitation of Motivational Interviewing concepts when analysing
individuals’ speech contributes to gaining valuable insights into their
perspectives and attitudes towards behaviour change. The scarcity of labelled
user data poses a persistent challenge and impedes research progress in
non-English language scenarios. To address the limitations of manual data
labelling, we propose a semi-supervised learning method as a means to augment
an existing training corpus. Our approach leverages machine-translated
user-generated data sourced from social media communities and employs
self-training techniques for annotation. We evaluate multiple classifiers
trained on various augmented datasets. To that end, we consider diverse source
contexts and employ different effectiveness metrics. The results indicate that
this weak labelling approach does not yield significant improvements in the
overall classification capabilities of the models. However, notable
improvements were observed for the minority classes. As part of future work,
we propose to enlarge the datasets only with new examples from the minority
classes. We conclude that several factors, including the quality of machine
translation, can potentially bias the pseudo-labelling models. The imbalanced
nature of the data and the impact of a strict pre-filtering threshold are
other important aspects that need to be taken into account.
Universidade de Santiago de Compostela. Escola Técnica Superior de Enxeñaría
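The self-training step with a confidence pre-filter can be sketched as below. The classifier is a tiny multinomial naive Bayes model, chosen only to keep the example self-contained. The authors' classifiers, threshold value, translation step, and label set are not given here. `fit_naive_bayes`, `self_train`, the threshold of 0.8, and the labels "change"/"sustain" are therefore all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(labelled):
    """Fit a tiny multinomial naive Bayes model (a stand-in classifier).

    Returns a predict_proba callable mapping a text to {label: probability}.
    """
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labelled:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_counts.values())

    def predict_proba(text):
        log_scores = {}
        for label, count in class_counts.items():
            total = sum(word_counts[label].values())
            score = math.log(count / n_docs)
            for tok in text.lower().split():
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
            log_scores[label] = score
        top = max(log_scores.values())
        exps = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exps.values())
        return {k: v / z for k, v in exps.items()}

    return predict_proba


def self_train(fit, labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Confidence-filtered self-training (pseudo-labelling).

    Each round refits on the current labelled set and moves over only the
    unlabelled examples whose top-class probability is >= threshold.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        predict_proba = fit(labelled)
        accepted, rejected = [], []
        for text in pool:
            label, conf = max(predict_proba(text).items(), key=lambda kv: kv[1])
            if conf >= threshold:
                accepted.append((text, label))
            else:
                rejected.append(text)
        if not accepted:
            break  # nothing passed the filter, so further rounds change nothing
        labelled.extend(accepted)
        pool = rejected
    return labelled, pool


# Hypothetical toy seed set and unlabelled pool, for illustration only.
seed_set = [
    ("i want to quit smoking", "change"),
    ("i will try to exercise more", "change"),
    ("i like smoking too much", "sustain"),
    ("exercise is not for me", "sustain"),
]
unlabelled = ["i want to exercise", "smoking is fine for me", "the weather is nice"]
augmented, leftover = self_train(fit_naive_bayes, seed_set, unlabelled, threshold=0.8)
print(len(augmented), leftover)
```

Raising `threshold` makes the filter stricter, so fewer pseudo-labels are added. This relates to the strict pre-filtering threshold that the abstract highlights.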
Explainable Misinformation Detection Across Multiple Social Media Platforms
In this work, the integration of two machine learning approaches, namely
domain adaptation and explainable AI, is proposed to address two issues:
generalized detection and explainability. First, a Domain Adversarial Neural
Network (DANN) is used to develop a generalized misinformation detector across
multiple social media platforms; DANN generates the classification results for
test domains with relevant but unseen data. The DANN-based model, a
traditional black-box model, cannot justify its outcome, i.e., the labels for
the target domain. Hence, a Local Interpretable Model-Agnostic Explanations
(LIME) explainable AI model is applied to explain the outcome of the DANN
model. To demonstrate these two approaches and their integration for effective
explainable generalized detection, COVID-19 misinformation is considered as a
case study. We experimented with two datasets, namely CoAID and MiSoVac, and
compared results with and without DANN implementation. DANN significantly
improves the F1 classification score and increases accuracy and AUC. The
results show that the proposed framework performs well under domain shift and
can learn domain-invariant features while explaining the target labels with
LIME, enabling trustworthy information processing and extraction to combat
misinformation effectively. Comment: 28 pages, 4 figures
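The core of a DANN is the gradient reversal layer from the original DANN formulation by Ganin et al. In the forward pass it passes features through unchanged. In the backward pass it negates and scales the gradients. The framework-free sketch below uses plain Python lists in place of tensors. The names `GradientReversal`, `grl_lambda`, and `feature_extractor_grad` are hypothetical. The λ ramp schedule (γ = 10) is the one proposed in that original formulation and is not necessarily the one used in the work above.

```python
import math


class GradientReversal:
    """Gradient reversal layer (GRL) as used in DANN.

    Forward pass: identity. Backward pass: multiply the incoming gradient by
    -lambd, so the shared feature extractor is pushed to confuse the domain
    classifier while the domain classifier itself is trained normally.
    """

    def __init__(self, lambd=1.0):
        self.lambd = lambd

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lambd * g for g in grad_output]


def grl_lambda(progress, gamma=10.0):
    """Ramp lambda from 0 towards 1 as training progress goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def feature_extractor_grad(grad_from_label_head, grad_from_domain_head, grl):
    """Gradient reaching the shared features: the label-loss gradient plus the
    reversed (negated, scaled) domain-loss gradient."""
    reversed_domain = grl.backward(grad_from_domain_head)
    return [a + r for a, r in zip(grad_from_label_head, reversed_domain)]


# Illustrative gradients halfway through training (values are made up).
grl = GradientReversal(lambd=grl_lambda(0.5))
print(feature_extractor_grad([0.2, -0.1], [0.4, 0.3], grl))
```

In a deep learning framework, the same behaviour is usually obtained with a custom autograd function whose backward pass returns the negated, scaled gradient.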
- …