ArPanEmo: An Open-Source Dataset for Fine-Grained Emotion Recognition in Arabic Online Content during COVID-19 Pandemic
Emotion recognition is a crucial task in Natural Language Processing (NLP)
that enables machines to comprehend the feelings conveyed in the text. The
applications of emotion recognition are diverse, including mental health
diagnosis, student support, and the detection of online suspicious behavior.
Despite the substantial amount of literature available on emotion recognition
in various languages, Arabic emotion recognition has received relatively little
attention, leading to a scarcity of emotion-annotated corpora. This paper
presents the ArPanEmo dataset, a novel dataset for fine-grained emotion
recognition of online posts in Arabic. The dataset comprises 11,128 online
posts manually labeled for ten emotion categories or neutral, with Fleiss'
kappa of 0.71. It targets a specific Arabic dialect and addresses topics
related to the COVID-19 pandemic, making it the first and largest of its kind.
Python packages were used to collect online posts related to the COVID-19
pandemic from three sources: Twitter, YouTube, and online newspaper comments
between March 2020 and March 2022. Upon collection of the online posts, each
one underwent a semi-automatic classification process using a lexicon of
emotion-related terms to determine whether it belonged to the neutral or
emotional category. Subsequently, manual labeling was conducted to further
categorize the emotional data into fine-grained emotion categories.
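The abstract above reports inter-annotator agreement as Fleiss' kappa (0.71). As a reminder of what that statistic measures, here is a minimal sketch of the standard formula. The function name and the toy labels are illustrative assumptions, not taken from the ArPanEmo paper, and the sketch assumes every post was labeled by the same number of annotators.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumes every item has the same number of annotators (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    # Agreement expected by chance, from the overall label proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Toy example with three hypothetical annotators and made-up labels.
toy = [
    ["fear", "fear", "fear"],
    ["neutral", "neutral", "neutral"],
]
kappa = fleiss_kappa(toy)  # complete agreement on every item
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.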
Triaging Content Severity in Online Mental Health Forums
Mental health forums are online communities where people express their issues
and seek help from moderators and other users. In such forums, there are often
posts with severe content indicating that the user is in acute distress and
there is a risk of attempted self-harm. Moderators need to respond to these
severe posts in a timely manner to prevent potential self-harm. However, the
large volume of daily posted content makes it difficult for the moderators to
locate and respond to these critical posts. We present a framework for triaging
user content into four severity categories which are defined based on
indications of self-harm ideation. Our models are based on a feature-rich
classification framework which includes lexical, psycholinguistic, contextual
and topic modeling features. Our approaches improve the state of the art in
triaging the content severity in mental health forums by large margins (up to
17% improvement over the F-1 scores). Using the proposed model, we analyze the
mental state of users and we show that overall, long-term users of the forum
demonstrate a decreased severity of risk over time. Our analysis on the
interaction of the moderators with the users further indicates that without an
automatic way to identify critical content, it is indeed challenging for the
moderators to provide a timely response to the users in need.
Comment: Accepted for publication in Journal of the Association for
Information Science and Technology (2017)
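The abstract above reports its gains in terms of F-1 scores over four severity categories. The sketch below shows one common way to compute F1 for a multi-class task, the macro average of per-class F1. Whether the paper uses macro averaging is an assumption, and the function name and category labels are hypothetical placeholders.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical severity labels; the source does not name the four categories.
gold = ["level_1", "level_2", "level_3", "level_4"]
predicted = ["level_1", "level_2", "level_3", "level_3"]
score = macro_f1(gold, predicted)
```

Macro averaging weights rare classes, such as the most severe posts, equally with frequent ones, which is why it is often preferred for imbalanced triage data.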
Social Media Analysis for Social Good
Data on social media is abundant and offers valuable information that can be utilised for a range of purposes. Users share their experiences and opinions on various topics, ranging from their personal life to the community and the world, in real time. In comparison to conventional data sources, social media is cost-effective to obtain, is up-to-date and reaches a larger audience. Analysing this rich data source can contribute to solving societal issues and promote social impact in an equitable manner. In this thesis, I present my research exploring innovative applications of natural language processing (NLP) and machine learning to identify patterns and extract actionable insights from social media data to ultimately make a positive impact on society.
First, I evaluate the impact of an intervention program aimed at promoting inclusive and equitable learning opportunities for underrepresented communities using social media data. Second, I develop EmoBERT, an emotion-based variant of the BERT model, for detecting fine-grained emotions to gauge the well-being of a population during significant disease outbreaks. Third, to improve public health surveillance on social media, I demonstrate how emotions expressed in social media posts can be incorporated into health mention classification using an intermediate task fine-tuning and multi-feature fusion approach. I also propose a multi-task learning framework to model the literal meanings of disease and symptom words to enhance the classification of health mentions. Fourth, I create a new health mention dataset to address the imbalance in health data availability between developing and developed countries, providing a benchmark alternative to the traditional standards used in digital health research. Finally, I leverage the power of pretrained language models to analyse religious activities, recognised as social determinants of health, during disease outbreaks.
Incorporating Emotions into Health Mention Classification Task on Social Media
The health mention classification (HMC) task is the process of identifying
and classifying mentions of health-related concepts in text. This can be useful
for identifying and tracking the spread of diseases through social media posts.
However, this is a non-trivial task. Here we build on recent studies suggesting
that using emotional information may improve upon this task. Our study results
in a framework for health mention classification that incorporates affective
features. We present two methods, an intermediate task fine-tuning approach
(implicit) and a multi-feature fusion approach (explicit) to incorporate
emotions into our target task of HMC. We evaluated our approach on 5
HMC-related datasets from different social media platforms including three from
Twitter, one from Reddit and another from a combination of social media
sources. Extensive experiments demonstrate that our approach results in
statistically significant performance gains on HMC tasks. By using the
multi-feature fusion approach, we achieve at least a 3% improvement in F1 score
over BERT baselines across all datasets. We also show that considering only
negative emotions does not significantly affect performance on the HMC task.
Additionally, our results indicate that HMC models infused with emotional
knowledge are an effective alternative, especially when other HMC datasets are
unavailable for domain-specific fine-tuning. The source code for our models is
freely available at https://github.com/tahirlanre/Emotion_PHM
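The abstract above describes an explicit, multi-feature fusion approach for adding emotion information to health mention classification. The toy sketch below illustrates only the general idea of fusing features by concatenation and scoring the result with a logistic layer. It is not the authors' implementation (which builds on BERT and is linked above). The function names, vector sizes, and weights are all hypothetical.

```python
import math


def fuse_features(text_vector, emotion_vector):
    """Early fusion by concatenation: text features followed by emotion features."""
    return list(text_vector) + list(emotion_vector)


def predict_health_mention(fused, weights, bias=0.0):
    """Logistic score in (0, 1) for a fused feature vector (hypothetical classifier)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Made-up numbers: a 3-dimensional text representation and a
# 2-dimensional emotion representation (e.g. fear, joy).
text_vector = [0.2, -0.1, 0.7]
emotion_vector = [0.9, 0.05]
fused = fuse_features(text_vector, emotion_vector)

# Placeholder weights; in practice these would be learned from labeled data.
weights = [0.5, 0.5, 0.5, 1.0, -1.0]
probability = predict_health_mention(fused, weights)
```

Concatenation keeps the emotion signal visible to the classifier as separate inputs, in contrast to the implicit route where emotion knowledge is absorbed through intermediate fine-tuning.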
Measuring Emotions in the COVID-19 Real World Worry Dataset
The COVID-19 pandemic is having a dramatic impact on societies and economies
around the world. With various measures of lockdowns and social distancing in
place, it becomes important to understand emotional responses on a large scale.
In this paper, we present the first ground truth dataset of emotional responses
to COVID-19. We asked participants to indicate their emotions and express these
in text. This resulted in the Real World Worry Dataset of 5,000 texts (2,500
short + 2,500 long texts). Our analyses suggest that emotional responses
correlated with linguistic measures. Topic modeling further revealed that
people in the UK worry about their family and the economic situation.
Tweet-sized texts functioned as a call for solidarity, while longer texts shed
light on worries and concerns. Using predictive modeling approaches, we were
able to approximate the emotional responses of participants from text within
14% of their actual value. We encourage others to use the dataset and improve
how we can use automated methods to learn about emotional responses and worries
about an urgent problem.
Comment: Accepted to ACL 2020 COVID-19 workshop
Data-driven Social Mood Analysis through the Conceptualization of Emotional Fingerprints
A growing body of evidence suggests that a better account of the emotional spectrum is achievable by employing a complete selection of emotion keywords. Basic emotions, such as Ekman's, cannot be considered universal, but are related to implicit thematic affairs within the corpus under analysis. The paper reports some preliminary experiments obtained by employing a data-driven methodology that captures emotions, relying on the domain data to be modeled. The experimentation consists of investigating the corresponding conceptual space based on a set of terms (i.e., keywords) that are representative of the domain and the determination. Furthermore, the conceptual space is exploited as a bridge between the textual content and its sub-symbolic mapping as an "emotional fingerprint" into a six-dimensional hyperspace.