Aggregating pairwise semantic differences for few-shot claim verification
We introduce SEED, a novel vector-based method for few-shot claim verification that aggregates pairwise semantic differences for claim-evidence pairs
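The idea of aggregating pairwise differences can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the differences are aggregated or compared, so the averaging per class and the nearest-vector rule with Euclidean distance are assumptions made here, as are the function names, the label names and the toy two-dimensional vectors standing in for sentence embeddings.

```python
from math import dist

def difference(claim_vec, evidence_vec):
    # Element-wise difference between a claim vector and an evidence vector.
    return [c - e for c, e in zip(claim_vec, evidence_vec)]

def mean_vector(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def fit_class_vectors(examples):
    # ASSUMPTION: one representative vector per class, obtained by
    # averaging the difference vectors of that class's few-shot examples.
    by_label = {}
    for claim_vec, evidence_vec, label in examples:
        by_label.setdefault(label, []).append(difference(claim_vec, evidence_vec))
    return {label: mean_vector(diffs) for label, diffs in by_label.items()}

def predict(class_vectors, claim_vec, evidence_vec):
    # ASSUMPTION: pick the class whose representative vector is closest
    # (Euclidean distance) to the difference vector of the new pair.
    d = difference(claim_vec, evidence_vec)
    return min(class_vectors, key=lambda label: dist(class_vectors[label], d))

# Toy "embeddings" (hypothetical); real ones would come from a sentence encoder.
few_shot = [
    ([1.0, 0.0], [0.9, 0.1], "SUPPORTS"),
    ([0.8, 0.2], [0.7, 0.1], "SUPPORTS"),
    ([1.0, 0.0], [0.0, 1.0], "REFUTES"),
    ([0.9, 0.1], [0.1, 0.8], "REFUTES"),
]
class_vectors = fit_class_vectors(few_shot)
prediction = predict(class_vectors, [0.7, 0.3], [0.1, 0.9])
print(prediction)
```

With only a handful of labelled pairs per class, a scheme like this needs no gradient-based training, which is one reason a difference-vector approach is a natural fit for the few-shot setting.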
Capturing stance dynamics in social media: open challenges and research directions
Social media platforms provide a goldmine for mining public opinion on issues
of wide societal interest and impact. Opinion mining is a problem that can be
operationalised by capturing and aggregating the stance of individual social
media posts as supporting, opposing or being neutral towards the issue at hand.
While most prior work in stance detection has investigated datasets that cover
short periods of time, interest in investigating longitudinal datasets has
recently increased. Evolving dynamics in linguistic and behavioural patterns
observed in new data require adapting stance detection systems to deal with the
changes. In this survey paper, we investigate the intersection between
computational linguistics and the temporal evolution of human communication in
digital media. We perform a critical review of emerging research considering
dynamics, exploring different semantic and pragmatic factors that impact
linguistic data in general, and stance in particular. We further discuss
current directions in capturing stance dynamics in social media. We discuss the
challenges encountered when dealing with stance dynamics, identify open
challenges and discuss future directions in three key dimensions: utterance,
context and influence
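The operationalisation described above, in which post-level stance labels are aggregated into an opinion estimate, can be sketched as follows. This is only an illustration under stated assumptions: the label names, the monthly time windows and the function `aggregate_by_period` are hypothetical, and the post-level labels are taken as given rather than predicted by a stance classifier.

```python
from collections import Counter, defaultdict

# Hypothetical label set mirroring "supporting, opposing or being neutral".
STANCES = ("support", "oppose", "neutral")

def aggregate_by_period(posts):
    # posts: iterable of (period, stance) pairs, one per social media post.
    # Returns, for each period, the share of posts holding each stance.
    counts = defaultdict(Counter)
    for period, stance in posts:
        if stance not in STANCES:
            raise ValueError(f"unknown stance label: {stance!r}")
        counts[period][stance] += 1
    return {
        period: {s: c[s] / sum(c.values()) for s in STANCES}
        for period, c in counts.items()
    }

# Toy data (invented for illustration): stance towards one issue over two months.
posts = [
    ("2020-01", "support"),
    ("2020-01", "support"),
    ("2020-01", "oppose"),
    ("2020-01", "neutral"),
    ("2020-02", "oppose"),
    ("2020-02", "oppose"),
]
shares = aggregate_by_period(posts)
for period in sorted(shares):
    print(period, shares[period])
```

Comparing the shares across periods gives a crude view of how aggregate opinion shifts over time. It does not address the harder problem raised in the survey, namely that the language used to express stance itself changes.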
Qmul-sds at exist: Leveraging pre-trained semantics and lexical features for multilingual sexism detection in social networks
Online sexism is an increasing concern for those who experience gender-based abuse on social media platforms, as it has affected the healthy development of the Internet with negative impacts on society. The EXIST shared task proposes the first task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 [30]. It provides a benchmark sexism dataset with Twitter and Gab posts in both English and Spanish, along with a task articulated in two subtasks consisting of sexism detection at different levels of granularity: Subtask 1, Sexism Identification, is a classical binary classification task to determine whether a given text is sexist or not, while Subtask 2, Sexism Categorisation, is a finer-grained classification task focused on distinguishing different types of sexism. In this paper, we describe the participation of the QMUL-SDS team in EXIST. We propose an architecture made of the last 4 hidden states of XLM-RoBERTa and a TextCNN with 3 kernels. Our model also exploits lexical features relying on the use of new and existing lexicons of abusive words, with a special focus on sexist slurs and abusive words targeting women. Our team ranked 11th in Subtask 1 and 4th in Subtask 2 among all the teams on the leaderboard, clearly outperforming the baselines offered by EXIST
QMUL-SDS @ SardiStance2020: Leveraging network interactions to boost performance on stance detection using knowledge graphs
This paper presents our submission to the SardiStance 2020 shared task, describing the architecture used for Task A and Task B. While our submission for Task A did not exceed the baseline, retraining our model on all the training tweets showed promising results, reaching an f-avg of 0.601 using a bidirectional LSTM with BERT multilingual embeddings. For Task B, our submission ranked 6th (f-avg 0.709). With further investigation, our best experimental settings increased performance from f-avg 0.573 to f-avg 0.733 with the same architecture and parameter settings, after only incorporating social interaction features, highlighting the impact of social interaction on the model's performance
QMUL-SDS at CheckThat! 2021: Enriching pre-trained language models for the estimation of check-worthiness of Arabic tweets
This paper describes our submission to the CheckThat! Lab at CLEF 2021, where we participated in Subtask 1A (check-worthy claim detection) in Arabic. We approach the estimation of the check-worthiness of tweets as a ranking task. We propose to fine-tune state-of-the-art transformer-based models for Arabic, such as AraBERTv0.2-base, and to leverage additional training data from last year's shared task (CheckThat! Lab 2020) along with the dataset provided this year. According to the official evaluation, our submission placed joint 4th in the competition, in which seven other groups participated
- …