QMUL-SDS @ SardiStance: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs
This paper presents our submission to the SardiStance 2020 shared task, describing the architecture used for Task A and Task B. While our submission for Task A did not exceed the baseline, retraining our model on all the training tweets showed promising results, reaching an f-avg of 0.601 with a bidirectional LSTM over multilingual BERT embeddings. Our submission for Task B ranked 6th (f-avg 0.709). With further investigation, our best experimental settings increased performance from an f-avg of 0.573 to 0.733 with the same architecture and parameter settings, after incorporating only social interaction features, highlighting the impact of social interaction on the model's performance.
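To make the described architecture concrete, here is a minimal sketch of a bidirectional LSTM stance classifier over multilingual BERT token embeddings. The checkpoint name, frozen encoder, hidden size, and three-class label set are illustrative assumptions, not the authors' exact settings.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BiLstmStanceClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 hidden_size=128, num_classes=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():  # freeze BERT (assumption)
            p.requires_grad = False
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        # Token-level contextual embeddings from multilingual BERT.
        embeddings = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(embeddings)
        # Mean-pool over non-padding positions, then classify.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (lstm_out * mask).sum(1) / mask.sum(1)
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BiLstmStanceClassifier()
batch = tokenizer(["Il vaccino è sicuro"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])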
QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions
This paper describes the participation of the QMUL-SDS team for Task 1 of the
CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the
check-worthiness of tweets about COVID-19 to identify and prioritise tweets
that need fact-checking. The overarching aim is to further support ongoing
efforts to protect the public from fake news and help people find reliable
information. We describe and analyse the results of our submissions. We show
that a CNN using COVID-Twitter-BERT (CT-BERT) enhanced with numeric expressions
can effectively boost performance from baseline results. We also show results
of training data augmentation with rumours on other topics. Our best system ranked fourth in the task, with encouraging outcomes showing potential for improved results in the future.
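As a concrete illustration of the described system, below is a hedged sketch of a CNN check-worthiness classifier over CT-BERT token embeddings, with a simple count of numeric expressions appended as an extra feature. The checkpoint identifier, filter configuration, and the count-based numeric feature are assumptions for illustration, not the exact enhancement used above.

import re
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "digitalepidemiologylab/covid-twitter-bert"  # assumed CT-BERT id

class CnnCheckWorthiness(nn.Module):
    def __init__(self, num_filters=100, kernel_size=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(CHECKPOINT)
        self.conv = nn.Conv1d(self.encoder.config.hidden_size,
                              num_filters, kernel_size, padding=1)
        # +1 for the scalar numeric-expression feature appended below.
        self.classifier = nn.Linear(num_filters + 1, 2)

    def forward(self, input_ids, attention_mask, numeric_feature):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        conv_out = torch.relu(self.conv(hidden.transpose(1, 2)))
        pooled = conv_out.max(dim=2).values  # global max pooling over time
        features = torch.cat([pooled, numeric_feature.unsqueeze(1)], dim=1)
        return self.classifier(features)

def numeric_expression_count(text: str) -> float:
    # Crude proxy: how many numeric expressions the tweet contains.
    return float(len(re.findall(r"\d[\d,.%]*", text)))

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
tweet = "Over 3,000 new COVID-19 cases were reported, a 12% rise."
batch = tokenizer([tweet], return_tensors="pt", truncation=True)
model = CnnCheckWorthiness()
logits = model(batch["input_ids"], batch["attention_mask"],
               torch.tensor([numeric_expression_count(tweet)]))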
Extended overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance
We describe the second edition of the LongEval CLEF 2024 shared task. This lab evaluates the temporal persistence of Information Retrieval (IR) systems and Text Classifiers. Task 1 requires IR systems to run on corpora acquired at several timestamps, and evaluates the drop in system quality (NDCG) along these timestamps. Task 2 tackles binary sentiment classification at different points in time, and evaluates the performance drop for different temporal gaps. Overall, 37 teams registered for Task 1 and 25 for Task 2. Ultimately, 14 and 4 teams participated in Task 1 and Task 2, respectively.
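The Task 1 protocol above reduces to scoring the same system with NDCG on corpus snapshots from different timestamps and comparing the results. A minimal sketch, with toy relevance judgements invented purely for illustration:

import math

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for one query, given graded relevance in ranked order."""
    dcg = sum(rel / math.log2(rank + 2)
              for rank, rel in enumerate(ranked_relevances[:k]))
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# One system's rankings on two snapshots of the evolving corpus (toy data).
snapshots = {
    "t0 (train time)": [[3, 2, 0, 1], [2, 2, 1, 0]],
    "t1 (months later)": [[1, 0, 2, 0], [2, 0, 0, 1]],
}
scores = {name: sum(ndcg_at_k(q) for q in queries) / len(queries)
          for name, queries in snapshots.items()}
for name, score in scores.items():
    print(f"{name}: NDCG@10 = {score:.3f}")
drop = scores["t0 (train time)"] - scores["t1 (months later)"]
print(f"absolute NDCG drop across the temporal gap: {drop:.3f}")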
LongEval: Longitudinal Evaluation of Model Performance at CLEF 2023
In this paper, we describe the plans for the first LongEval CLEF 2023 shared task, dedicated to evaluating the temporal persistence of Information Retrieval (IR) systems and Text Classifiers. The task is motivated by recent research showing that the performance of these models drops as the test data becomes more distant in time from the training data. LongEval differs from traditional shared IR and classification tasks by giving special consideration to evaluating models that aim to mitigate performance drop over time. We envisage that this task will draw the attention of the IR community and NLP researchers to the problem of temporal persistence of models: what enables or prevents it, potential solutions, and their limitations.
Extended Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance
We describe the first edition of the LongEval CLEF 2023 shared task. This lab evaluates the temporal persistence of Information Retrieval (IR) systems and Text Classifiers. Task 1 requires IR systems to run on corpora acquired at several timestamps, and evaluates the drop in system quality (NDCG) along these timestamps. Task 2 tackles binary sentiment classification at different points in time, and evaluates the performance drop for different temporal gaps. Overall, 37 teams registered for Task 1 and 25 for Task 2. Ultimately, 14 and 4 teams participated in Task 1 and Task 2, respectively.
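Task 2's evaluation can be sketched similarly: train a sentiment classifier once, score it on test sets at increasing temporal distance from the training data, and observe the drop. The model, features, and toy examples below are illustrative assumptions, not the lab's actual data or systems.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy "within-time" training data; in LongEval this would be the dated corpus.
train_texts = ["great film", "loved it", "terrible plot", "boring and slow"]
train_labels = [1, 1, 0, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Test sets at increasing temporal distance from training (toy data);
# the "long gap" set uses drifted vocabulary the model never saw.
test_sets_by_gap = {
    "short gap": (["great acting", "slow and boring"], [1, 0]),
    "long gap":  (["this slaps", "mid at best"], [1, 0]),
}
for gap, (texts, gold) in test_sets_by_gap.items():
    preds = model.predict(texts)
    print(gap, "macro-F1 =", f1_score(gold, preds, average="macro"))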
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).
LongEval: Longitudinal Evaluation of Model Performance at CLEF 2024
This paper introduces the planned second LongEval Lab, part of the CLEF 2024 conference. The aim of the lab's two tasks is to give researchers test data for addressing temporal effectiveness persistence challenges in both information retrieval and text classification, motivated by the fact that model performance degrades as the test data becomes temporally distant from the training data. LongEval distinguishes itself from traditional IR and classification tasks by emphasizing the evaluation of models designed to mitigate performance drop over time using evolving data. The second LongEval edition will further engage the IR community and NLP researchers in addressing the crucial challenge of temporal persistence in models, exploring the factors that enable or hinder it, and identifying potential solutions along with their limitations.
Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance