RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
Online misinformation is often multimodal in nature, i.e., it is caused by
misleading associations between texts and accompanying images. To support the
fact-checking process, researchers have recently been developing automatic
multimodal methods that gather and analyze external information, evidence,
related to the image-text pairs under examination. However, prior works assumed
all external information collected from the web to be relevant. In this study,
we introduce a "Relevant Evidence Detection" (RED) module to discern whether
each piece of evidence is relevant to supporting or refuting the claim.
Specifically, we develop the "Relevant Evidence Detection Directed Transformer"
(RED-DOT) and explore multiple architectural variants (e.g., single or
dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and
comparative experiments demonstrate that RED-DOT achieves significant
improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to
33.7%. Furthermore, our evidence re-ranking and element-wise modality fusion
led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3% without
requiring numerous evidence items or multiple backbone encoders. We release our code
at: https://github.com/stevejpapad/relevant-evidence-detectio
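The abstract names evidence re-ranking and element-wise modality fusion but does not specify them. The following is a minimal sketch under stated assumptions: the claim and each evidence item are already encoded as fixed-length embedding vectors (the encoder is left unspecified), evidence is re-ranked by cosine similarity to the claim, and two modality embeddings are fused by concatenating them with their element-wise product and difference. The similarity measure, the fusion recipe, and the names `rerank_evidence` and `fuse` are illustrative assumptions, not RED-DOT's published design.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_evidence(claim_emb, evidence_embs, top_k):
    """Hypothetical re-ranking: indices of the top_k evidence items most similar to the claim."""
    scored = [(cosine(claim_emb, e), i) for i, e in enumerate(evidence_embs)]
    # Python's sort is stable, so equally similar items keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]


def fuse(image_emb, text_emb):
    """Hypothetical element-wise fusion: [image, text, image*text, image-text]."""
    prod = [x * y for x, y in zip(image_emb, text_emb)]
    diff = [x - y for x, y in zip(image_emb, text_emb)]
    return list(image_emb) + list(text_emb) + prod + diff


# Usage: keep the two most claim-relevant evidence items, then fuse a toy image/text pair.
print(rerank_evidence([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], top_k=2))  # [1, 2]
print(fuse([1, 2], [3, 4]))  # [1, 2, 3, 4, 3, 8, -2, -2]
```

Keeping only the highest-ranked items is one simple way to avoid feeding many loosely related web results to a downstream model; the cut-off `top_k` is a free choice here.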
VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias
Multimedia content has become ubiquitous on social media platforms, leading
to the rise of multimodal misinformation (MM) and the urgent need for effective
strategies to detect and prevent its spread. In recent years, the challenge of
multimodal misinformation detection (MMD) has garnered significant attention from
researchers and has mainly involved the creation of annotated, weakly
annotated, or synthetically generated training datasets, along with the
development of various deep learning MMD models. However, the problem of
unimodal bias in MMD benchmarks -- where biased or unimodal methods outperform
their multimodal counterparts on an inherently multimodal task -- has been
overlooked. In this study, we systematically investigate and identify the
presence of unimodal bias in widely used MMD benchmarks (VMU-Twitter, COSMOS),
raising concerns about their suitability for reliable evaluation. To address
this issue, we introduce the "VERification of Image-TExt pairs" (VERITE)
benchmark for MMD which incorporates real-world data, excludes "asymmetric
multimodal misinformation" and utilizes "modality balancing". We conduct an
extensive comparative study with a Transformer-based architecture that shows
the ability of VERITE to effectively address unimodal bias, rendering it a
robust evaluation framework for MMD. Furthermore, we introduce a new method --
termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating
realistic synthetic training data that preserve crossmodal relations between
legitimate images and false human-written captions. By leveraging CHASMA in the
training process, we observe consistent and notable improvements in predictive
performance on VERITE, including a 9.2% increase in accuracy. We release our code
at: https://github.com/stevejpapad/image-text-verificatio
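The abstract describes "modality balancing" only briefly. As a rough diagnostic, under the simplifying assumption that a benchmark is balanced for one modality when each image (or each caption) occurs under more than one label, the sketch below flags items that occur under a single label only; a unimodal model could exploit such items without looking at the other modality. The record layout `(image_id, caption_id, label)` and the function name `unbalanced_items` are hypothetical, and this is not VERITE's construction procedure.

```python
from collections import defaultdict

IMAGE, CAPTION = 0, 1  # positions of the image and caption ids in each record


def unbalanced_items(pairs, modality):
    """Return ids of one modality that occur under only a single label.

    pairs: iterable of (image_id, caption_id, label) records.
    modality: IMAGE to check images, CAPTION to check captions.
    """
    labels_seen = defaultdict(set)
    for record in pairs:
        labels_seen[record[modality]].add(record[2])
    # An item seen with one label only lets a unimodal model infer that label.
    return {item for item, labels in labels_seen.items() if len(labels) < 2}


pairs = [
    ("img1", "cap1", "true"),
    ("img1", "cap2", "misleading"),
    ("img2", "cap1", "misleading"),
    ("img3", "cap3", "true"),
]
print(sorted(unbalanced_items(pairs, IMAGE)))    # ['img2', 'img3']
print(sorted(unbalanced_items(pairs, CAPTION)))  # ['cap2', 'cap3']
```

The check works with any number of label classes, since it only asks whether an item was seen with at least two distinct labels.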
ReSEED: Social Event dEtection Dataset
Reuter T, Papadopoulos S, Mezaris V, Cimiano P. ReSEED: Social Event dEtection Dataset. In: MMSys '14. Proceedings of the 5th ACM Multimedia Systems Conference. New York: ACM; 2014: 35-40.
Nowadays, digital cameras are very popular, and almost every mobile phone has a built-in camera. Social events play a prominent role in people's lives. Thus, people take pictures of events they take part in, and more and more of them upload these pictures to well-known online photo community sites such as Flickr. The number of pictures uploaded to these sites continues to grow, and there is great interest in automating event clustering so that every incoming (picture) document can be assigned to the corresponding event without human interaction. These social events are defined as events that are planned by people, attended by people, and for which the social multimedia are also captured by people. There is an urgent need for algorithms capable of grouping media by the social events they depict or are related to. In order to train, test, and evaluate such algorithms and frameworks, we present a dataset that consists of about 430,000 photos from Flickr together with the underlying ground truth of about 21,000 social events. All the photos are accompanied by their textual metadata. The ground truth for the event groupings has been derived from event calendars on the Web that have been created collaboratively by people. The dataset has been used in the Social Event Detection (SED) task that was part of the MediaEval Benchmark for Multimedia Evaluation 2013. This task required participants to discover social events and organize the related media items in event-specific clusters within a collection of Web multimedia documents.
In this paper, we describe how the dataset was collected and how the ground truth was created, together with a proposed evaluation methodology and a brief description of the corresponding task challenge as applied in the context of the Social Event Detection task.
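The abstract mentions an evaluation methodology for comparing a predicted clustering of photos against ground-truth events but does not spell it out here. One widely used clustering measure is normalized mutual information (NMI); the sketch below assumes NMI purely for illustration, using the geometric-mean normalization MI / sqrt(H(pred) * H(truth)). Other normalizations exist, and the paper's actual methodology may use different or additional measures. `pred` holds one cluster id per photo and `truth` one ground-truth event id per photo.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred, truth):
    """Mutual information between two labelings of the same items."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    count_pred, count_truth = Counter(pred), Counter(truth)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_pred[a] * count_truth[b]))
    return mi


def nmi(pred, truth):
    """NMI with geometric-mean normalization (an illustrative choice)."""
    h_pred, h_truth = entropy(pred), entropy(truth)
    if h_pred == 0 or h_truth == 0:
        # Degenerate case: at least one labeling puts everything in one cluster.
        return 1.0 if h_pred == h_truth else 0.0
    return mutual_information(pred, truth) / math.sqrt(h_pred * h_truth)


# Usage: cluster ids need not match event ids, only the grouping matters.
print(nmi([0, 0, 1, 1], [5, 5, 7, 7]))  # perfect grouping, approximately 1.0
print(nmi([0, 1, 0, 1], [0, 0, 1, 1]))  # independent grouping, 0.0
```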
Mitigating Viewer Impact from Disturbing Imagery using AI Filters: A User-Study
Exposure to disturbing imagery can significantly impact individuals,
especially professionals who encounter such content as part of their work. This
paper presents a user study, involving 107 participants, predominantly
journalists and human rights investigators, that explores the capability of
Artificial Intelligence (AI)-based image filters to potentially mitigate the
emotional impact of viewing such disturbing content. We tested five different
filter styles, both traditional (Blurring and Partial Blurring) and AI-based
(Drawing, Colored Drawing, and Painting), and measured their effectiveness in
terms of conveying image information while reducing emotional distress. Our
findings suggest that the AI-based Drawing style filter demonstrates the best
performance, offering a promising solution for reducing negative feelings
(-30.38%) while preserving the interpretability of the image (97.19%). Despite
the requirement for many professionals to eventually inspect the original
images, participants suggested potential strategies for integrating AI filters
into their workflow, such as using AI filters as an initial, preparatory step
before viewing the original image. Overall, this paper contributes to the
development of a more ethically considerate and effective visual environment
for professionals routinely engaging with potentially disturbing imagery.
AdaCC: cumulative cost-sensitive boosting for imbalanced classification
Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge; moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model’s performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches, exhibiting consistent improvements in different measures, for instance, in the range of [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for G-mean and [7.4–85.5%] for recall.
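The abstract reports gains in recall, balanced accuracy and G-mean, measures chosen because plain accuracy can hide failure on the minority class. The sketch below computes them with their standard definitions for a binary task, assuming the minority class is labeled `1`; the function name `imbalance_metrics` is hypothetical, and AUC is omitted because it needs scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, balanced accuracy and G-mean for a binary task.

    The minority class is assumed to be `positive`; y_true must be non-empty.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate (minority)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate (majority)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
        "gmean": math.sqrt(recall * specificity),
    }


y_true = [0, 0, 0, 0, 0, 0, 1, 1]
# A classifier that always predicts the majority class looks decent on accuracy...
print(imbalance_metrics(y_true, [0] * 8))
# ...but its recall and G-mean are 0.0 and its balanced accuracy is 0.5.
```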
AdaCC: Cumulative Cost-Sensitive Boosting for Imbalanced Classification
Class imbalance poses a major challenge for machine learning as most
supervised learning models might exhibit bias towards the majority class and
under-perform in the minority class. Cost-sensitive learning tackles this
problem by treating the classes differently, formulated typically via a
user-defined fixed misclassification cost matrix provided as input to the
learner. Such parameter tuning is a challenging task that requires domain
knowledge and moreover, wrong adjustments might lead to overall predictive
performance deterioration. In this work, we propose a novel cost-sensitive
boosting approach for imbalanced data that dynamically adjusts the
misclassification costs over the boosting rounds in response to the model's
performance instead of using a fixed misclassification cost matrix. Our method,
called AdaCC, is parameter-free as it relies on the cumulative behavior of the
boosting model in order to adjust the misclassification costs for the next
boosting round and comes with theoretical guarantees regarding the training
error. Experiments on 27 real-world datasets from different domains with high
class imbalance demonstrate the superiority of our method over 12
state-of-the-art cost-sensitive boosting approaches, exhibiting consistent
improvements in different measures, for instance, in the range of [0.3%-28.56%]
for AUC, [3.4%-21.4%] for balanced accuracy, [4.8%-45%] for G-mean and
[7.4%-85.5%] for recall.
Comment: 30 pages
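To make the idea of costs that follow the cumulative model concrete, here is a toy cost-sensitive AdaBoost on one-dimensional data with decision stumps. It is a sketch under a loudly hypothetical cost rule, not AdaCC's published update: after each round, each class's cost is set to 1 plus the error rate of the cumulative ensemble on that class, and examples the ensemble still misclassifies have their AdaBoost weight update scaled by that cost. Labels are assumed to be -1/+1, and all function names are illustrative.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump on a scalar feature."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Pick the threshold/polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def predict(ensemble, x):
    """Sign of the weighted vote; ties go to the positive class."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


def cumulative_cost_boost(xs, ys, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # (alpha, threshold, polarity) per round
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))

        # Hypothetical rule: class cost = 1 + cumulative ensemble's error on that class.
        cum_pred = [predict(ensemble, x) for x in xs]
        costs = {}
        for c in (-1, 1):
            members = [i for i in range(n) if ys[i] == c]
            wrong = sum(1 for i in members if cum_pred[i] != c)
            costs[c] = 1.0 + (wrong / len(members) if members else 0.0)

        # AdaBoost reweighting, amplified for examples the ensemble still gets wrong.
        for i in range(n):
            factor = math.exp(-alpha * ys[i] * stump_predict(xs[i], threshold, polarity))
            if cum_pred[i] != ys[i]:
                factor *= costs[ys[i]]
            weights[i] *= factor
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


# Usage: six majority (-1) and two minority (+1) points on a line.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 1.0]
ys = [-1] * 6 + [1] * 2
model = cumulative_cost_boost(xs, ys, rounds=5)
print([predict(model, x) for x in xs])
```

Unlike a fixed cost matrix, nothing here is tuned by hand: the costs are recomputed every round from how the ensemble built so far treats each class.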