RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection
Online misinformation is often multimodal in nature, i.e., it stems from misleading associations between texts and accompanying images. To support the fact-checking process, researchers have recently been developing automatic multimodal methods that gather and analyze external information (evidence) related to the image-text pairs under examination. However, prior works assumed that all external information collected from the web is relevant. In this study, we introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant to supporting or refuting the claim.
Specifically, we develop the "Relevant Evidence Detection Directed Transformer"
(RED-DOT) and explore multiple architectural variants (e.g., single or
dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and
comparative experiments demonstrate that RED-DOT achieves significant
improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to
33.7%. Furthermore, our evidence re-ranking and element-wise modality fusion enable RED-DOT to surpass the SotA on NewsCLIPpings+ by up to 3% without requiring numerous pieces of evidence or multiple backbone encoders. We release our code at: https://github.com/stevejpapad/relevant-evidence-detection
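To make the idea concrete, below is a minimal PyTorch sketch of relevance-aware evidence fusion in the spirit of RED-DOT: each retrieved evidence item is scored for relevance while a claim-level verdict is produced from the fused image-text representation. The dimensions, the element-wise fusion scheme, and both prediction heads are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of relevance-aware evidence fusion, loosely inspired by
# RED-DOT. All hyperparameters and the fusion scheme are assumptions.
import torch
import torch.nn as nn

class RelevantEvidenceScorer(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        # Element-wise fusion of the claim's image and text embeddings:
        # concat of [img, txt, img * txt, |img - txt|], projected back to dim.
        self.fuse = nn.Linear(4 * dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.relevance_head = nn.Linear(dim, 1)  # per-evidence relevance logit
        self.verdict_head = nn.Linear(dim, 1)    # claim-level veracity logit

    def forward(self, img, txt, evidence):
        # img, txt: (B, dim) claim embeddings; evidence: (B, N, dim) items
        claim = self.fuse(torch.cat([img, txt, img * txt, (img - txt).abs()], dim=-1))
        tokens = torch.cat([claim.unsqueeze(1), evidence], dim=1)  # (B, 1+N, dim)
        hidden = self.encoder(tokens)
        relevance = self.relevance_head(hidden[:, 1:]).squeeze(-1)  # (B, N)
        verdict = self.verdict_head(hidden[:, 0]).squeeze(-1)       # (B,)
        return verdict, relevance

model = RelevantEvidenceScorer()
verdict, relevance = model(torch.randn(2, 512), torch.randn(2, 512),
                           torch.randn(2, 5, 512))
```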
Credible, Unreliable or Leaked?: Evidence Verification for Enhanced Automated Fact-checking
Automated fact-checking (AFC) is garnering increasing attention from researchers aiming to help fact-checkers combat the growing spread of misinformation online. While many existing AFC methods incorporate external information from the Web to help examine the veracity of claims, they often
information from the Web to help examine the veracity of claims, they often
overlook the importance of verifying the source and quality of collected
"evidence". One overlooked challenge involves the reliance on "leaked
evidence", information gathered directly from fact-checking websites and used
to train AFC systems, resulting in an unrealistic setting for early
misinformation detection. Similarly, the inclusion of information from
unreliable sources can undermine the effectiveness of AFC systems. To address
these challenges, we present a comprehensive approach to evidence verification
and filtering. We create the "CREDible, Unreliable or LEaked" (CREDULE)
dataset, which consists of 91,632 articles classified as Credible, Unreliable
and Fact-checked (Leaked). Additionally, we introduce the EVidence VERification
Network (EVVER-Net), trained on CREDULE to detect leaked and unreliable
evidence in both short and long texts. EVVER-Net can be used to filter evidence
collected from the Web, thus enhancing the robustness of end-to-end AFC
systems. We experiment with various language models and show that EVVER-Net achieves up to 91.5% and 94.4% accuracy on short and long texts, respectively, when leveraging domain credibility scores. Finally, we assess the evidence provided by widely-used fact-checking datasets, including LIAR-PLUS, MOCHEG, FACTIFY, NewsCLIPpings+ and VERITE, some of which exhibit concerning rates of leaked and unreliable evidence.
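As a rough illustration of how an EVVER-Net-style verifier could sit in front of an AFC pipeline, the sketch below filters web evidence using a pluggable classifier and a table of domain credibility scores. The classifier interface, the credibility table, and the toy stand-in classifier are hypothetical placeholders; the actual model is trained on CREDULE.

```python
# Hypothetical wiring of an evidence-verification step in front of an AFC
# system; the classifier and credibility scores are placeholders.
from typing import Callable
from urllib.parse import urlparse

LABELS = ("credible", "unreliable", "leaked")

def filter_evidence(items: list[dict],
                    classify: Callable[[str, float], str],
                    domain_credibility: dict[str, float]) -> list[dict]:
    """Keep only evidence classified as credible.

    items: [{"url": ..., "text": ...}, ...]
    classify(text, credibility) -> one of LABELS
    """
    kept = []
    for item in items:
        domain = urlparse(item["url"]).netloc
        credibility = domain_credibility.get(domain, 0.5)  # neutral default
        if classify(item["text"], credibility) == "credible":
            kept.append(item)
    return kept

# Toy stand-in classifier: treat verdict-style text as leaked evidence and
# low-credibility domains as unreliable.
toy = lambda text, cred: ("leaked" if "rating:" in text.lower()
                          else "credible" if cred >= 0.5 else "unreliable")
evidence = [
    {"url": "https://example-news.org/a", "text": "Officials confirmed..."},
    {"url": "https://www.snopes.com/x", "text": "Our rating: False."},
]
print(filter_evidence(evidence, toy, {"example-news.org": 0.8}))
```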
VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias
Multimedia content has become ubiquitous on social media platforms, leading
to the rise of multimodal misinformation (MM) and the urgent need for effective
strategies to detect and prevent its spread. In recent years, the challenge of
multimodal misinformation detection (MMD) has garnered significant attention from
researchers and has mainly involved the creation of annotated, weakly
annotated, or synthetically generated training datasets, along with the
development of various deep learning MMD models. However, the problem of
unimodal bias in MMD benchmarks -- where biased or unimodal methods outperform
their multimodal counterparts on an inherently multimodal task -- has been
overlooked. In this study, we systematically investigate and identify the
presence of unimodal bias in widely-used MMD benchmarks (VMU-Twitter, COSMOS),
raising concerns about their suitability for reliable evaluation. To address
this issue, we introduce the "VERification of Image-TExt pairs" (VERITE)
benchmark for MMD which incorporates real-world data, excludes "asymmetric
multimodal misinformation" and utilizes "modality balancing". We conduct an
extensive comparative study with a Transformer-based architecture that shows
the ability of VERITE to effectively address unimodal bias, rendering it a
robust evaluation framework for MMD. Furthermore, we introduce a new method --
termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating
realistic synthetic training data that preserve crossmodal relations between
legitimate images and false human-written captions. By leveraging CHASMA in the training process, we observe consistent and notable improvements in predictive performance on VERITE, with a 9.2% increase in accuracy. We release our code at: https://github.com/stevejpapad/image-text-verification
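The following sketch illustrates the crossmodal hard-negative idea behind CHASMA under simplifying assumptions: each legitimate image is paired with its most similar non-matching caption so that the synthetic mismatch remains plausible. Embeddings are assumed precomputed (e.g., with a CLIP-style encoder), and the paper's actual selection strategy may differ.

```python
# Illustrative crossmodal hard-negative mining; not the paper's exact method.
import numpy as np

def hard_misalignments(img_emb: np.ndarray, txt_emb: np.ndarray) -> list[tuple[int, int]]:
    # Normalize, then take cosine similarity between every image and caption.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                # (N, N) similarity matrix
    np.fill_diagonal(sim, -np.inf)   # exclude each image's true caption
    # For each image, pick the most similar *wrong* caption.
    return [(i, int(sim[i].argmax())) for i in range(sim.shape[0])]

rng = np.random.default_rng(0)
pairs = hard_misalignments(rng.normal(size=(4, 512)), rng.normal(size=(4, 512)))
# Each (image_index, caption_index) pair is a hard, realistic mismatch
# usable as a synthetic misinformation training example.
```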
Sampling Strategies for Mitigating Bias in Face Synthesis Methods
Synthetically generated images can be used to create media content or to
complement datasets for training image analysis models. Several methods have
recently been proposed for the synthesis of high-fidelity face images; however,
the potential biases introduced by such methods have not been sufficiently
addressed. This paper examines the bias introduced by the widely popular
StyleGAN2 generative model trained on the Flickr Faces HQ dataset and proposes
two sampling strategies to balance the representation of selected attributes in
the generated face images. We focus on two protected attributes, gender and
age, and reveal that biases arise in the distribution of randomly sampled
images against very young and very old age groups, as well as against female
faces. These biases are also assessed for different image quality levels based
on the GIQA score. To mitigate bias, we propose two alternative methods for
sampling on selected lines or spheres of the latent space to increase the
number of generated samples from the under-represented classes. The
experimental results show a decrease in bias against underrepresented groups
and a more uniform distribution of the protected features at different levels
of image quality.
Comment: Accepted to the BIAS 2023 ECML-PKDD Workshop.
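The two proposed strategies can be sketched roughly as follows, assuming latent codes of already-generated faces labeled with a protected attribute: extra latents for an under-represented class are drawn either on line segments between same-class codes or on a sphere around the class centroid. Radii and interpolation choices are illustrative, not the paper's settings.

```python
# Rough sketch of the two latent-space sampling strategies; the actual
# procedure and parameters in the paper may differ.
import numpy as np

def sample_on_lines(class_latents: np.ndarray, n: int, rng) -> np.ndarray:
    # Interpolate between random pairs of same-class latent codes.
    i = rng.integers(0, len(class_latents), size=(n, 2))
    t = rng.uniform(0, 1, size=(n, 1))
    return (1 - t) * class_latents[i[:, 0]] + t * class_latents[i[:, 1]]

def sample_on_sphere(class_latents: np.ndarray, n: int, rng) -> np.ndarray:
    # Sample on a sphere around the class centroid, using the class's
    # mean distance from the centroid as the radius.
    center = class_latents.mean(axis=0)
    radius = np.linalg.norm(class_latents - center, axis=1).mean()
    d = rng.normal(size=(n, class_latents.shape[1]))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return center + radius * d

rng = np.random.default_rng(0)
old_faces = rng.normal(size=(50, 512))        # toy latents for an
                                              # under-represented age group
extra = sample_on_lines(old_faces, 200, rng)  # pass through the generator
```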
ReSEED: Social Event dEtection Dataset
Reuter T, Papadopoulos S, Mezaris V, Cimiano P. ReSEED: Social Event dEtection Dataset. In: MMSys '14. Proceedings of the 5th ACM Multimedia Systems Conference. New York: ACM; 2014: 35-40.

Nowadays, digital cameras are very popular and almost every mobile phone has a built-in camera. Social events play a prominent role in people's lives; people take pictures of the events they take part in, and more and more of them upload these to well-known online photo community sites like Flickr. The number of pictures uploaded to these sites keeps growing, and there is great interest in automating the process of event clustering so that every incoming picture can be assigned to the corresponding event without human interaction. These social events are defined as events that are planned by people, attended by people, and for which the social multimedia are also captured by people. There is an urgent need to develop algorithms capable of grouping media by the social events they depict or are related to. In order to train, test, and evaluate such algorithms and frameworks, we present a dataset that consists of about 430,000 photos from Flickr together with the underlying ground truth, consisting of about 21,000 social events. All the photos are accompanied by their textual metadata. The ground truth for the event groupings has been derived from event calendars on the Web that have been created collaboratively by people. The dataset has been used in the Social Event Detection (SED) task that was part of the MediaEval Benchmark for Multimedia Evaluation 2013. This task required participants to discover social events and organize the related media items in event-specific clusters within a collection of Web multimedia documents. In this paper, we describe how the dataset was collected and how the ground truth was created, together with a proposed evaluation methodology and a brief description of the corresponding task challenge as applied in the context of the Social Event Detection task.
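For intuition, here is a toy sketch of the kind of event clustering that ReSEED is meant to benchmark: a single-pass grouping of photos by capture-time and location proximity. The thresholds and features are illustrative only; real SED systems also exploit tags, titles, and uploader information.

```python
# Toy single-pass event clustering on photo metadata; illustrative only.
from dataclasses import dataclass

@dataclass
class Photo:
    taken: float  # capture time, in hours since some epoch
    lat: float
    lon: float

def cluster_events(photos: list[Photo], max_hours=12.0, max_deg=0.05) -> list[list[Photo]]:
    events: list[list[Photo]] = []
    for p in sorted(photos, key=lambda x: x.taken):
        # Attach the photo to the first event it is close to in time and space.
        for ev in events:
            last = ev[-1]
            if (p.taken - last.taken <= max_hours
                    and abs(p.lat - last.lat) <= max_deg
                    and abs(p.lon - last.lon) <= max_deg):
                ev.append(p)
                break
        else:
            events.append([p])  # otherwise start a new event
    return events

concert = [Photo(10.0, 40.63, 22.94), Photo(10.5, 40.631, 22.941)]
parade = [Photo(100.0, 48.85, 2.35)]
assert len(cluster_events(concert + parade)) == 2
```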
Mitigating Viewer Impact from Disturbing Imagery using AI Filters: A User-Study
Exposure to disturbing imagery can significantly impact individuals,
especially professionals who encounter such content as part of their work. This
paper presents a user study with 107 participants, predominantly journalists and human rights investigators, that explores the capability of Artificial Intelligence (AI)-based image filters to mitigate the emotional impact of viewing such disturbing content. We tested five different
filter styles, both traditional (Blurring and Partial Blurring) and AI-based
(Drawing, Colored Drawing, and Painting), and measured their effectiveness in
terms of conveying image information while reducing emotional distress. Our
findings suggest that the AI-based Drawing style filter demonstrates the best
performance, offering a promising solution for reducing negative feelings
(-30.38%) while preserving the interpretability of the image (97.19%). Despite
the requirement for many professionals to eventually inspect the original
images, participants suggested potential strategies for integrating AI filters
into their workflow, such as using AI filters as an initial, preparatory step
before viewing the original image. Overall, this paper contributes to the
development of a more ethically considerate and effective visual environment
for professionals routinely engaging with potentially disturbing imagery.
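For reference, the two traditional filter styles from the study can be reproduced in a few lines with Pillow, as sketched below; the AI-based styles (Drawing, Colored Drawing, Painting) would require a style-transfer model and are not shown. The blur radius and region are illustrative choices.

```python
# Sketch of the traditional (non-AI) filter styles using Pillow.
from PIL import Image, ImageFilter

def blur(img: Image.Image, radius: int = 12) -> Image.Image:
    # Full-frame Gaussian blur.
    return img.filter(ImageFilter.GaussianBlur(radius))

def partial_blur(img: Image.Image, box: tuple[int, int, int, int],
                 radius: int = 12) -> Image.Image:
    # Blur only a sensitive region (e.g., a detected face or injury),
    # keeping the surrounding context sharp.
    out = img.copy()
    region = out.crop(box).filter(ImageFilter.GaussianBlur(radius))
    out.paste(region, box)
    return out

img = Image.new("RGB", (640, 480), "gray")  # stand-in for a disturbing image
preview = partial_blur(img, (200, 150, 440, 330))
```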