3 research outputs found

Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction

Author: Chen Junzhe
Hu Xuming
Liu Aiwei
Meng Shiao
Wen Lijie
Yu Philip S.
Publication date: 25/10/2023

How can we better extract entities and relations from text? Using multimodal extraction with images and text obtains more signals for entities and relations, and aligns them through graphs or hierarchical fusion, aiding in extraction. Despite attempts at various fusions, previous works have overlooked many unlabeled image-caption pairs, such as NewsCLIPing. This paper proposes innovative pre-training objectives for entity-object and relation-image alignment, extracting objects from images and aligning them with entity and relation prompts for soft pseudo-labels. These labels are used as self-supervised signals for pre-training, enhancing the ability to extract entities and relations. Experiments on three datasets show an average 3.41% F1 improvement over prior SOTA. Additionally, our method is orthogonal to previous multimodal fusions, and using it on prior SOTA fusions yields a further 5.47% F1 improvement. Comment: Accepted to ACM Multimedia 202

arXiv.org e-Print Archive

Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown

Author: Bailer Werner
Gsteiger Viktor
Gurrin Cathal
Heiko Schuldt
Heller Silvan
Jónsson Björn Þór
Leibetseder Andreas
Lokoč Jakub
Mejzlík František
Peska Ladislav
Rossetto Luca
Schall Konstantin
Schoeffmann Klaus
Schuldt Spiess
Tran Ly-Duyen
Vadicamo Lucia
Veselý Patrik
Vrochidis Stefanos
Wu Jiaxin
Publication venue: Springer
Publication date: 01/01/2022

The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten-year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drives the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY