Search CORE

407,203 research outputs found

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

Authors: Gupta Ankush; Zhang Chuhan; Zisserman Andrew
Publication date: 15/08/2023

We introduce an object-aware decoder for improving the performance of spatio-temporal representations on ego-centric videos. The key idea is to enhance object-awareness during training by tasking the model to predict hand positions, object positions, and the semantic label of the objects using paired captions when available. At inference time the model only requires RGB frames as inputs, and is able to track and ground objects (although it has not been trained explicitly for this). We demonstrate the performance of the object-aware representations learnt by our model by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) using the representations learned as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D). In all cases the performance improves over the state of the art, even compared to networks trained with far larger batch sizes. We also show that by using noisy image-level detection as pseudo-labels in training, the model learns to provide better bounding boxes using video consistency, and to ground the words in the associated text descriptions. Overall, we show that the model can act as a drop-in replacement for an ego-centric video model to improve performance through visual-text grounding. Comment: ICCV202

arXiv.org e-Print Archive

Helping hands: an object-aware ego-centric video recognition model

Authors: Gupta A; Zhang C; Zisserman Andrew
Publication venue: IEEE
Publication date: 15/01/2024

We introduce an object-aware decoder for improving the performance of spatio-temporal representations on egocentric videos. The key idea is to enhance object-awareness during training by tasking the model to predict hand positions, object positions, and the semantic label of the objects using paired captions when available. At inference time the model only requires RGB frames as inputs, and is able to track and ground objects (although it has not been trained explicitly for this). We demonstrate the performance of the object-aware representations learnt by our model by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) using the representations learned as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D). In all cases the performance improves over the state of the art, even compared to networks trained with far larger batch sizes. We also show that by using noisy image-level detection as pseudo-labels in training, the model learns to provide better bounding boxes using video consistency, and to ground the words in the associated text descriptions. Overall, we show that the model can act as a drop-in replacement for an ego-centric video model to improve performance through visual-text grounding.

Oxford University Research Archive

The branding of female authorship in the Enlightenment: A paratextual and iconographical study of a European best-seller, Les Journées amusantes by Madeleine-Angélique de Gomez

Author: Genieys-Kirk Severine
Publication date: 01/01/2023

As revisionist studies have recently shown in the wake of Gérard Genette’s Seuils (1987), editorial paratexts in translated works, such as prefaces and illustrations, are valuable documents for capturing the ideological parameters which early modern publishers and translators had to skilfully exploit to promote their work. A case in point is the little-known yet important eighteenth-century collection of framed-novelle Les Journées amusantes (1722–31) by Madeleine-Angélique Poisson de Gomez (1684–1776). Through the lens of intertextuality and intericonicity, this article offers a two-part analysis of the paratextual material (verbal and visual) contained in the foreign editions of this work. It evaluates the strategies which ‘image-makers’ used to ensure the legitimacy of a text which was originally written by a woman. In particular, it highlights transnational instances of dialogic interplay and cultural transfer, allowing for a better understanding of the female writer’s status across Europe and revealing the cultural and pedagogical parts which translators, publishers and engravers played in the formation of eighteenth-century European readerships.

Edinburgh Research Explorer

Multimodal Grounding for Language Processing

Authors: Beinborn Lisa; Botschen Teresa; Gurevych Iryna
Publication date: 01/01/2018

This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role in the compositional power of language. Comment: The paper has been published in the Proceedings of the 27th Conference of Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

arXiv.org e-Print Archive

TUbiblio

VU Research Portal

International Migration, Integration and Social Cohesion online publications

UvA-DARE