Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Researchers have extensively studied the field of vision and language,
discovering that both visual and textual content are crucial for understanding
scenes effectively. In particular, comprehending text in videos holds great
significance, requiring both scene text understanding and temporal reasoning.
This paper focuses on exploring two recently introduced datasets, NewsVideoQA
and M4-ViteVQA, which aim to address video question answering based on textual
content. The NewsVideoQA dataset contains question-answer pairs related to the
text in news videos, while M4-ViteVQA comprises question-answer pairs from
diverse categories like vlogging, traveling, and shopping. We provide an
analysis of the formulation of these datasets on various levels, exploring the
degree of visual understanding and multi-frame comprehension required for
answering the questions. Additionally, the study includes experiments with
BERT-QA, a text-only model, which achieves performance comparable to the
original methods on both datasets, exposing shortcomings in the formulation of
these datasets. Furthermore, we examine domain adaptation by training on
M4-ViteVQA and evaluating on NewsVideoQA, and vice versa, thereby shedding
light on the challenges and potential benefits of out-of-domain training.
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved ever more rapidly, quickly
rendering once-effective methods obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Self-Supervised Learning Across Domains
Human adaptability relies crucially on learning and merging knowledge from both supervised and unsupervised tasks: parents point out a few important concepts, but then children fill in the gaps on their own. This is particularly effective because supervised learning can never be exhaustive, and learning autonomously allows the learner to discover invariances and regularities that help it generalize. In this paper we propose to apply a similar approach to the problem of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals on the same images. This secondary task helps the network focus on object shapes, learning concepts like spatial orientation and part correlation, while acting as a regularizer for the classification task over multiple visual domains. Extensive experiments confirm our intuition and show that our multi-task method, combining supervised and self-supervised knowledge, achieves results competitive with more complex domain generalization and adaptation solutions. It also proves its potential in the novel and challenging predictive and partial domain adaptation scenarios.
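The multi-task objective this abstract describes can be sketched in general terms as a weighted sum of a supervised classification loss and a self-supervised auxiliary loss computed on the same images. The function names and the `ss_weight` hyperparameter below are illustrative assumptions, not the paper's actual implementation:

```python
import math

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example:
    # loss = logsumexp(logits) - logits[label].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def multitask_loss(cls_logits, cls_label, ss_logits, ss_label, ss_weight=0.5):
    # Supervised classification loss plus a weighted self-supervised
    # auxiliary loss on the same image (e.g. predicting which spatial
    # transformation was applied). `ss_weight` is a hypothetical knob.
    return (cross_entropy(cls_logits, cls_label)
            + ss_weight * cross_entropy(ss_logits, ss_label))
```

In practice both prediction heads would share a feature extractor, so the auxiliary gradient regularizes the shared representation; only the loss combination is shown here.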
Learning the Roots of Visual Domain Shift
In this paper we focus on the spatial nature of visual domain shift,
attempting to learn where domain adaptation originates in each given image of
the source and target set. We borrow concepts and techniques from the CNN
visualization literature, and learn domainness maps able to localize the degree
of domain specificity in images. From these maps we derive features related to
different domainness levels, and we show that by considering them as a
preprocessing step for a domain adaptation algorithm, the final classification
performance is strongly improved. Combined with the whole image representation,
these features provide state-of-the-art results on the Office dataset.
Comment: Extended Abstract
Adaptive behavior of neighboring neurons during adaptation-induced plasticity of orientation tuning in V1
Background: Sensory neurons display transient changes of their response properties following prolonged exposure to an appropriate stimulus (adaptation). In adult cat primary visual cortex, orientation-selective neurons shift their preferred orientation after being adapted to a non-preferred orientation. The direction of those shifts, towards (attractive) or away (repulsive) from the adapter, depends mostly on adaptation duration. How the adaptive behavior of a neuron is related to that of its neighbors remains unclear.
Results: Here we show that in most cases (75%), cells shift their preferred orientation in the same direction as their neighbors. We also found that cells shifting preferred orientation differently from their neighbors (25%) display three interesting properties: (i) larger variance of absolute shift amplitude, (ii) wider tuning bandwidth, and (iii) a larger range of preferred orientations among the cluster of cells. Several response properties of V1 neurons depend on their location within the cortical orientation map. Our results suggest that recording sites with both attractive and repulsive shifts following adaptation may be located in close proximity to iso-orientation domain boundaries or pinwheel centers. Indeed, those regions have a more diverse orientation distribution of local inputs that could account for the three properties above. On the other hand, sites with all cells shifting their preferred orientation in the same direction could be located within iso-orientation domains.
Conclusions: Our results suggest that the direction and amplitude of orientation preference shifts in V1 depend on location within the orientation map. This anisotropy of adaptation-induced plasticity, comparable to that of the visual cortex itself, could have important implications for our understanding of visual adaptation at the psychophysical level.