Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Researchers have extensively studied the field of vision and language,
discovering that both visual and textual content are crucial for understanding
scenes effectively. In particular, comprehending text in videos holds great
significance, requiring both scene text understanding and temporal reasoning.
This paper focuses on exploring two recently introduced datasets, NewsVideoQA
and M4-ViteVQA, which aim to address video question answering based on textual
content. The NewsVideoQA dataset contains question-answer pairs related to the
text in news videos, while M4-ViteVQA comprises question-answer pairs from
diverse categories like vlogging, traveling, and shopping. We provide an
analysis of the formulation of these datasets on various levels, exploring the
degree of visual understanding and multi-frame comprehension required for
answering the questions. Additionally, the study includes experiments with
BERT-QA, a text-only model, which achieves performance comparable to the
original methods on both datasets, exposing shortcomings in the formulation of
these datasets. Furthermore, we examine domain adaptation by training on
M4-ViteVQA and evaluating on NewsVideoQA, and vice versa, thereby shedding
light on the challenges and potential benefits of out-of-domain training.
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved ever more rapidly, quickly
rendering once-effective methods obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Self-Supervised Learning Across Domains
Human adaptability relies crucially on learning and merging knowledge from both supervised and unsupervised tasks: parents point out a few important concepts, but then children fill in the gaps on their own. This is particularly effective because supervised learning can never be exhaustive, and learning autonomously allows the learner to discover invariances and regularities that help it generalize. In this paper we propose to apply a similar approach to the problem of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals on the same images. This secondary task helps the network focus on object shapes, learning concepts like spatial orientation and part correlation, while acting as a regularizer for the classification task over multiple visual domains. Extensive experiments confirm our intuition and show that our multi-task method, combining supervised and self-supervised knowledge, achieves results competitive with more complex domain generalization and adaptation solutions. It also proves its potential in the novel and challenging predictive and partial domain adaptation scenarios.
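The multi-task objective this abstract describes can be sketched in general terms as a weighted sum of a supervised classification loss and a self-supervised auxiliary loss computed on the same images. The function names and the `ss_weight` hyperparameter below are illustrative assumptions, not the paper's actual implementation:

```python
import math

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example:
    # loss = logsumexp(logits) - logits[label].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def multitask_loss(cls_logits, cls_label, ss_logits, ss_label, ss_weight=0.5):
    # Supervised classification loss plus a weighted self-supervised
    # auxiliary loss on the same image (e.g. predicting which spatial
    # transformation was applied). `ss_weight` is a hypothetical knob.
    return (cross_entropy(cls_logits, cls_label)
            + ss_weight * cross_entropy(ss_logits, ss_label))
```

In practice both prediction heads would share a feature extractor, so the auxiliary gradient regularizes the shared representation; only the loss combination is shown here.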
Learning the Roots of Visual Domain Shift
In this paper we focus on the spatial nature of visual domain shift,
attempting to learn where domain adaptation originates in each given image of
the source and target set. We borrow concepts and techniques from the CNN
visualization literature, and learn domainness maps able to localize the degree
of domain specificity in images. From these maps we derive features related to
different domainness levels, and we show that by considering them as a
preprocessing step for a domain adaptation algorithm, the final classification
performance is strongly improved. Combined with the whole image representation,
these features provide state-of-the-art results on the Office dataset.
Comment: Extended Abstract
Adaptive behavior of neighboring neurons during adaptation-induced plasticity of orientation tuning in V1
Background: Sensory neurons display transient changes of their response properties following prolonged exposure to an appropriate stimulus (adaptation). In adult cat primary visual cortex, orientation-selective neurons shift their preferred orientation after being adapted to a non-preferred orientation. The direction of those shifts, towards (attractive) or away (repulsive) from the adapter, depends mostly on adaptation duration. How the adaptive behavior of a neuron is related to that of its neighbors remains unclear.
Results: Here we show that in most cases (75%), cells shift their preferred orientation in the same direction as their neighbors. We also found that cells shifting preferred orientation differently from their neighbors (25%) display three interesting properties: (i) larger variance of absolute shift amplitude, (ii) wider tuning bandwidth, and (iii) a larger range of preferred orientations among the cluster of cells. Several response properties of V1 neurons depend on their location within the cortical orientation map. Our results suggest that recording sites with both attractive and repulsive shifts following adaptation may be located in close proximity to iso-orientation domain boundaries or pinwheel centers. Indeed, those regions have a more diverse orientation distribution of local inputs that could account for the three properties above. On the other hand, sites with all cells shifting their preferred orientation in the same direction could be located within iso-orientation domains.
Conclusions: Our results suggest that the direction and amplitude of orientation preference shifts in V1 depend on location within the orientation map. This anisotropy of adaptation-induced plasticity, comparable to that of the visual cortex itself, could have important implications for our understanding of visual adaptation at the psychophysical level.