760 research outputs found
Anticipating Visual Representations from Unlabeled Video
Anticipating actions and objects before they start or appear is a difficult
problem in computer vision with several real-world applications. This task is
challenging partly because it requires leveraging extensive knowledge of the
world that is difficult to write down. We believe that a promising resource for
efficiently learning this knowledge is through readily available unlabeled
video. We present a framework that capitalizes on temporal structure in
unlabeled video to learn to anticipate human actions and objects. The key idea
behind our approach is that we can train deep networks to predict the visual
representation of images in the future. Visual representations are a promising
prediction target because they encode images at a higher semantic level than
pixels yet are automatic to compute. We then apply recognition algorithms on
our predicted representation to anticipate objects and actions. We
experimentally validate this idea on two datasets, anticipating actions one
second in the future and objects five seconds in the future.Comment: CVPR 201
Contextual Influences on Saliency
This article describes a model for including scene/context priors in attention guidance. In the proposed scheme, visual context information can be available early in the visual processing chain, in order to modulate the saliency of image regions and to provide an efficient short cut for object detection and recognition. The scene is represented by means of a low-dimensional global description obtained from low-level features. The global scene features are then used to predict the probability of presence of the target object in the scene, and its location and scale, before exploring the image. Scene information can then be used to modulate the saliency of image regions early during the visual processing in order to provide an efficient short cut for object detection and recognition
Predicting Motivations of Actions by Leveraging Text
Understanding human actions is a key problem in computer vision. However,
recognizing actions is only the first step of understanding what a person is
doing. In this paper, we introduce the problem of predicting why a person has
performed an action in images. This problem has many applications in human
activity understanding, such as anticipating or explaining an action. To study
this problem, we introduce a new dataset of people performing actions annotated
with likely motivations. However, the information in an image alone may not be
sufficient to automatically solve this task. Since humans can rely on their
lifetime of experiences to infer motivation, we propose to give computer vision
systems access to some of these experiences by using recently developed natural
language models to mine knowledge stored in massive amounts of text. While we
are still far away from fully understanding motivation, our results suggest
that transferring knowledge from language into vision can help machines
understand why people in images might be performing an action.Comment: CVPR 201
Interpreting Deep Visual Representations via Network Dissection
The success of recent deep convolutional neural networks (CNNs) depends on
learning hidden representations that can summarize the important factors of
variation behind the data. However, CNNs often criticized as being black boxes
that lack interpretability, since they have millions of unexplained model
parameters. In this work, we describe Network Dissection, a method that
interprets networks by providing labels for the units of their deep visual
representations. The proposed method quantifies the interpretability of CNN
representations by evaluating the alignment between individual hidden units and
a set of visual semantic concepts. By identifying the best alignments, units
are given human interpretable labels across a range of objects, parts, scenes,
textures, materials, and colors. The method reveals that deep representations
are more transparent and interpretable than expected: we find that
representations are significantly more interpretable than they would be under a
random equivalently powerful basis. We apply the method to interpret and
compare the latent representations of various network architectures trained to
solve different supervised and self-supervised training tasks. We then examine
factors affecting the network interpretability such as the number of the
training iterations, regularizations, different initializations, and the
network depth and width. Finally we show that the interpreted units can be used
to provide explicit explanations of a prediction given by a CNN for an image.
Our results highlight that interpretability is an important property of deep
neural networks that provides new insights into their hierarchical structure.Comment: *B. Zhou and D. Bau contributed equally to this work. 15 pages, 27
figure
Learning to Act Properly: Predicting and Explaining Affordances from Images
We address the problem of affordance reasoning in diverse scenes that appear
in the real world. Affordances relate the agent's actions to their effects when
taken on the surrounding objects. In our work, we take the egocentric view of
the scene, and aim to reason about action-object affordances that respect both
the physical world as well as the social norms imposed by the society. We also
aim to teach artificial agents why some actions should not be taken in certain
situations, and what would likely happen if these actions would be taken. We
collect a new dataset that builds upon ADE20k, referred to as ADE-Affordance,
which contains annotations enabling such rich visual reasoning. We propose a
model that exploits Graph Neural Networks to propagate contextual information
from the scene in order to perform detailed affordance reasoning about each
object. Our model is showcased through various ablation studies, pointing to
successes and challenges in this complex task
Reseña: Boudot et al . 2009. Atlas of the Odonata of the Mediterranean and North Africa
Comentarios y reseña del trabajo Boudot et al. 2009. Atlas of the Odonata of the Mediterranean and North Africa. Libellula Supplement, 9: 1-256 pp
Reseña: Clave dicotómica para la identificación de macroinvertebrados de la cuenca del Ebro. Oscoz et al. 2011.
Comentarios y reseña de la obra: Clave dicotómica para la identificación de macroinvertebrados de la cuenca del Ebro. Oscoz et al. 2011. Confederación Hidrográfica del Ebro, Zaragoza, 66 pp
- …