Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
This paper addresses the visualisation of image classification models, learnt
using deep Convolutional Networks (ConvNets). We consider two visualisation
techniques, based on computing the gradient of the class score with respect to
the input image. The first generates an image that maximises the class
score [Erhan et al., 2009], thus visualising the notion of the class captured
by a ConvNet. The second technique computes a class saliency map, specific to a
given image and class. We show that such maps can be employed for weakly
supervised object segmentation using classification ConvNets. Finally, we
establish the connection between the gradient-based ConvNet visualisation
methods and deconvolutional networks [Zeiler et al., 2013].
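The saliency idea in the abstract above (the gradient of the class score with respect to the input image) can be illustrated with a small sketch. It is not the authors' implementation: the toy linear score function, the finite-difference gradient (standing in for back-propagation through a ConvNet), and all function names are assumptions made for illustration. Reducing the gradient to one value per pixel by taking the largest magnitude across colour channels is also an assumption here, since the abstract does not specify that step.

```python
def class_score(image, weights, bias=0.0):
    """Toy linear class score S(I) = sum(w * I) + b (a stand-in for a ConvNet)."""
    total = bias
    for row_i, row_w in zip(image, weights):
        for px_i, px_w in zip(row_i, row_w):
            for value, weight in zip(px_i, px_w):
                total += value * weight
    return total


def score_gradient(image, score_fn, eps=1e-4):
    """Central finite-difference estimate of dS/dI, with the same shape as image."""
    grad = [[[0.0 for _ in px] for px in row] for row in image]
    for i, row in enumerate(image):
        for j, px in enumerate(row):
            for c in range(len(px)):
                original = px[c]
                px[c] = original + eps
                up = score_fn(image)
                px[c] = original - eps
                down = score_fn(image)
                px[c] = original  # restore the pixel before moving on
                grad[i][j][c] = (up - down) / (2 * eps)
    return grad


def saliency_map(grad):
    """Per-pixel saliency: largest gradient magnitude over the colour channels."""
    return [[max(abs(g) for g in px) for px in row] for row in grad]


# Hypothetical 2x2 RGB image and class weights, chosen only for illustration.
image = [[[0.2, 0.4, 0.6], [0.1, 0.1, 0.1]],
         [[0.9, 0.5, 0.3], [0.7, 0.2, 0.8]]]
weights = [[[0.1, -0.5, 0.2], [0.0, 0.0, 0.0]],
           [[1.0, 0.3, -0.2], [-0.4, 0.4, 0.1]]]

grad = score_gradient(image, lambda img: class_score(img, weights))
saliency = saliency_map(grad)
```

For a linear score the gradient equals the weights, so the resulting map simply highlights the pixels with the largest weights; with a real ConvNet the gradient would depend on the specific input image.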
Verb Physics: Relative Physical Knowledge of Actions and Objects
Learning commonsense knowledge from natural language text is nontrivial due
to reporting bias: people rarely state the obvious, e.g., "My house is bigger
than me." However, while rarely stated explicitly, this trivial everyday
knowledge does influence the way people talk about the world, which provides
indirect clues to reason about the world. For example, a statement like, "Tyler
entered his house" implies that his house is bigger than Tyler.
In this paper, we present an approach to infer relative physical knowledge of
actions and objects along five dimensions (e.g., size, weight, and strength)
from unstructured natural language text. We frame knowledge acquisition as
joint inference over two closely related problems: learning (1) relative
physical knowledge of object pairs and (2) physical implications of actions
when applied to those object pairs. Empirical results demonstrate that it is
possible to extract knowledge of actions and objects from language and that
joint inference over different types of knowledge improves performance.
Comment: 11 pages, published in Proceedings of ACL 201
Temporally coherent 4D reconstruction of complex dynamic scenes
This paper presents an approach for reconstruction of 4D temporally coherent
models of complex dynamic scenes. No prior knowledge of scene
structure or camera calibration is required, allowing reconstruction from multiple moving
cameras. Sparse-to-dense temporal correspondence is integrated with joint
multi-view segmentation and reconstruction to obtain a complete 4D
representation of static and dynamic objects. Temporal coherence is exploited
to overcome visual ambiguities, resulting in improved reconstruction of complex
scenes. Robust joint segmentation and reconstruction of dynamic objects is
achieved by introducing a geodesic star convexity constraint. Comparative
evaluation is performed on a variety of unstructured indoor and outdoor dynamic
scenes with hand-held cameras and multiple people. This demonstrates
reconstruction of complete temporally coherent 4D scene models with improved
nonrigid object segmentation and shape reconstruction.
Comment: To appear in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2016. Video available at:
https://www.youtube.com/watch?v=bm_P13_-Ds
A graphical model based solution to the facial feature point tracking problem
This paper proposes a facial feature point tracker motivated by applications
such as human-computer interfaces and facial expression analysis systems.
The proposed tracker is based on a graphical model framework. The
facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness in real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated
on real video data under various conditions including occluded facial gestures
and head movements. It is also compared to two popular methods, one based
on Kalman filtering exploiting temporal relations, and the other based on active
appearance models (AAM). Improvements provided by the proposed approach
are demonstrated through both visual displays and quantitative analysis.
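One of the baselines mentioned above relies on Kalman filtering over temporal relations. The following is a minimal sketch of that general idea, not the method evaluated in the paper: it assumes a scalar random-walk model for a single feature-point coordinate, and the function name and the noise parameters `q` and `r` are hypothetical values chosen for illustration.

```python
def kalman_track(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for one coordinate under a random-walk motion model.

    q: assumed process-noise variance, r: assumed measurement-noise variance,
    x0, p0: assumed initial estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows by q.
        p_pred = p + q
        # Update: blend the prediction and the measurement using the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


# Hypothetical measurements of one coordinate of a stationary feature point.
track = kalman_track([10.0] * 50)
```

A filter of this kind uses only the temporal history of each point; it has no notion of spatial relations between feature points, which is the additional cue the abstract says the proposed tracker exploits.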