Human Detection and Tracking for Video Surveillance A Cognitive Science Approach
With crimes on the rise around the world, video surveillance is becoming
more important day by day. Due to the lack of human resources to monitor this
increasing number of cameras manually, new computer vision algorithms to
perform lower- and higher-level tasks are being developed. We have developed a
new method incorporating the widely used Histograms of Oriented Gradients
(HOG), the theory of visual saliency, and the saliency prediction model Deep
Multi-Level Network to detect human beings in video sequences. Furthermore, we
implemented the k-means algorithm to cluster the HOG feature vectors of the
positively detected windows and determined the path followed by a person in
the video. We achieved a detection precision of 83.11% and a recall of 41.27%,
and obtained these results 76.866 times faster than classification on normal
images.
Comment: ICCV 2017, Venice, Italy; 5 pages, figures
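The two stages named in the abstract, HOG description of detection windows followed by k-means clustering to recover a path, can be sketched as below. This is a minimal illustration, not the authors' pipeline: the descriptor is a single orientation histogram rather than a full block-normalized HOG, and the detection centres are synthetic stand-in data.

```python
import numpy as np
from sklearn.cluster import KMeans

def hog_descriptor(patch, n_bins=9):
    """Tiny HOG-style descriptor: one gradient-orientation histogram
    over the whole patch (a real HOG uses many normalized cell blocks)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)

# Hypothetical centres of positively detected windows across frames:
# a person near (50, 60) early on, near (120, 80) later.
rng = np.random.default_rng(0)
detections = np.vstack([
    rng.normal(loc=(50, 60), scale=3, size=(20, 2)),
    rng.normal(loc=(120, 80), scale=3, size=(20, 2)),
])

# Clustering the detections; the ordered cluster centres trace the path.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(detections)
path = km.cluster_centers_
```

In the paper the clustering is applied to the HOG feature vectors of the positive windows; here it is applied to window centres purely to keep the sketch short.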
Persistent Evidence of Local Image Properties in Generic ConvNets
Supervised training of a convolutional network for object classification
should make explicit any information related to the class of objects and
disregard any auxiliary information associated with the capture of the image or
the variation within the object class. Does this happen in practice? Although
this seems to pertain to the very final layers in the network, if we look at
earlier layers we find that this is not the case. Surprisingly, strong spatial
information is implicit. This paper addresses this issue, in particular
exploiting the image representation at the first fully connected layer, i.e.
the global image descriptor, which has recently been shown to be most
effective in a range of visual recognition tasks. We empirically demonstrate
evidence for this finding in the context of four different tasks: 2D landmark
detection, 2D object keypoint prediction, estimation of the RGB values of the
input image, and recovery of the semantic label of each pixel. We base our
investigation on a simple framework with ridge regression applied commonly
across these tasks, and show results which all support our insight. Such
spatial information can be used for computing correspondence of landmarks to
a good accuracy, and could potentially be useful for improving the training
of convolutional nets for classification purposes.
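The probing framework described, ridge regression from a global descriptor to a spatial target, can be sketched as follows. The descriptors here are synthetic stand-ins for fc-layer features, constructed so that a 2-D position is linearly decodable from them, mimicking the paper's finding rather than reproducing its experiments.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, d = 200, 64

# Hypothetical "fc-layer descriptors": a random linear mixing of an
# (x, y) landmark position into d dimensions, plus a little noise.
positions = rng.uniform(0, 1, size=(n, 2))
mixing = rng.normal(size=(2, d))
descriptors = positions @ mixing + 0.01 * rng.normal(size=(n, d))

# Ridge regression probe: train on 150 images, test on the rest.
reg = Ridge(alpha=1.0).fit(descriptors[:150], positions[:150])
pred = reg.predict(descriptors[150:])
err = np.abs(pred - positions[150:]).mean()
```

A small test error indicates the spatial signal survives in the descriptor; in the paper the same probe is run per task (landmarks, keypoints, RGB, semantic labels).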
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
We present a method for estimating articulated human pose from a single
static image based on a graphical model with novel pairwise relations that make
adaptive use of local image measurements. More precisely, we specify a
graphical model for human pose which exploits the fact that local image
measurements can be used both to detect parts (or joints) and to predict
the spatial relationships between them (Image Dependent Pairwise Relations).
These spatial relationships are represented by a mixture model. We use Deep
Convolutional Neural Networks (DCNNs) to learn conditional probabilities for
the presence of parts and their spatial relationships within image patches.
Hence our model combines the representational flexibility of graphical models
with the efficiency and statistical power of DCNNs. Our method significantly
outperforms state-of-the-art methods on the LSP and FLIC datasets and also
performs very well on the Buffy dataset without any training.
Comment: NIPS 2014, camera-ready
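The scoring structure the abstract describes, unary part scores combined with image-dependent pairwise compatibilities, can be illustrated on a toy two-part model. The score tables below are hand-made numbers standing in for DCNN outputs; the paper runs dynamic programming on a tree of parts rather than this exhaustive search.

```python
import numpy as np

# Toy two-part model (say, elbow and wrist) with 3 candidate locations each.
unary_elbow = np.array([0.1, 0.9, 0.2])   # stand-in DCNN part scores
unary_wrist = np.array([0.3, 0.2, 0.8])

# pairwise[i, j]: image-dependent compatibility of elbow candidate i
# with wrist candidate j (in the paper, predicted from the image patch).
pairwise = np.array([
    [0.5, 0.1, 0.0],
    [0.0, 0.2, 0.6],
    [0.1, 0.0, 0.1],
])

# Joint inference: maximize unary + unary + pairwise over all pairs.
total = unary_elbow[:, None] + unary_wrist[None, :] + pairwise
best_elbow, best_wrist = np.unravel_index(np.argmax(total), total.shape)
```

Note that the pairwise term can override the individually best detections, which is exactly the benefit of conditioning the relations on the image.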
End-to-end weakly-supervised semantic alignment
We tackle the task of semantic alignment where the goal is to compute dense
semantic correspondence aligning two images depicting objects of the same
category. This is a challenging task due to large intra-class variation,
changes in viewpoint and background clutter. We present the following three
principal contributions. First, we develop a convolutional neural network
architecture for semantic alignment that is trainable in an end-to-end manner
from weak image-level supervision in the form of matching image pairs. The
outcome is that parameters are learnt from rich appearance variation present in
different but semantically related images without the need for tedious manual
annotation of correspondences at training time. Second, the main component of
this architecture is a differentiable soft inlier scoring module, inspired by
the RANSAC inlier scoring procedure, that computes the quality of the
alignment based only on geometrically consistent correspondences, thereby
reducing the effect of background clutter. Third, we demonstrate that the
proposed approach
achieves state-of-the-art performance on multiple standard benchmarks for
semantic alignment.
Comment: In 2018 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR 2018)
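The key idea of the soft inlier module, replacing RANSAC's hard 0/1 inlier vote with a smooth function of the residual so the score is differentiable, can be sketched as below. The Gaussian weighting and the sample points are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def soft_inlier_score(src, dst, transform, sigma=1.0):
    """Differentiable stand-in for a hard inlier count: each
    correspondence contributes exp(-(r/sigma)^2) instead of a 0/1 vote,
    so gradients flow through the alignment quality."""
    residuals = np.linalg.norm(src @ transform.T - dst, axis=1)
    return np.exp(-(residuals / sigma) ** 2).sum()

# Hypothetical correspondences: three consistent with the identity
# transform, one gross outlier (e.g. a background match).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dst = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
score = soft_inlier_score(src, dst, np.eye(2), sigma=0.5)
```

The outlier contributes essentially nothing to the score, which is how geometrically inconsistent (cluttered) matches are suppressed during end-to-end training.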
Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network
We propose a heterogeneous multi-task learning framework for human pose
estimation from a monocular image with a deep convolutional neural network. In
particular, we simultaneously learn a pose-joint regressor and a sliding-window
body-part detector in a deep network architecture. We show that including the
body-part detection task helps to regularize the network, directing it to
converge to a good solution. We report competitive and state-of-the-art
results on several datasets. We also empirically show that the learned neurons
in the middle layer of our network are tuned to localized body parts.
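The heterogeneous objective, a regression loss on joint coordinates plus a classification loss on sliding-window part detection, can be sketched as a single combined loss. The specific losses and the weighting below are illustrative assumptions; in the actual network both terms back-propagate into a shared trunk.

```python
import numpy as np

def multitask_loss(joint_pred, joint_true, part_logit, part_label, weight=0.5):
    """Combined objective: squared error on joint coordinates plus
    binary cross-entropy on a body-part detection score. The detection
    term acts as a regularizer for the pose regressor."""
    reg = np.mean((joint_pred - joint_true) ** 2)
    p = 1.0 / (1.0 + np.exp(-part_logit))            # sigmoid
    cls = -(part_label * np.log(p) + (1 - part_label) * np.log(1 - p))
    return reg + weight * cls

loss = multitask_loss(
    joint_pred=np.array([0.4, 0.6]),
    joint_true=np.array([0.5, 0.5]),
    part_logit=2.0,          # hypothetical window score for a true part
    part_label=1.0,
)
```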
Multi-Person Pose Estimation with Local Joint-to-Person Associations
Despite the recent success of neural networks for human pose estimation,
current approaches are limited to pose estimation of a single person and cannot
handle humans in groups or crowds. In this work, we propose a method that
estimates the poses of multiple persons in an image in which a person can be
occluded by another person or might be truncated. To this end, we consider
multi-person pose estimation as a joint-to-person association problem. We
construct a fully connected graph from a set of detected joint candidates in an
image and resolve the joint-to-person association and outlier detection using
integer linear programming. Since solving joint-to-person association jointly
for all persons in an image is an NP-hard problem and even approximations are
expensive, we solve the problem locally for each person. On the challenging
MPII Human Pose Dataset for multiple persons, our approach achieves the
accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.
Comment: Accepted to European Conference on Computer Vision (ECCV) Workshops,
Crowd Understanding, 201
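The local joint-to-person association can be illustrated as a tiny exhaustive search over candidate assignments, standing in for the paper's integer linear program: pick one candidate per joint type for the current person, maximizing detector scores plus a pairwise compatibility term. All numbers below are hand-made for illustration.

```python
import itertools
import numpy as np

# Two joint types (head, neck) with two detected candidates each.
cand_scores = [np.array([0.9, 0.4]),    # head candidate scores
               np.array([0.3, 0.8])]    # neck candidate scores

# pair_compat[i, j]: compatibility of head candidate i with neck
# candidate j (e.g. from their relative position).
pair_compat = np.array([[0.7, 0.0],
                        [0.1, 0.2]])

# Exhaustive search over assignments; an ILP solver scales this to
# many joints and candidates, solved locally per person in the paper.
best_val, best_pick = -np.inf, None
for i, j in itertools.product(range(2), range(2)):
    val = cand_scores[0][i] + cand_scores[1][j] + pair_compat[i, j]
    if val > best_val:
        best_val, best_pick = val, (i, j)
```

Here the pairwise term makes the jointly consistent pair win even though the second neck candidate has the higher individual score, which is the point of solving the association jointly rather than per joint.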