Segmentation Driven Object Detection with Fisher Vectors
We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
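The FV representation named above can be sketched as follows: the gradient of the log-likelihood of local descriptors under a Gaussian mixture model, here restricted to the mean parameters with diagonal covariances, followed by the power- and L2-normalization commonly paired with FVs. The GMM parameters are assumed given (in practice fitted offline); this is an illustrative sketch, not the paper's exact pipeline.

```python
import numpy as np

def fisher_vector_means(descriptors, weights, means, covars):
    """Fisher vector w.r.t. GMM means only (diagonal covariances).

    descriptors: (N, D) local features (e.g. PCA-reduced SIFT)
    weights, means, covars: GMM parameters of shapes (K,), (K, D), (K, D)
    """
    N, D = descriptors.shape
    # Posterior (soft assignment) of each descriptor to each component
    diff = descriptors[:, None, :] - means[None, :, :]            # (N, K, D)
    log_prob = -0.5 * np.sum(diff ** 2 / covars[None], axis=2)
    log_prob -= 0.5 * np.sum(np.log(2 * np.pi * covars), axis=1)[None]
    log_prob += np.log(weights)[None]
    log_prob -= log_prob.max(axis=1, keepdims=True)               # stability
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # (N, K)
    # Normalized gradient w.r.t. the means
    fv = (gamma[:, :, None] * diff / np.sqrt(covars)[None]).sum(axis=0)
    fv /= N * np.sqrt(weights)[:, None]
    fv = fv.ravel()                                               # (K * D,)
    # Power- and L2-normalization, standard post-processing for FVs
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

Masking-based re-weighting, as in the abstract, would scale each descriptor's posterior `gamma` by the mask value at its location before the sum.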
SemanticLoop: loop closure with 3D semantic graph matching
Loop closure can effectively correct the accumulated error in robot
localization, which plays a critical role in the long-term navigation of the
robot. Traditional appearance-based methods rely on local features and are
prone to failure in ambiguous environments. On the other hand, object
recognition can infer objects' category, pose, and extent. These objects can
serve as stable semantic landmarks for viewpoint-independent and non-ambiguous
loop closure. However, there is a critical object-level data association
problem due to the lack of efficient and robust algorithms.
We introduce a novel object-level data association algorithm, which
incorporates IoU, instance-level embedding, and detection uncertainty,
formulated as a linear assignment problem. Then, we model the objects as TSDF
volumes and represent the environment as a 3D graph with semantics and
topology. Next, we propose a graph matching-based loop detection based on the
reconstructed 3D semantic graphs and correct the accumulated error by aligning
the matched objects. Finally, we refine the object poses and camera trajectory
in an object-level pose graph optimization.
Experimental results show that the proposed object-level data association
method significantly outperforms the commonly used nearest-neighbor method in
accuracy. Our graph matching-based loop closure is more robust to environmental
appearance changes than existing appearance-based methods.
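The object-level data association step described above can be sketched as a linear assignment problem over a score matrix combining IoU, embedding similarity, and detection uncertainty. The weights and the gating threshold below are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(iou, emb_sim, det_conf, w_iou=0.5, w_emb=0.3, min_score=0.3):
    """Match detections to map objects by solving a linear assignment problem.

    iou:      (N_det, N_obj) IoU between detections and projected map objects
    emb_sim:  (N_det, N_obj) instance-embedding similarity in [0, 1]
    det_conf: (N_det,) detector confidence, down-weighting uncertain boxes
    """
    score = w_iou * iou + w_emb * emb_sim
    score = score * det_conf[:, None]            # uncertainty-aware weighting
    rows, cols = linear_sum_assignment(-score)   # maximize total score
    # Reject weak matches so new landmarks can be instantiated instead
    return [(r, c) for r, c in zip(rows, cols) if score[r, c] >= min_score]
```

The gating step matters in practice: without it, the Hungarian solver forces every detection to match something, which is exactly how spurious associations corrupt the semantic map.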
Subspace discovery for video anomaly detection
In automated video surveillance, anomaly detection is a challenging task. We address
this task as a novelty detection problem where pattern description is limited
and labelling information is available only for a small sample of normal instances.
Classification under these conditions is prone to over-fitting. The contribution of this
work is to propose a novel video abnormality detection method that does not need
object detection and tracking. The method is based on subspace learning to discover
a subspace where abnormality detection is easier to perform, without the need of
detailed annotation and description of these patterns. The problem is formulated as
one-class classification utilising a low dimensional subspace, where a novelty classifier
is used to learn normal actions automatically and then to detect abnormal actions
from low-level features extracted from a region of interest. The subspace is discovered
(using both labelled and unlabelled data) by a locality preserving graph-based algorithm
that utilises the Graph Laplacian of a specially designed parameter-less nearest
neighbour graph.
The methodology compares favourably with alternative subspace learning algorithms
(both linear and non-linear) and direct one-class classification schemes commonly
used for off-line abnormality detection in synthetic and real data. Based on
these findings, the framework is extended to on-line abnormality detection in video
sequences, utilising multiple independent detectors deployed over the image frame to
learn the local normal patterns and infer abnormality for the complete scene. The
method is compared with an alternative linear method to establish advantages and
limitations in on-line abnormality detection scenarios. Analysis shows that the alternative
approach is better suited for cases where the subspace learning is restricted on
the labelled samples, while in the presence of additional unlabelled data the proposed
approach using graph-based subspace learning is more appropriate.
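A minimal sketch of locality-preserving, graph-based subspace learning of the kind described above: build a nearest-neighbour graph, form its Laplacian, and solve the generalized eigenproblem of Locality Preserving Projections. The fixed-k graph here is a simplification; the thesis uses a specially designed parameter-less neighbour graph.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_neighbors=5, n_components=2):
    """Locality Preserving Projections on data X of shape (n, D).

    Returns a (D, n_components) linear projection that keeps graph
    neighbours close in the learned subspace.
    """
    n = X.shape[0]
    # Symmetric kNN adjacency with binary weights
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:n_neighbors + 1]   # skip self at index 0
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(1))
    L = D - W                                        # graph Laplacian
    # Generalized eigenproblem X^T L X a = lam X^T D X a; smallest eigenpairs
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])      # regularize for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :n_components]
```

Note the graph uses both labelled and unlabelled points, which is what lets the unlabelled data shape the subspace before the one-class classifier is trained in it.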
Grid Loss: Detecting Occluded Faces
Detection of partially occluded objects is a challenging computer vision
problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of
the detection window are occluded, since not every sub-part of the window is
discriminative on its own. To address this issue, we propose a novel loss layer
for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a
convolution layer independently rather than over the whole feature map. This
results in parts being more discriminative on their own, enabling the detector
to recover if the detection window is partially occluded. By mapping our loss
layer back to a regular fully connected layer, no additional computational cost
is incurred at runtime compared to standard CNNs. We demonstrate our method for
face detection on several public face detection benchmarks and show that our
method outperforms regular CNNs, is suitable for realtime applications and
achieves state-of-the-art performance.
Comment: accepted to ECCV 201
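The idea of penalizing sub-blocks independently can be sketched as a forward pass: a holistic hinge loss over the whole window plus independent hinge losses over non-overlapping spatial sub-blocks of the feature map. The block size, loss weight, and shared bias below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def grid_loss(fmap, w, b, label, block=2, lam=1.0):
    """Grid-loss-style objective for one detection window (sketch).

    fmap:  (C, H, W) feature map of the window
    w, b:  linear classifier weights (C*H*W,) and bias
    label: +1 face / -1 non-face
    """
    C, H, W = fmap.shape
    w = w.reshape(C, H, W)
    hinge = lambda s: max(0.0, 1.0 - label * s)
    total = hinge(float((w * fmap).sum()) + b)   # loss over the whole window
    nb = 0
    # Independent loss per sub-block, so each part must score on its own
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            s = float((w[:, i:i+block, j:j+block]
                       * fmap[:, i:i+block, j:j+block]).sum()) + b
            total += lam * hinge(s)
            nb += 1
    return total / (1 + nb)
```

Because the per-block terms reuse slices of the same `w`, the trained weights collapse back into one ordinary linear (fully connected) layer at test time, which is why no extra runtime cost is incurred.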
Occlusion Coherence: Detecting and Localizing Occluded Faces
The presence of occluders significantly impacts object recognition accuracy.
However, occlusion is typically treated as an unstructured source of noise and
explicit models for occluders have lagged behind those for object appearance
and shape. In this paper we describe a hierarchical deformable part model for
face detection and landmark localization that explicitly models part occlusion.
The proposed model structure makes it possible to augment positive training
data with large numbers of synthetically occluded instances. This allows us to
easily incorporate the statistics of occlusion patterns in a discriminatively
trained model. We test the model on several benchmarks for landmark
localization and detection including challenging new data sets featuring
significant occlusion. We find that the addition of an explicit occlusion model
yields a detection system that outperforms existing approaches for occluded
instances while maintaining competitive accuracy in detection and landmark
localization for unoccluded instances.
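The synthetic augmentation mentioned above can be sketched as pasting a random rectangular occluder over a positive training crop and recording its mask; the occluder content and size range here are illustrative assumptions, not the paper's occluder model.

```python
import numpy as np

def add_synthetic_occlusion(image, rng, max_frac=0.4):
    """Paste a random rectangular occluder over a face crop.

    image: (H, W, C) uint8 crop
    rng:   numpy Generator
    Returns the occluded copy and a boolean occlusion mask, which is what
    lets occlusion statistics enter discriminative training.
    """
    H, W = image.shape[:2]
    h = int(rng.integers(1, max(2, int(H * max_frac))))
    w = int(rng.integers(1, max(2, int(W * max_frac))))
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))
    out = image.copy()
    out[y:y+h, x:x+w] = rng.integers(0, 256, size=(h, w) + image.shape[2:],
                                     dtype=image.dtype)
    mask = np.zeros((H, W), dtype=bool)
    mask[y:y+h, x:x+w] = True
    return out, mask
```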
Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection
Top-down saliency models produce a probability map that peaks at target
locations specified by a task/goal such as object detection. They are usually
trained in a fully supervised setting involving pixel-level annotations of
objects. We propose a weakly supervised top-down saliency framework using only
binary labels that indicate the presence/absence of an object in an image.
First, the probabilistic contribution of each image region to the confidence of
a CNN-based image classifier is computed through a backtracking strategy to
produce top-down saliency. From a set of saliency maps of an image produced by
fast bottom-up saliency approaches, we select the best saliency map suitable
for the top-down task. The selected bottom-up saliency map is combined with the
top-down saliency map. Features having high combined saliency are used to train
a linear SVM classifier to estimate feature saliency. This is integrated with
combined saliency and further refined through a multi-scale
superpixel-averaging of saliency map. We evaluate the performance of the
proposed weakly supervised top-down saliency and achieve performance comparable
to that of fully supervised approaches. Experiments are carried out on seven
challenging datasets and quantitative results are compared with 40 closely
related approaches across 4 different applications.
Comment: 14 pages, 7 figures
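The map-selection and fusion steps above can be sketched as follows: pick the bottom-up map that best agrees with the top-down map, then fuse the two. Correlation-based selection and multiplicative fusion are illustrative stand-ins for the paper's exact scheme.

```python
import numpy as np

def combine_saliency(top_down, bottom_up_maps):
    """Select and fuse saliency maps, all (H, W) arrays in [0, 1].

    top_down:        task-driven saliency from the classifier backtracking
    bottom_up_maps:  candidate maps from fast bottom-up approaches
    """
    def corr(a, b):
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # Keep the bottom-up map most consistent with the top-down evidence
    best = max(bottom_up_maps, key=lambda m: corr(top_down, m))
    # Multiplicative fusion suppresses regions salient in only one map
    fused = top_down * best
    return fused / (fused.max() + 1e-12)
```

In the full pipeline, high-scoring locations of such a fused map would then supply the training features for the linear SVM, followed by multi-scale superpixel averaging.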