Learning non-maximum suppression
Object detectors have hugely profited from moving towards an end-to-end
learning paradigm: proposals, features, and the classifier becoming one neural
network improved results two-fold on general object detection. One
indispensable component is non-maximum suppression (NMS), a post-processing
algorithm responsible for merging all detections that belong to the same
object. The de facto standard NMS algorithm is still fully hand-crafted,
suspiciously simple, and -- being based on greedy clustering with a fixed
distance threshold -- forces a trade-off between recall and precision. We
propose a new network architecture designed to perform NMS, using only boxes
and their score. We report experiments for person detection on PETS and for
general object categories on the COCO dataset. Our approach shows promise,
providing improved localization and occlusion handling.
Comment: Added "Supplementary material" titl
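For context, the hand-crafted baseline this abstract refers to, greedy NMS with a fixed threshold, can be sketched in a few lines. This is a minimal illustration of the classic algorithm, not the learned network the paper proposes. The names `iou`, `greedy_nms`, and `iou_threshold`, and the `(x1, y1, x2, y2)` box layout, are assumptions made for this example.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Visit boxes in order of decreasing score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box
        # by more than the fixed threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_threshold=0.5))
print(greedy_nms(boxes, scores, iou_threshold=0.7))
```

The two calls illustrate the trade-off described in the abstract: with the lower threshold the second box is suppressed as a duplicate of the first, which would lose a true detection if it were a separate, heavily overlapping object; with the higher threshold both survive, which would count as a false positive if they covered the same object.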
Unconstrained salient object detection via proposal subset optimization
We aim at detecting salient objects in unconstrained images. In unconstrained images, the number of salient objects (if any) varies from image to image, and is not given. We present a salient object detection system that directly outputs a compact set of detection windows, if any, for an input image. Our system leverages a Convolutional-Neural-Network model to generate location proposals of salient objects. Location proposals tend to be highly overlapping and noisy. Based on the Maximum a Posteriori principle, we propose a novel subset optimization framework to generate a compact set of detection windows out of noisy proposals. In experiments, we show that our subset optimization formulation greatly enhances the performance of our system, and our system attains 16-34% relative improvement in Average Precision compared with the state-of-the-art on three challenging salient object datasets.
http://openaccess.thecvf.com/content_cvpr_2016/html/Zhang_Unconstrained_Salient_Object_CVPR_2016_paper.html
Published versio
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
As the intermediate level task connecting image captioning and object
detection, visual relationship detection started to catch researchers'
attention because of its descriptive power and clear structure. It detects the
objects and captures their pair-wise interactions with a
subject-predicate-object triplet, e.g. person-ride-horse. In this paper, each
visual relationship is considered as a phrase with three components. We
formulate the visual relationship detection as three inter-connected
recognition problems and propose a Visual Phrase guided Convolutional Neural
Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a
Phrase-guided Message Passing Structure (PMPS) to establish the connection
among relationship components and help the model consider the three problems
jointly. A corresponding non-maximum suppression method and model training
strategy are also proposed. Experimental results show that our ViP-CNN
outperforms the state-of-the-art method in both speed and accuracy. We further
pretrain ViP-CNN on our cleansed Visual Genome Relationship dataset, which is
found to perform better than pretraining on ImageNet for this task.
Comment: 10 pages, 5 figures, accepted by CVPR 201
Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection
Confluence is a novel non-Intersection over Union (IoU) alternative to
Non-Maxima Suppression (NMS) in bounding box post-processing in object
detection. It overcomes the inherent limitations of IoU-based NMS variants to
provide a more stable, consistent predictor of bounding box clustering by using
a normalized Manhattan Distance inspired proximity metric to represent bounding
box clustering. Unlike Greedy and Soft NMS, it does not rely solely on
classification confidence scores to select optimal bounding boxes, instead
selecting the box which is closest to every other box within a given cluster
and removing highly confluent neighboring boxes. Confluence is experimentally
validated on the MS COCO and CrowdHuman benchmarks, improving Average Precision
by up to 2.3-3.8% and Average Recall by up to 5.3-7.2% when compared against
de facto standard and state-of-the-art NMS variants. Quantitative results are
supported by extensive qualitative analysis, and threshold sensitivity analysis
experiments support the conclusion that Confluence is more robust than NMS
variants. Confluence represents a paradigm shift in bounding box processing,
with potential to replace IoU in bounding box regression processes.
Comment: 13 page
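The abstract does not give the exact formula, so the following is only a rough sketch of what a normalized, Manhattan-distance-style proximity between two boxes might look like, and of picking the box closest to all others in a cluster. The min-max normalization over the pair, the `(x1, y1, x2, y2)` layout, and the function names `normalized_manhattan_proximity` and `most_central_box` are assumptions made for illustration; the published method may differ, for example in how it uses confidence scores.

```python
from typing import Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def normalized_manhattan_proximity(a: Box, b: Box) -> float:
    """Sum of corner-coordinate distances after rescaling the pair to [0, 1].

    Smaller values mean the two boxes lie closer together.
    """
    xs = (a[0], a[2], b[0], b[2])
    ys = (a[1], a[3], b[1], b[3])
    x_min, x_span = min(xs), max(xs) - min(xs)
    y_min, y_span = min(ys), max(ys) - min(ys)

    def nx(v: float) -> float:
        return (v - x_min) / x_span if x_span > 0 else 0.0

    def ny(v: float) -> float:
        return (v - y_min) / y_span if y_span > 0 else 0.0

    return (abs(nx(a[0]) - nx(b[0])) + abs(nx(a[2]) - nx(b[2]))
            + abs(ny(a[1]) - ny(b[1])) + abs(ny(a[3]) - ny(b[3])))


def most_central_box(cluster: Sequence[Box]) -> int:
    """Index of the box with the smallest total proximity to the others."""
    totals = [
        sum(normalized_manhattan_proximity(box, other)
            for j, other in enumerate(cluster) if j != i)
        for i, box in enumerate(cluster)
    ]
    return min(range(len(cluster)), key=totals.__getitem__)


cluster = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 2, 12, 12)]
print(most_central_box(cluster))
```

Because the coordinates are rescaled per pair, the measure in this sketch does not depend on the absolute size of the boxes, which is one plausible reading of "normalized" in the abstract.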