Learning Detection with Diverse Proposals
To predict a set of diverse and informative proposals with enriched
representations, this paper introduces a differentiable Determinantal Point
Process (DPP) layer that is able to augment the object detection architectures.
Most modern object detection architectures, such as Faster R-CNN, learn to
localize objects by minimizing deviations from the ground-truth but ignore
correlation between multiple proposals and object categories. Non-Maximum
Suppression (NMS) as a widely used proposal pruning scheme ignores label- and
instance-level relations between object candidates resulting in multi-labeled
detections. In the multi-class case, NMS selects boxes with the largest
prediction scores ignoring the semantic relation between categories of
potential selections. In contrast, our trainable DPP layer, allowing for Learning
Detection with Diverse Proposals (LDDP), considers both label-level contextual
information and spatial layout relationships between proposals without
increasing the number of parameters of the network, and thus improves location
and category specifications of final detected bounding boxes substantially
during both training and inference schemes. Furthermore, we show that LDDP
keeps its superiority over Faster R-CNN even if the number of proposals
generated by LDDP is only ~30% as many as those for Faster R-CNN. Comment: Accepted to CVPR 201
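For reference, the greedy NMS baseline that the abstract contrasts with can be sketched as below. This is a minimal, generic illustration and not the LDDP method: the `(x1, y1, x2, y2)` box format, the function names, and the 0.5 IoU threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, highest score first.

    The threshold value is an assumption; the abstract does not give one.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

The selection depends only on scores and pairwise overlap, which is the limitation the abstract points to: label-level relations between candidates play no role.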
Object Detection based on Region Decomposition and Assembly
Region-based object detection infers object regions for one or more
categories in an image. Due to the recent advances in deep learning and region
proposal methods, object detectors based on convolutional neural networks
(CNNs) have flourished and delivered promising detection results.
However, detection accuracy is often degraded because of the low
discriminability of object CNN features caused by occlusions and inaccurate
region proposals. In this paper, we therefore propose a region decomposition
and assembly detector (R-DAD) for more accurate object detection.
In the proposed R-DAD, we first decompose an object region into multiple
small regions. To capture an entire appearance and part details of the object
jointly, we extract CNN features within the whole object region and decomposed
regions. We then learn the semantic relations between the object and its parts
by combining the multi-region features stage by stage with region assembly
blocks, and use the combined and high-level semantic features for the object
classification and localization. In addition, for more accurate region
proposals, we propose a multi-scale proposal layer that can generate object
proposals of various scales. We integrate the R-DAD into several feature
extractors, and demonstrate distinct performance improvements on PASCAL07/12 and
MSCOCO18 compared to recent convolutional detectors. Comment: Accepted to 2019 AAAI Conference on Artificial Intelligence (AAAI
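The decomposition step described above can be illustrated with a small sketch. The abstract does not say how a region is split, so the left/right/top/bottom halves used here are purely an assumption for illustration, as are the function name and the `(x1, y1, x2, y2)` box format.

```python
def decompose_region(box):
    """Split a box (x1, y1, x2, y2) into the whole region plus four halves.

    The half-based split is a hypothetical choice, not taken from the abstract.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return {
        "whole": (x1, y1, x2, y2),
        "left": (x1, y1, cx, y2),
        "right": (cx, y1, x2, y2),
        "top": (x1, y1, x2, cy),
        "bottom": (x1, cy, x2, y2),
    }


parts = decompose_region((0, 0, 10, 20))
```

In a detector of this kind, features would then be pooled from each entry of `parts` and combined; that stage is not sketched here.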
DeepBox: Learning Objectness with Convolutional Networks
Existing object proposal approaches use primarily bottom-up cues to rank
proposals, while we believe that objectness is in fact a high level construct.
We argue for a data-driven, semantic approach for ranking object proposals. Our
framework, which we call DeepBox, uses convolutional neural networks (CNNs) to
rerank proposals from a bottom-up method. We use a novel four-layer CNN
architecture that is as good as much larger networks on the task of evaluating
objectness while being much faster. We show that DeepBox significantly improves
over the bottom-up ranking, achieving the same recall with 500 proposals as
achieved by bottom-up methods with 2000. This improvement generalizes to
categories the CNN has never seen before and leads to a 4.5-point gain in
detection mAP. Our implementation achieves this performance while running at
260 ms per image. Comment: ICCV 2015 Camera-ready versio
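The recall comparison in this abstract (same recall from fewer proposals after reranking) can be made concrete with a recall-at-k sketch. This is a generic evaluation helper under stated assumptions, not the DeepBox code: the function names, the `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, and the toy scores are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(proposals, scores, gt_boxes, k, iou_threshold=0.5):
    """Fraction of ground-truth boxes covered by the k highest-scoring proposals."""
    order = sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)[:k]
    top = [proposals[i] for i in order]
    hits = sum(1 for g in gt_boxes if any(iou(g, p) >= iou_threshold for p in top))
    return hits / len(gt_boxes) if gt_boxes else 0.0


proposals = [(0, 0, 10, 10), (50, 50, 60, 60), (0, 0, 9, 9)]
gt_boxes = [(0, 0, 10, 10)]
bottom_up_scores = [0.1, 0.9, 0.5]  # toy scores from a bottom-up ranker
reranked_scores = [0.9, 0.1, 0.5]   # toy scores after a learned reranker

recall_before = recall_at_k(proposals, bottom_up_scores, gt_boxes, k=1)
recall_after = recall_at_k(proposals, reranked_scores, gt_boxes, k=1)
```

Only the scores change between the two calls; the proposal set is the same, which is what reranking means in this setting.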
Video Saliency Detection Using Object Proposals
In this paper, we introduce a novel approach to identify salient object regions in videos via object proposals. The core idea is to solve the saliency detection problem by ranking and selecting the salient proposals based on object-level saliency cues. Object proposals offer a more complete and high-level representation, which naturally caters to the needs of salient object detection. As well as introducing this novel solution for video salient object detection, we reorganize various discriminative saliency cues and traditional saliency assumptions on object proposals. With object candidates, a proposal ranking and voting scheme, based on various object-level saliency cues, is designed to screen out nonsalient parts, select salient object regions, and infer an initial saliency estimate. Then a saliency optimization process that considers temporal consistency and appearance differences between salient and nonsalient regions is used to refine the initial saliency estimates. Our experiments on public datasets (SegTrackV2, Freiburg-Berkeley Motion Segmentation Dataset, and Densely Annotated Video Segmentation) validate the effectiveness of the approach, and the proposed method produces significant improvements over state-of-the-art algorithms.
Reversible Recursive Instance-level Object Segmentation
In this work, we propose a novel Reversible Recursive Instance-level Object
Segmentation (R2-IOS) framework to address the challenging instance-level
object segmentation task. R2-IOS consists of a reversible proposal refinement
sub-network that predicts bounding box offsets for refining the object proposal
locations, and an instance-level segmentation sub-network that generates the
foreground mask of the dominant object instance in each proposal. By being
recursive, R2-IOS iteratively optimizes the two sub-networks during joint
training, in which the refined object proposals and improved segmentation
predictions are alternately fed into each other to progressively increase the
network capabilities. By being reversible, the proposal refinement sub-network
adaptively determines an optimal number of refinement iterations required for
each proposal during both training and testing. Furthermore, to handle multiple
overlapped instances within a proposal, an instance-aware denoising autoencoder
is introduced into the segmentation sub-network to distinguish the dominant
object from other distracting instances. Extensive experiments on the
challenging PASCAL VOC 2012 benchmark demonstrate the superiority of
R2-IOS over other state-of-the-art methods. In particular, the [...]
over [...] classes at [...] IoU achieves [...], which significantly
outperforms the results of [...] by PFN~\cite{PFN} and [...]
by~\cite{liu2015multi}. Comment: 9 page
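The proposal refinement step, in which predicted offsets move a box, can be sketched as follows. The abstract does not give the offset parameterization, so the center-shift and log-scale form `(dx, dy, dw, dh)` common in R-CNN-style detectors is assumed here; the function name is hypothetical and the recursive and reversible control logic is not modeled.

```python
import math


def apply_offsets(box, offsets):
    """Refine a box (x1, y1, x2, y2) with offsets (dx, dy, dw, dh).

    Assumed parameterization: dx, dy shift the center in units of box
    width/height; dw, dh rescale width/height on a log scale.
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


unchanged = apply_offsets((0, 0, 10, 10), (0.0, 0.0, 0.0, 0.0))
shifted = apply_offsets((0, 0, 10, 10), (0.1, 0.0, 0.0, 0.0))
```

Applying the function repeatedly, with fresh offsets predicted from each refined box, would give an iterative refinement loop of the general kind the abstract describes.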