Co-projection-plane based 3-D padding for polyhedron projection for 360-degree video
The polyhedron projection for 360-degree video is becoming increasingly popular because it introduces much less geometric distortion than the equirectangular projection. However, in the polyhedron projection, an obvious texture discontinuity appears in the area near each face boundary. Such a discontinuity can cause serious quality degradation when motion compensation crosses a discontinuous face boundary. To solve this problem, in this paper we first propose filling the corresponding neighboring faces into suitable positions as the extension of the current face, keeping the texture approximately continuous. Then a co-projection-plane based 3-D padding method is proposed that projects the reference pixels in the neighboring face onto the plane of the current face to guarantee exact texture continuity. Under the proposed scheme, the reference pixel is always projected onto the same plane as the current pixel when performing motion compensation, so the texture discontinuity problem is solved. The proposed scheme is implemented in the reference software of High Efficiency Video Coding. Compared with the existing method, the proposed algorithm significantly improves rate-distortion performance, and the experimental results demonstrate that the texture discontinuity at the face boundary is well handled by the proposed algorithm.
Comment: 6 pages, 9 figures
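The core geometric idea (projecting a padding pixel back through the cube center onto whichever neighboring face it belongs to) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the unit-cube convention, the face indexing, and the function name `face_extension_sample` are all assumptions.

```python
import numpy as np

def face_extension_sample(u, v):
    """
    Illustrative sketch (NOT the paper's code): co-projection-plane padding
    for a unit cube map. The current face is taken to be the z = 1 plane,
    with in-face coordinates (u, v) in [-1, 1]; values of (u, v) outside
    that range lie in the padded extension area. Each extension pixel is
    mapped, via the ray through the cube center, to the neighboring face
    it actually falls on, so sampled texture stays continuous across the
    face boundary instead of jumping between projection planes.
    Returns (axis, q): the exit-face axis index and the 3-D point on that
    face's plane.
    """
    p = np.array([u, v, 1.0])      # point on the extended projection plane
    a = np.abs(p)
    axis = int(np.argmax(a))       # cube face the ray from the center exits
    q = p / a[axis]                # re-project the point onto that face
    return axis, q
```

For example, a point inside the face ((u, v) = (0.5, 0)) stays on the z face, while a point past the boundary ((u, v) = (1.5, 0)) is resolved on the neighboring x face, which is the continuity the padding scheme relies on.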
Spatial and Temporal Mutual Promotion for Video-based Person Re-identification
Video-based person re-identification is the crucial task of matching video sequences of a person across multiple camera views. Features extracted directly from a single frame suffer from occlusion, blur, illumination and posture changes, which leads to false or missing activations in some regions and corrupts the appearance and motion representation. Exploiting the abundant spatial-temporal information in video sequences is the key to solving this problem. To this end, we propose a Refining Recurrent Unit (RRU) that recovers the missing parts and suppresses the noisy parts of the current frame's features by referring to historical frames, improving the quality of each frame's appearance representation. We then use a Spatial-Temporal clues Integration Module (STIM) to mine the spatial-temporal information from these upgraded features, while a multi-level training objective enhances the capability of RRU and STIM. Through the cooperation of these modules, the spatial and temporal features mutually promote each other, and the final spatial-temporal feature representation is more discriminative and robust. Extensive experiments are conducted on three challenging datasets, i.e., iLIDS-VID, PRID-2011 and MARS. The results demonstrate that our approach outperforms existing state-of-the-art methods for video-based person re-identification on iLIDS-VID and MARS, and achieves favorable results on PRID-2011.
Comment: Accepted by AAAI19 as spotlight
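The refining idea, gating each frame's features against a memory of historical frames so that occluded or noisy dimensions can be recovered from the past, can be illustrated with a toy recurrence. This is a minimal sketch under assumed shapes and a single learned gate matrix `W`; the actual RRU architecture is not specified at this level of detail in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def refine_sequence(frames, W):
    """
    Toy sketch of recurrent feature refinement (hypothetical shapes, not
    the paper's RRU). Each frame's feature vector is gated against a
    running memory of historical frames: dimensions judged unreliable in
    the current frame are filled in from the memory, while reliable ones
    pass through, suppressing noisy activations.
    frames: (T, D) array of per-frame features; W: (D, 2*D) gate weights.
    """
    memory = frames[0]
    refined = [memory]
    for t in range(1, len(frames)):
        cur = frames[t]
        # gate in (0, 1): how much to trust history vs. the current frame
        gate = sigmoid(W @ np.concatenate([memory, cur]))
        out = gate * memory + (1.0 - gate) * cur
        refined.append(out)
        memory = out               # refined frame becomes the new history
    return np.stack(refined)
```

With zero gate weights the unit blends history and current frame equally; a trained gate would instead open or close per dimension depending on occlusion and blur.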
Feature Selective Networks for Object Detection
Objects to be detected usually have distinct characteristics in different sub-regions and at different aspect ratios. However, in prevalent two-stage object detection methods, Region-of-Interest (RoI) features are extracted by RoI pooling with little emphasis on these translation-variant feature components. We present feature selective networks that reform the feature representations of RoIs by exploiting their disparities among sub-regions and aspect ratios. Our network produces a sub-region attention bank and an aspect-ratio attention bank for the whole image. An RoI-based sub-region attention map and aspect-ratio attention map are selectively pooled from the banks, and then used to refine the original RoI features for RoI classification. Equipped with a light-weight detection subnetwork, our network obtains a consistent boost in detection performance with general ConvNet backbones (ResNet-101, GoogLeNet and VGG-16). Without bells and whistles, our detectors equipped with ResNet-101 achieve more than 3% mAP improvement over their counterparts on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO datasets.
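The selective pooling step, picking the attention map that matches an RoI's aspect ratio from a shared bank and using it to re-weight the RoI features, can be sketched as below. This is an illustrative sketch, not the released model: the ratio bins, the log-ratio binning rule, and the assumption that the bank is already pooled to the RoI grid are all hypothetical.

```python
import numpy as np

def aspect_ratio_bin(w, h, ratios=(0.5, 1.0, 2.0)):
    """Pick the bank channel whose anchor ratio is closest in log space
    (an assumed binning rule; the paper's exact rule is not given)."""
    r = w / float(h)
    return int(np.argmin([abs(np.log(r) - np.log(a)) for a in ratios]))

def select_and_refine(roi_feat, attn_bank, roi_wh):
    """
    Illustrative sketch of selective attention pooling. attn_bank holds
    one attention map per aspect-ratio bin; the map matching the RoI's
    aspect ratio is selected and used to re-weight the RoI features
    channel-wise before classification.
    roi_feat: (C, S, S) pooled RoI features;
    attn_bank: (K, S, S) attention maps, assumed pre-pooled to the RoI grid.
    """
    k = aspect_ratio_bin(*roi_wh)        # which bank channel to use
    attn = attn_bank[k]                  # (S, S) selected attention map
    return roi_feat * attn[None, :, :]   # spatial re-weighting per channel
```

A sub-region attention bank would be handled analogously, with selection keyed on the RoI's sub-region layout rather than its aspect ratio.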