Multi-scale 3D Convolution Network for Video Based Person Re-Identification
This paper proposes a two-stream convolution network to extract spatial and
temporal cues for video-based person Re-Identification (ReID). A temporal
stream in this network is constructed by inserting several Multi-scale 3D (M3D)
convolution layers into a 2D CNN. The resulting M3D convolution network
introduces only a small fraction of additional parameters into the 2D CNN, yet
gains the ability to learn multi-scale temporal features. With this compact
architecture, the M3D convolution network is also more efficient and easier to
optimize than existing 3D convolution networks. The temporal stream further
incorporates Residual
Attention Layers (RAL) to refine the temporal features. By jointly learning
spatial-temporal attention masks in a residual manner, RAL identifies the
discriminative spatial regions and temporal cues. The other stream in our
network is implemented with a 2D CNN for spatial feature extraction. The
spatial and temporal features from the two streams are finally fused for
video-based person ReID. Evaluations on three widely used benchmark datasets,
i.e., MARS, PRID2011, and iLIDS-VID, demonstrate the substantial advantages of
our method over existing 3D convolution networks and state-of-the-art methods.
Comment: AAAI, 201
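The two-stream design described above can be illustrated with a toy NumPy sketch. It assumes each frame has already been reduced to a C-dimensional feature vector by the 2D CNN, models an M3D layer as parallel dilated temporal convolutions (dilations 1, 2, 3) summed residually with their input, and models RAL as x * (1 + sigmoid(mask)) with a mask that is simply given rather than learned. These formulations and all function names are illustrative assumptions, not the paper's exact layers, and no training is shown.

```python
import numpy as np


def temporal_conv1d(x, weights, dilation):
    """Depthwise 1D convolution along time with zero 'same' padding.

    x: (T, C) per-frame features; weights: (K, C), one kernel per channel, K odd.
    """
    T, C = x.shape
    K = weights.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros((T, C))
    for k in range(K):
        # tap k reads frame t + (k - (K - 1) // 2) * dilation
        out += weights[k] * xp[k * dilation : k * dilation + T]
    return out


def m3d_block(x, kernels, dilations=(1, 2, 3)):
    """Assumed M3D-style layer: input plus the sum of dilated temporal convolutions."""
    out = x.astype(float).copy()
    for w, d in zip(kernels, dilations):
        out += temporal_conv1d(x, w, d)
    return out


def residual_attention(x, mask_logits):
    """Assumed residual attention x * (1 + sigmoid(mask)): rescales, never zeroes, features."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return x * (1.0 + mask)


def fuse(spatial, temporal):
    """Temporal average pooling of each stream, then concatenation."""
    return np.concatenate([spatial.mean(axis=0), temporal.mean(axis=0)])


rng = np.random.default_rng(0)
T, C = 8, 4
frames = rng.standard_normal((T, C))  # stand-in for 2D CNN per-frame features
kernels = [0.1 * rng.standard_normal((3, C)) for _ in range(3)]
temporal = residual_attention(m3d_block(frames, kernels), rng.standard_normal((T, C)))
video_feature = fuse(frames, temporal)
print(video_feature.shape)  # (8,)
```

The residual form means that zero-initialized temporal kernels leave the pretrained 2D features untouched, which is one plausible reason such a compact design is easy to optimize.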
Adaptive multi-channel MAC protocol for dense VANET with directional antennas
Directional antennas in ad hoc networks offer several advantages over traditional omnidirectional antennas. They can increase the spatial reuse of the wireless channel, and their higher gain gives terminals a longer transmission range and therefore fewer hops to the destination. This paper presents the design, implementation, and simulation results of a multi-channel Medium Access Control (MAC) protocol for dense Vehicular Ad hoc Networks (VANETs) using directional antennas with local beam tables. Numerical results show that our protocol outperforms existing multi-channel protocols in vehicular environments.
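The range argument can be made concrete with the Friis free-space equation, P_r = P_t G_t G_r (λ / 4πd)^2, which implies that the maximum range for a fixed receiver sensitivity grows with sqrt(G_t G_r). The sketch below also shows one hypothetical form of a local beam table (neighbour ID mapped to the sector pointing at it). Free-space propagation, the equal-width sector layout, the example power and frequency values, and all helper names are illustrative assumptions, not the protocol described in the paper.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s


def free_space_range(pt_w, gt, gr, pr_min_w, freq_hz):
    """Largest distance d at which Friis received power is >= pr_min_w (linear gains)."""
    lam = SPEED_OF_LIGHT / freq_hz
    return lam / (4 * math.pi) * math.sqrt(pt_w * gt * gr / pr_min_w)


def hops_needed(distance_m, range_m):
    """Minimum number of hops if every hop can span at most range_m metres."""
    return math.ceil(distance_m / range_m)


def beam_index(own_xy, nbr_xy, n_beams):
    """Index of the equal-width sector (counter-clockwise from +x) containing the neighbour."""
    angle = math.atan2(nbr_xy[1] - own_xy[1], nbr_xy[0] - own_xy[0]) % (2 * math.pi)
    return int(angle // (2 * math.pi / n_beams)) % n_beams


def build_beam_table(own_xy, neighbours, n_beams):
    """Hypothetical local beam table: neighbour id -> beam to use when talking to it."""
    return {nid: beam_index(own_xy, xy, n_beams) for nid, xy in neighbours.items()}


# Illustrative values: 20 dBm transmitter, -70 dBm sensitivity, 5.9 GHz carrier
omni = free_space_range(0.1, 1.0, 1.0, 1e-10, 5.9e9)
directional = free_space_range(0.1, 10.0, 10.0, 1e-10, 5.9e9)  # 10 dBi at each end
print(f"omni {omni:.0f} m, directional {directional:.0f} m")
print(hops_needed(2000, omni), "vs", hops_needed(2000, directional), "hops for 2 km")
print(build_beam_table((0, 0), {"car_a": (10, 1), "car_b": (-3, -8)}, n_beams=6))
```

With 10 dBi at both ends the free-space range grows by a factor of sqrt(10 × 10) = 10, so a fixed distance needs roughly a tenth as many hops; real vehicular channels fade faster than free space, so the gain in practice is smaller.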
Incorporating Intra-Class Variance to Fine-Grained Visual Recognition
Fine-grained visual recognition aims to capture discriminative
characteristics among visually similar categories. State-of-the-art research
has significantly improved fine-grained recognition performance through deep
metric learning with triplet networks. However, the impact of intra-category
variance on recognition performance and robust feature representation has not
been well studied. In this paper, we propose to leverage intra-class variance
in the metric learning of a triplet network to improve fine-grained
recognition. By partitioning the training images within each category into a
few groups, we form triplet samples across different categories as well as
different groups, a scheme we call Group Sensitive TRiplet Sampling (GS-TRS).
Accordingly, the triplet loss function is strengthened by incorporating
intra-class variance through GS-TRS, which may benefit the optimization of the
triplet network. Extensive experiments on the CompCar and VehicleID benchmark
datasets show that the proposed GS-TRS significantly outperforms
state-of-the-art approaches in both classification and retrieval tasks.
Comment: 6 pages, 5 figures
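The group-sensitive sampling idea can be sketched in NumPy. This toy version assumes the groups inside each class come from plain k-means on the current embeddings, restricts positives to the anchor's own group, and adds an intra-class term (negative from the same class but another group) with a smaller margin to the usual inter-class triplet loss. The margins, the grouping method, and the function names are illustrative assumptions rather than the paper's exact GS-TRS formulation, and the exhaustive triplet enumeration only suits tiny batches.

```python
import numpy as np


def kmeans_labels(feats, k, iters=10, seed=0):
    """Plain k-means on one class's features; returns a group label per sample."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a group empties
                centers[j] = feats[labels == j].mean(axis=0)
    return labels


def assign_groups(feats, classes, k):
    """Partition each class into k groups (k must not exceed the class size)."""
    groups = np.zeros(len(feats), dtype=int)
    for c in np.unique(classes):
        idx = np.flatnonzero(classes == c)
        groups[idx] = kmeans_labels(feats[idx], k)
    return groups


def hinge(a, p, n, margin):
    """Triplet hinge on squared Euclidean distances."""
    return max(0.0, float(np.sum((a - p) ** 2) - np.sum((a - n) ** 2)) + margin)


def gs_trs_loss(feats, classes, groups, m_inter=1.0, m_intra=0.5):
    """Mean inter-class triplet loss plus mean intra-class (cross-group) triplet loss."""
    inter, intra = [], []
    n_samples = len(feats)
    for a in range(n_samples):
        for p in range(n_samples):
            # positive: same class and same group as the anchor
            if p == a or classes[p] != classes[a] or groups[p] != groups[a]:
                continue
            for n in range(n_samples):
                if classes[n] != classes[a]:
                    inter.append(hinge(feats[a], feats[p], feats[n], m_inter))
                elif groups[n] != groups[a]:
                    intra.append(hinge(feats[a], feats[p], feats[n], m_intra))
    inter_loss = float(np.mean(inter)) if inter else 0.0
    intra_loss = float(np.mean(intra)) if intra else 0.0
    return inter_loss + intra_loss
```

The smaller intra-class margin reflects the intuition that different groups of one category should stay closer to each other than to other categories while still being kept apart.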
Improving Object Detection with Region Similarity Learning
Object detection aims to identify instances of semantic objects of a certain
class in images or videos. The success of state-of-the-art approaches is
attributed to significant progress in object proposals and convolutional
neural networks (CNNs). Most promising detectors use multi-task learning with
an objective that combines a softmax loss and a regression loss. The former
handles multi-class categorization, while the latter improves localization
accuracy. However, few of them investigate the difficulty of distinguishing
different kinds of distracting background regions (i.e., negatives) from true
object regions (i.e., positives). To improve the classification of positive
object regions against a variety of negative background regions, we propose to
incorporate a triplet embedding into the learning objective. The triplet units
are formed by assigning each negative region to a meaningful object class and
establishing class-specific negatives, followed by triplet construction. On the
PASCAL VOC 2007 benchmark, the proposed triplet embedding improves the
well-known Fast R-CNN model with a mAP gain of 2.1%. In particular, the
state-of-the-art approach OHEM also benefits from the triplet embedding,
achieving a mAP improvement of 1.2%.
Comment: 6 pages, 5 figures
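The class-specific triplet construction can be sketched as follows. The abstract does not say how a negative region is assigned to an object class, so this NumPy sketch assumes each negative RoI takes the label of the ground-truth box it overlaps most (Fast R-CNN-style negatives usually overlap some object partially). The helper names, the [x1, y1, x2, y2] box format, and the margin value are assumptions for illustration only.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an (M, 4) array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def assign_negatives(neg_boxes, gt_boxes, gt_labels):
    """Assumed rule: a negative region takes the class of its highest-IoU ground-truth box."""
    return np.array([gt_labels[np.argmax(iou(b, gt_boxes))] for b in neg_boxes])


def class_specific_triplets(pos_labels, neg_labels):
    """(anchor, positive, negative) index triplets that all share one object class."""
    triplets = []
    for a, ca in enumerate(pos_labels):
        for p, cp in enumerate(pos_labels):
            if p == a or cp != ca:
                continue
            for n, cn in enumerate(neg_labels):
                if cn == ca:
                    triplets.append((a, p, n))
    return triplets


def triplet_loss(emb_pos, emb_neg, triplets, margin=0.5):
    """Mean of max(0, ||a - p||^2 - ||a - n||^2 + margin) over the triplets."""
    if not triplets:
        return 0.0
    a, p, n = (np.array(idx) for idx in zip(*triplets))
    d_ap = ((emb_pos[a] - emb_pos[p]) ** 2).sum(axis=1)
    d_an = ((emb_pos[a] - emb_neg[n]) ** 2).sum(axis=1)
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())
```

In training, such a term would presumably be added with some weight to the softmax and regression losses; the abstract does not give that weighting.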