Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
Optical flow is an intuitive and valuable cue for advancing
unsupervised video object segmentation (UVOS). Most of the previous methods
directly extract and fuse the motion and appearance features for segmenting
target objects in the UVOS setting. However, optical flow intrinsically
measures the instantaneous velocity of all pixels between consecutive frames,
so the motion features are not well aligned with the primary objects in the
corresponding frames. To solve the above challenge, we propose a concise,
practical, and efficient architecture for appearance and motion feature
alignment, dubbed hierarchical feature alignment network (HFAN). Specifically,
the key components of HFAN are the sequential Feature AlignMent (FAM) module and
the Feature AdaptaTion (FAT) module, which are leveraged for processing the
appearance and motion features hierarchically. FAM is capable of aligning both
appearance and motion features with the primary object semantic
representations, respectively. Further, FAT is explicitly designed for the
adaptive fusion of appearance and motion features to achieve a desirable
trade-off between cross-modal features. Extensive experiments demonstrate the
effectiveness of the proposed HFAN, which reaches a new state-of-the-art
performance on DAVIS-16, achieving 88.7 $\mathcal{J}\&\mathcal{F}$ Mean, i.e.,
a relative improvement of 3.5% over the best published result.
Comment: Accepted by ECCV-202
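The adaptive fusion of appearance and motion features mentioned in the abstract above can be illustrated with a generic gated blend. This is only a minimal sketch under stated assumptions: the abstract does not specify how FAT is built, so the function `gated_fusion`, the per-pixel sigmoid gate, and the parameters `gate_weights` and `gate_bias` are hypothetical and are not taken from HFAN.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(appearance, motion, gate_weights, gate_bias=0.0):
    """Blend two feature maps with a per-pixel gate (hypothetical sketch).

    appearance, motion: arrays of shape (C, H, W).
    gate_weights: array of shape (2C,), standing in for learned parameters.
    Returns an array of shape (C, H, W).
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([appearance, motion], axis=0)
    # A 1x1-convolution-like projection to a single gate logit per pixel: (H, W).
    logits = np.tensordot(gate_weights, stacked, axes=([0], [0])) + gate_bias
    gate = sigmoid(logits)
    # Gate near 1 favours appearance; gate near 0 favours motion.
    return gate[None] * appearance + (1.0 - gate)[None] * motion


# Toy usage with random features (assumed shapes, not real data).
rng = np.random.default_rng(0)
appearance = rng.normal(size=(4, 8, 8))
motion = rng.normal(size=(4, 8, 8))
fused = gated_fusion(appearance, motion, gate_weights=np.zeros(8))
```

With all-zero gate weights and zero bias the gate is 0.5 everywhere, so the result is the plain average of the two inputs; learned weights would shift that trade-off per pixel.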
Deep Learning for Person Reidentification Using Support Vector Machines
© 2017 Mengyu Xu et al. Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across camera views. Tracking individuals across camera networks with no overlapping fields is still a challenging problem. Previous works mainly address feature representation and metric learning separately, which tends to yield suboptimal solutions. To address this issue, we propose a novel framework that learns the feature representation and the metric jointly. Unlike previous works, we represent each pair of pedestrian images as a new resized input and use a linear Support Vector Machine in place of the softmax activation function for similarity learning. Dropout and data augmentation are also employed in this model to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of the proposed approach.
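Replacing a softmax output with a linear Support Vector Machine usually means training the final linear layer with a hinge loss on labels in {+1, -1} instead of a cross-entropy loss. The sketch below shows that idea on toy data. It is an illustration under stated assumptions, not the authors' implementation: the names `hinge_loss`, `hinge_grad`, and `train_linear_svm`, the L2 regularisation strength, and the toy "pair features" are all hypothetical, and the deep network that would produce the features is omitted.

```python
import numpy as np


def hinge_loss(scores, labels, weights, reg=1e-3):
    """Mean hinge loss plus L2 penalty for a linear SVM output layer.

    scores: (n,) raw outputs w.x + b; labels: (n,) values in {+1, -1}.
    """
    margins = np.maximum(0.0, 1.0 - labels * scores)
    return margins.mean() + 0.5 * reg * np.dot(weights, weights)


def hinge_grad(features, labels, weights, bias, reg=1e-3):
    """Subgradient of the loss above with respect to weights and bias."""
    scores = features @ weights + bias
    # Only samples that violate the margin contribute to the subgradient.
    active = (labels * scores < 1.0).astype(features.dtype)
    grad_w = -(features * (active * labels)[:, None]).mean(axis=0) + reg * weights
    grad_b = -(active * labels).mean()
    return grad_w, grad_b


def train_linear_svm(features, labels, lr=0.1, steps=200, reg=1e-3):
    """Plain subgradient descent; returns the learned (weights, bias)."""
    weights = np.zeros(features.shape[1])
    bias = 0.0
    for _ in range(steps):
        grad_w, grad_b = hinge_grad(features, labels, weights, bias, reg)
        weights -= lr * grad_w
        bias -= lr * grad_b
    return weights, bias


# Toy stand-in for pair features: +1 = same person, -1 = different person.
features = np.array([[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
weights, bias = train_linear_svm(features, labels)
predictions = np.sign(features @ weights + bias)
```

In a full model the same subgradient would be backpropagated into the network that produces `features`, which is how the feature representation and the similarity decision can be trained jointly.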