
    Camera Style Adaptation for Person Re-identification

    CVPR (the IEEE Conference on Computer Vision and Pattern Recognition) is one of the three top international conferences in computer vision, organized by the IEEE, and is recommended by the China Computer Federation (CCF) as a Class A international conference in computer science. Unlike other science and engineering disciplines, in the national discipline evaluation only the first-level discipline "Computer Science and Technology" counts CCF-recommended Class A conferences toward its assessment of research outputs. CVPR has strict acceptance standards, with an acceptance rate typically around 20%; in 2018 it received more than 4,000 submissions and accepted just over 900, an acceptance rate below 23%.

    Zhun Zhong, a 2015-cohort doctoral student in the Department of Intelligent Science, is the first author of the paper "Camera Style Adaptation for Person Re-identification", with his advisor Prof. Shaozi Li as corresponding author. As a cross-camera retrieval task, person re-identification is disturbed by the differing image styles produced by different cameras. Previous methods address this implicitly by learning a camera-invariant descriptor subspace; this paper instead introduces camera style adaptation explicitly. The method can be viewed as a form of data augmentation: the styles of labeled training samples are transferred to those of the other cameras, and the transferred images, together with the original samples, form the augmented training set. This increases the diversity of the data but also introduces a certain amount of noise. To reduce the noise, label smooth regularization is added to the learning process. The vanilla version without this regularization performs well only on few-camera systems, where over-fitting often occurs. The experiments show that, with label smooth regularization, the proposed method obtains consistent improvements on all camera systems and clearly outperforms existing methods.

    【Abstract】Being a cross-camera retrieval task, person re-identification suffers from image style variations caused by different cameras. The art implicitly addresses this problem by learning a camera-invariant descriptor subspace. In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation. CamStyle can serve as a data augmentation approach that smooths the camera style disparities. Specifically, with CycleGAN, labeled training images can be style-transferred to each camera, and, along with the original training samples, form the augmented training set. This method, while increasing data diversity against over-fitting, also incurs a considerable level of noise. In the effort to alleviate the impact of noise, the label smooth regularization (LSR) is adopted. The vanilla version of our method (without LSR) performs reasonably well on few-camera systems in which over-fitting often occurs. With LSR, we demonstrate consistent improvement in all systems regardless of the extent of over-fitting. We also report competitive accuracy compared with the state of the art. Code is available at: https://github.com/zhunzhong07/CamStyle

    This work is supported by the National Natural Science Foundation of China (No. 61572409, No. U1705286 & No. 61571188), Fujian Province 2011 Collaborative Innovation Center of TCM Health Management, Collaborative Innovation Center of Chinese Oolong Tea Industry - Collaborative Innovation Center (2011) of Fujian Province, Fund for Integration of Cloud Computing and Big Data, Innovation of Science and Education, the Data to Decisions CRC (D2D CRC) and the Cooperative Research Centres Programme. Yi Yang is the recipient of a Google Faculty Research Award. Liang Zheng is the recipient of a SIEF STEM+ Business fellowship. We thank Wenjing Li for encouragement.
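
    The label smooth regularization described above has a compact form: the one-hot target keeps 1 - ε of its probability mass and ε is spread uniformly over the classes. Below is a minimal PyTorch sketch of such an LSR cross-entropy, assuming it is applied to the noisy style-transferred samples; the function name and ε = 0.1 are illustrative choices, not the authors' code (their implementation is linked above).

```python
import torch
import torch.nn.functional as F

def lsr_cross_entropy(logits, targets, epsilon=0.1):
    """Cross-entropy against a smoothed target distribution: every class
    receives epsilon / num_classes, and the true class keeps the
    remaining 1 - epsilon on top of its uniform share, which softens
    supervision on noisily labeled samples."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    smoothed = torch.full_like(log_probs, epsilon / num_classes)
    smoothed.scatter_(1, targets.unsqueeze(1),
                      1.0 - epsilon + epsilon / num_classes)
    return (-smoothed * log_probs).sum(dim=1).mean()

# Illustrative use: plain cross-entropy (epsilon = 0) for real images,
# LSR for the CycleGAN style-transferred images.
num_ids = 751  # e.g., number of training identities in Market-1501
logits_real, labels_real = torch.randn(8, num_ids), torch.randint(0, num_ids, (8,))
logits_fake, labels_fake = torch.randn(8, num_ids), torch.randint(0, num_ids, (8,))
loss = (lsr_cross_entropy(logits_real, labels_real, epsilon=0.0)
        + lsr_cross_entropy(logits_fake, labels_fake, epsilon=0.1))
```

    Recent PyTorch releases expose the same smoothing directly as F.cross_entropy(logits, targets, label_smoothing=epsilon); the hand-rolled version above is mainly useful when real and transferred samples need different smoothing weights.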

    Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification

    In the person re-identification (ReID) task, because of the shortage of training data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows the low-level layers to be sufficiently trained by rolling back the weights of the high-level layers to their initial pre-trained values. Our strategy alleviates gradient vanishing in the low-level layers and robustly trains them to fit the ReID dataset, thereby improving performance on ReID tasks. The improvement brought by the proposed strategy is validated in several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture. Comment: Accepted to AAAI 2019
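
    To make the rolling-back strategy concrete, here is a hedged PyTorch sketch: keep a copy of the pre-trained weights, fine-tune on the ReID data, then restore only the high-level layers and repeat. The layer split (layer3/layer4/fc on a ResNet-50) and the function names are assumptions for illustration, not the authors' published code.

```python
import copy
import torch
import torchvision

# Backbone pre-trained on a large dataset (ImageNet); keep a frozen
# snapshot of its weights so high-level blocks can be rolled back later.
model = torchvision.models.resnet50(pretrained=True)
pretrained_state = copy.deepcopy(model.state_dict())

# Assumed split: treat the last residual blocks and the classifier as
# "high-level"; everything below stays fine-tuned across rounds.
HIGH_LEVEL_PREFIXES = ("layer3", "layer4", "fc")

def roll_back_high_level(model, pretrained_state,
                         prefixes=HIGH_LEVEL_PREFIXES):
    """Restore the high-level layers to their pre-trained weights while
    keeping the fine-tuned low-level layers, so the next round of
    training drives stronger gradients into the low levels."""
    with torch.no_grad():
        for name, param in model.state_dict().items():
            if name.startswith(prefixes):
                param.copy_(pretrained_state[name])

# Hypothetical schedule: alternate ordinary fine-tuning with roll-backs.
# for r in range(num_rounds):
#     train_one_round(model, reid_loader)       # ordinary fine-tuning
#     roll_back_high_level(model, pretrained_state)
```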

    Spatial and Temporal Mutual Promotion for Video-based Person Re-identification

    Video-based person re-identification is the crucial task of matching video sequences of a person across multiple camera views. Generally, features extracted directly from a single frame suffer from occlusion, blur, illumination and posture changes, which leads to false or missing activations in some regions and corrupts the appearance and motion representation. Exploiting the abundant spatial-temporal information in video sequences is the key to solving this problem. To this end, we propose a Refining Recurrent Unit (RRU) that recovers the missing parts and suppresses the noisy parts of the current frame's features by referring to historical frames. With RRU, the quality of each frame's appearance representation is improved. We then use a Spatial-Temporal clues Integration Module (STIM) to mine the spatial-temporal information from those upgraded features. Meanwhile, a multi-level training objective is used to enhance the capability of RRU and STIM. Through the cooperation of these modules, the spatial and temporal features mutually promote each other, and the final spatial-temporal feature representation is more discriminative and robust. Extensive experiments are conducted on three challenging datasets, i.e., iLIDS-VID, PRID-2011 and MARS. The experimental results demonstrate that our approach outperforms existing state-of-the-art methods for video-based person re-identification on iLIDS-VID and MARS and achieves favorable results on PRID-2011. Comment: Accepted by AAAI 2019 as spotlight
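
    To make the refining step concrete, here is a simplified PyTorch sketch of the gating idea behind an RRU-style unit: a gate computed from the historical and the current frame's features decides, per channel, how much of the current frame to keep and how much to recover from history. This is an illustrative stand-in under assumed names and shapes, not the paper's exact RRU/STIM design.

```python
import torch
import torch.nn as nn

class RefiningGateSketch(nn.Module):
    """Toy refining unit: history and the current frame jointly predict
    per-channel gates that suppress corrupted responses in the current
    frame and fill in missing ones from the historical representation."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, history, current):
        # history, current: (batch, channels) pooled per-frame features
        g = self.gate(torch.cat([history, current], dim=1))
        # Keep trustworthy parts of the current frame, recover the rest.
        return g * current + (1.0 - g) * history

# Walk an 8-frame sequence, refining each frame against the running
# history; the final output serves as the sequence-level feature.
rru = RefiningGateSketch(channels=2048)
history = torch.zeros(4, 2048)          # batch of 4 tracklets
for frame in torch.randn(8, 4, 2048):   # 8 frames per tracklet
    history = rru(history, frame)
```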