5,481 research outputs found

    Joint Learning of Body and Part Representation for Person Re-Identification

    © 2013 IEEE. Person re-identification (ReID), which aims to identify the same person across multiple camera views, has attracted increasing attention due to its potential applications in surveillance and security. Large variations in subjects' postures, viewing angles, and illumination conditions, as well as non-ideal human detection, significantly increase the difficulty of person ReID. Learning a robust metric for measuring the similarity between different person images is another under-addressed problem. In this paper, following the recent success of part-based models, we first propose to learn global and weighted local body-part features from pedestrian images in order to generate a discriminative and robust feature representation. Then, in the training phase, an angular loss and a part-level classification loss are employed jointly as the similarity measure to train the network, which significantly improves the robustness of the resulting network against feature variance. Experimental results on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods.
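
    The joint objective described above can be read as a weighted sum of a metric loss on the global embedding and per-part identity classification losses on the weighted local features. The sketch below illustrates that combination in PyTorch; it uses a standard triplet margin loss as a stand-in for the angular loss named in the abstract, and all names and dimensions (num_parts, feat_dim, num_ids, part_weight) are illustrative assumptions rather than the paper's actual settings.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointBodyPartLoss(nn.Module):
        """Hedged sketch: metric loss on global features + per-part ID classification."""
        def __init__(self, num_parts: int, feat_dim: int, num_ids: int,
                     margin: float = 0.3, part_weight: float = 1.0):
            super().__init__()
            # triplet margin loss stands in for the angular loss from the abstract
            self.metric_loss = nn.TripletMarginLoss(margin=margin)
            # one identity classifier per body part (hypothetical design)
            self.part_classifiers = nn.ModuleList(
                [nn.Linear(feat_dim, num_ids) for _ in range(num_parts)])
            self.part_weight = part_weight

        def forward(self, anchor, positive, negative, part_feats, labels):
            # anchor/positive/negative: (B, feat_dim) global embeddings
            # part_feats: (B, num_parts, feat_dim) weighted local body-part features
            # labels: (B,) identity labels of the anchor images
            loss = self.metric_loss(anchor, positive, negative)
            for k, clf in enumerate(self.part_classifiers):
                loss = loss + self.part_weight * F.cross_entropy(clf(part_feats[:, k]), labels)
            return loss

    if __name__ == "__main__":
        B, P, D, C = 8, 6, 256, 100
        crit = JointBodyPartLoss(P, D, C)
        a, p, n = (torch.randn(B, D) for _ in range(3))
        parts, y = torch.randn(B, P, D), torch.randint(0, C, (B,))
        print(crit(a, p, n, parts, y).item())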

    Energy Confused Adversarial Metric Learning for Zero-Shot Image Retrieval and Clustering

    Deep metric learning has been widely applied to many computer vision tasks and has recently become attractive for zero-shot image retrieval and clustering (ZSRC), where a good embedding is required so that unseen classes can be distinguished well. Most existing works take this 'good' embedding to simply be a discriminative one and thus race to devise powerful metric objectives or hard-sample mining strategies for learning discriminative embeddings. In this paper, however, we first emphasize that generalization ability is also a core ingredient of a 'good' embedding and in fact largely affects metric performance in zero-shot settings. We then propose the Energy Confused Adversarial Metric Learning (ECAML) framework to explicitly optimize a robust metric. It is mainly achieved by introducing an Energy Confusion regularization term, which breaks away from the traditional metric-learning focus on devising discriminative objectives and instead seeks to 'confuse' the learned model, encouraging generalization by reducing overfitting on the seen classes. We train this confusion term together with the conventional metric objective in an adversarial manner. Although it may seem counterintuitive to 'confuse' the network, we show that ECAML serves as an effective regularization technique for metric learning and is applicable to various conventional metric methods. This paper empirically demonstrates the importance of learning embeddings with good generalization, achieving state-of-the-art performance on the popular CUB, CARS, Stanford Online Products, and In-Shop datasets for ZSRC tasks. Code available at http://www.bhchen.cn/. Comment: AAAI 2019, Spotlight.
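
    As a rough illustration of the idea, the sketch below adds a confusion regularizer to a conventional metric objective: the regularizer pulls embeddings of randomly paired samples from different classes toward each other, deliberately opposing the discriminative loss. This is only a generic reading of the abstract, not the authors' exact Energy Confusion formulation; the pairing rule and lambda_confuse are assumptions.

    import torch
    import torch.nn.functional as F

    def confusion_regularizer(embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Mean squared distance between randomly paired samples of different classes (assumed form)."""
        perm = torch.randperm(embeddings.size(0), device=embeddings.device)
        diff_class = labels != labels[perm]                    # keep only cross-class pairs
        if diff_class.sum() == 0:
            return embeddings.new_zeros(())
        d = (embeddings - embeddings[perm]).pow(2).sum(dim=1)  # squared Euclidean distances
        return d[diff_class].mean()

    def ecaml_style_loss(embeddings, labels, metric_loss_fn, lambda_confuse=0.1):
        # metric_loss_fn: any conventional metric objective, e.g. a batch-hard triplet loss;
        # minimizing the confusion term opposes it, acting as a regularizer against overfitting
        return metric_loss_fn(embeddings, labels) + lambda_confuse * confusion_regularizer(embeddings, labels)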

    μ˜μƒ 기반 동일인 νŒλ³„μ„ μœ„ν•œ λΆ€λΆ„ μ •ν•© ν•™μŠ΅

    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: Kyoung Mu Lee.

    Person re-identification is the problem of identifying the same individual among persons captured by different cameras. It is a challenging problem because the same person captured by non-overlapping cameras usually shows dramatic appearance changes due to viewpoint, pose, and illumination variations. Since it is an essential tool for many surveillance applications, various research directions have been explored; however, the problem is far from being solved. The goal of this thesis is to solve the person re-identification problem in surveillance systems. In particular, we focus on two critical components: 1) a better image representation model using human poses and 2) a better training method using hard sample mining. First, we propose a part-aligned representation model that represents an image as the bilinear pooling between appearance and part maps. Since the image similarity is computed from the locations of body parts, it addresses the body-part misalignment issue and effectively distinguishes different people by discriminating fine-grained local differences. Second, we propose a stochastic hard sample mining method that exploits class information to generate diverse and hard examples for training. It efficiently explores the training samples while avoiding getting stuck in a small subset of hard samples, thereby training the model effectively. Finally, we propose an integrated system that combines the two approaches and benefits from both components. Experimental results show that the proposed method works robustly on five datasets with diverse conditions and indicate its potential extension to more general conditions.

    (Abstract in Korean, translated) The person re-identification problem is to decide whether two people appearing in images captured by different cameras are the same person. Because it serves as an important tool in various applications related to surveillance cameras and security, it has been studied extensively in recent years. However, even the same person looks different from image to image when captured at different times, places, viewing angles, and illumination conditions, which makes automatic identification difficult. This thesis addresses the problem of deciding, mainly for surveillance camera footage, whether people automatically detected in each image are the same person. To this end, two questions are studied: 1) what model represents images well, and 2) how such a model can be trained well. First, we propose a model that enables effective identification by comparing the appearance of detected people part by part, designing a mapping such that distances in the embedding space equal the sum of appearance differences between corresponding body parts in the images. Second, we propose a training method that learns the model parameters effectively by exploiting class information so that many hard examples are seen at low computational cost. Finally, we propose a new person re-identification system that combines the two components. Experimental results demonstrate that the proposed method operates robustly and effectively in diverse environments and suggest that it can be extended to more general settings.

    Contents: Abstract; List of Tables; List of Figures; 1. Introduction (1.1 Part-Aligned Bilinear Representations; 1.2 Stochastic Class-Based Hard Sample Mining; 1.3 Integrated System for Person Re-identification); 2. Part-Aligned Bilinear Representations (2.1 Introduction; 2.2 Related Work; 2.3 Our Approach: Two-Stream Network, Bilinear Pooling, Loss; 2.4 Analysis: Part-Aware Image Similarity, Relationship to the Baseline Models, Decomposition of Appearance and Part Maps, Part-Alignment Effects on Reducing the Misalignment Issue; 2.5 Implementation Details; 2.6 Experiments: Datasets, Evaluation Metrics, Comparison with the Baselines, Comparison with State-of-the-Art Methods; 2.7 Summary); 3. Stochastic Class-Based Hard Sample Mining (3.1 Introduction; 3.2 Related Works; 3.3 Deep Metric Learning with Triplet Loss: Triplet Loss, Efficient Learning with Triplet Loss; 3.4 Batch Construction for Metric Learning: Neighbor Class Mining by Class Signatures, Batch Construction, Scalable Extension to the Number of Classes; 3.5 Loss; 3.6 Feature Extractor; 3.7 Experiments: Datasets, Implementation Details, Evaluation Metrics, Effect of the Stochastic Hard Example Mining, Comparison with the Existing Methods on Image Retrieval Datasets; 3.8 Summary); 4. Integrated System for Person Re-identification (4.1 Introduction; 4.2 Hard Positive Mining; 4.3 Integrated System for Person Re-identification; 4.4 Experiments: Comparison with the Baselines, Comparison with the Existing Works; 4.5 Summary); 5. Conclusion (5.1 Contributions; 5.2 Future Works); Abstract (In Korean).
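
    The part-aligned representation described in the abstract can be sketched as bilinear pooling between an appearance feature map and a part map: an outer product at every spatial location, averaged over the image, so that an inner-product similarity decomposes into per-part appearance comparisons. The snippet below is a minimal sketch of that pooling step only; tensor shapes and the surrounding two-stream backbone are assumptions, not the thesis's exact architecture.

    import torch
    import torch.nn.functional as F

    def part_aligned_bilinear(appearance: torch.Tensor, parts: torch.Tensor) -> torch.Tensor:
        """appearance: (B, Ca, H, W) appearance feature map
           parts:      (B, Cp, H, W) part map (e.g., pose-derived confidence maps)
           returns:    (B, Ca * Cp) l2-normalized bilinear descriptor"""
        B, Ca, H, W = appearance.shape
        a = appearance.flatten(2)                        # (B, Ca, H*W)
        p = parts.flatten(2)                             # (B, Cp, H*W)
        f = torch.bmm(a, p.transpose(1, 2)) / (H * W)    # (B, Ca, Cp): averaged outer products
        return F.normalize(f.flatten(1), dim=1)

    # The similarity between two images is then a plain inner product of the two
    # descriptors, which sums appearance comparisons over matching part channels.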

    Hard-Aware Point-to-Set Deep Metric for Person Re-identification

    Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion. Deep metric learning provides a satisfactory solution to person re-ID by training a deep network under the supervision of a metric loss, e.g., the triplet loss. However, the performance of deep metric learning is greatly limited by traditional sampling methods. To solve this problem, we propose a Hard-Aware Point-to-Set (HAP2S) loss with a soft hard-mining scheme. Based on the point-to-set triplet loss framework, the HAP2S loss adaptively assigns greater weights to harder samples. Several advantageous properties are observed when compared with other state-of-the-art loss functions: 1) Accuracy: the HAP2S loss consistently achieves higher re-ID accuracies than the alternatives on three large-scale benchmark datasets; 2) Robustness: the HAP2S loss is more robust to outliers than other losses; 3) Flexibility: the HAP2S loss does not rely on a specific weight function, i.e., different instantiations of the HAP2S loss are equally effective; 4) Generality: in addition to person re-ID, we apply the proposed method to generic deep metric learning benchmarks, including CUB-200-2011 and Cars196, and also achieve state-of-the-art results. Comment: Accepted to ECCV 2018.
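
    A point-to-set loss of this kind can be sketched as follows: distances from an anchor to its positive and negative sets are aggregated with soft weights that emphasize harder samples (distant positives, nearby negatives), and a triplet-style margin is applied to the two weighted distances. The softmax-based weighting and the sigma and margin values below are illustrative assumptions, not necessarily the instantiation used in the paper.

    import torch
    import torch.nn.functional as F

    def hard_aware_point_to_set_loss(embeddings, labels, margin=0.4, sigma=1.0):
        # embeddings: (B, D) l2-normalized features, labels: (B,) identity labels
        dist = torch.cdist(embeddings, embeddings)          # (B, B) pairwise distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
        pos_mask, neg_mask = same & ~eye, ~same

        losses = []
        for i in range(len(labels)):
            d_pos, d_neg = dist[i][pos_mask[i]], dist[i][neg_mask[i]]
            if d_pos.numel() == 0 or d_neg.numel() == 0:
                continue
            w_pos = F.softmax(d_pos / sigma, dim=0)          # harder (farther) positives weigh more
            w_neg = F.softmax(-d_neg / sigma, dim=0)         # harder (closer) negatives weigh more
            d_ap = (w_pos * d_pos).sum()                     # weighted anchor-to-positive-set distance
            d_an = (w_neg * d_neg).sum()                     # weighted anchor-to-negative-set distance
            losses.append(F.relu(d_ap - d_an + margin))
        return torch.stack(losses).mean() if losses else embeddings.new_zeros(())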