Joint Learning of Body and Part Representation for Person Re-Identification
© 2013 IEEE. Person re-identification (ReID), which aims to identify people across multiple camera views, has attracted increasing attention due to its potential for application in surveillance security. Large variations in subjects' postures, view angles, and illumination conditions, as well as non-ideal human detection, significantly increase the difficulty of person ReID. Learning a robust metric for measuring the similarity between different person images is another under-addressed problem. In this paper, following the recent success of part-based models, we first propose to learn global and weighted local body-part features from pedestrian images in order to generate a discriminative and robust feature representation. Then, in the training phase, angular loss and part-level classification loss are employed jointly as a similarity measure to train the network, which significantly improves the robustness of the resultant network against feature variance. Experimental results on several benchmark data sets demonstrate that our method outperforms the state-of-the-art methods
Energy Confused Adversarial Metric Learning for Zero-Shot Image Retrieval and Clustering
Deep metric learning has been widely applied in many computer vision tasks,
and recently, it has become especially attractive for zero-shot image retrieval and
clustering (ZSRC), where a good embedding is requested such that the unseen
classes can be distinguished well. Most existing works deem this 'good'
embedding just to be the discriminative one and thus race to devise powerful
metric objectives or hard-sample mining strategies for learning discriminative
embedding. However, in this paper, we first emphasize that the generalization
ability is a core ingredient of this 'good' embedding as well and largely
affects the metric performance in zero-shot settings as a matter of fact. Then,
we propose the Energy Confused Adversarial Metric Learning (ECAML) framework to
explicitly optimize a robust metric. It is mainly achieved by introducing an
interesting Energy Confusion regularization term, which daringly breaks away
from the traditional metric learning idea of discriminative objective devising,
and seeks to 'confuse' the learned model so as to encourage its generalization
ability by reducing overfitting on the seen classes. We train this confusion
term together with the conventional metric objective in an adversarial manner.
Although it seems weird to 'confuse' the network, we show that our ECAML indeed
serves as an efficient regularization technique for metric learning and is
applicable to various conventional metric methods. This paper empirically and
experimentally demonstrates the importance of learning embedding with good
generalization, achieving state-of-the-art performances on the popular CUB,
CARS, Stanford Online Products and In-Shop datasets for ZSRC tasks.
Code available at http://www.bhchen.cn/.
Comment: AAAI 2019, Spotligh
Part-Aligned Learning for Image-Based Person Re-identification
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. Kyoung Mu Lee.
Person re-identification is the problem of identifying the same individuals among persons captured by different cameras. It is challenging because the same person captured by non-overlapping cameras usually shows dramatic appearance changes due to differences in viewpoint, pose, and illumination. Since it is an essential tool for many surveillance applications, various research directions have been explored; however, it is far from being solved.
The goal of this thesis is to solve the person re-identification problem in surveillance systems. In particular, we focus on two critical components: designing 1) a better image representation model using human poses and 2) a better training method using hard sample mining. First, we propose a part-aligned representation model which represents an image as the bilinear pooling between appearance and part maps. Since the image similarity is independently calculated from the locations of body parts, it addresses the body part misalignment issue and effectively distinguishes different people by discriminating fine-grained local differences. Second, we propose a stochastic hard sample mining method that exploits class information to generate diverse and hard examples to use for training. It efficiently explores the training samples while avoiding getting stuck in a small subset of hard samples, thereby training the model effectively. Finally, we propose an integrated system that combines the two approaches and benefits from both components. Experimental results show that the proposed method works robustly on five datasets with diverse conditions and indicate its potential for extension to more general conditions.
Person re-identification is the problem of determining whether two people captured in images taken by different cameras are the same person. Because it serves as an important tool in various applications related to surveillance cameras and security, it has been studied extensively up to the present. However, even the same person looks different from image to image when captured at different times and places and under different camera angles and lighting conditions, which makes the identification difficult to automate.
This thesis mainly addresses surveillance camera footage: after people are automatically detected in each image, we aim to determine whether the detections correspond to the same person. To this end, we study two questions: 1) which model represents the images well, and 2) how a given model can be trained well. First, by designing a mapping function such that the distance in the vector space equals the sum of appearance differences between corresponding parts in the images, we propose a model that enables effective identification by comparing the appearance of detected people body part by body part. Second, we propose a method that uses class information during training so that the model sees many hard examples at a low computational cost, thereby learning the parameters of the function effectively. Finally, we combine the two components into a new person re-identification system. Experimental results demonstrate that the proposed method operates robustly and effectively in diverse environments and indicate that it can be extended to more general settings.
Abstract
Contents
List of Tables
List of Figures
1. Introduction
1.1 Part-Aligned Bilinear Representations
1.2 Stochastic Class-Based Hard Sample Mining
1.3 Integrated System for Person Re-identification
2. Part-Aligned Bilinear Representations
2.1 Introduction
2.2 Related Work
2.3 Our Approach
2.3.1 Two-Stream Network
2.3.2 Bilinear Pooling
2.3.3 Loss
2.4 Analysis
2.4.1 Part-Aware Image Similarity
2.4.2 Relationship to the Baseline Models
2.4.3 Decomposition of Appearance and Part Maps
2.4.4 Part-Alignment Effects on Reducing Misalignment Issue
2.5 Implementation Details
2.6 Experiments
2.6.1 Datasets
2.6.2 Evaluation Metrics
2.6.3 Comparison with the Baselines
2.6.4 Comparison with State-of-the-Art Methods
2.7 Summary
3. Stochastic Class-Based Hard Sample Mining
3.1 Introduction
3.2 Related Works
3.3 Deep Metric Learning with Triplet Loss
3.3.1 Triplet Loss
3.3.2 Efficient Learning with Triplet Loss
3.4 Batch Construction for Metric Learning
3.4.1 Neighbor Class Mining by Class Signatures
3.4.2 Batch Construction
3.4.3 Scalable Extension to the Number of Classes
3.5 Loss
3.6 Feature Extractor
3.7 Experiments
3.7.1 Datasets
3.7.2 Implementation Details
3.7.3 Evaluation Metrics
3.7.4 Effect of the Stochastic Hard Example Mining
3.7.5 Comparison with the Existing Methods on Image Retrieval Datasets
3.8 Summary
4. Integrated System for Person Re-identification
4.1 Introduction
4.2 Hard Positive Mining
4.3 Integrated System for Person Re-identification
4.4 Experiments
4.4.1 Comparison with the baselines
4.4.2 Comparison with the existing works
4.5 Summary
5. Conclusion
5.1 Contributions
5.2 Future Works
Abstract (In Korean)
Docto
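The thesis abstract above describes an image representation formed by bilinear pooling between appearance and part maps. The following is a minimal NumPy sketch of that idea, not the thesis implementation: the array shapes, the average pooling over locations, and the function names `bilinear_pool` and `pairwise_similarity` are assumptions made for illustration. It also checks numerically that the dot product of two pooled features equals a sum, over all pairs of locations, of appearance similarity multiplied by part similarity.

```python
import numpy as np

def bilinear_pool(appearance, parts):
    """Average-pool the outer product of appearance and part descriptors.

    appearance: (H, W, Ca) array; parts: (H, W, Cp) array (assumed shapes).
    Returns a flat vector of length Ca * Cp.
    """
    h, w, ca = appearance.shape
    cp = parts.shape[-1]
    a = appearance.reshape(h * w, ca)
    p = parts.reshape(h * w, cp)
    # a.T @ p sums the per-location outer products a_xy p_xy^T.
    pooled = a.T @ p / (h * w)
    return pooled.reshape(-1)

def pairwise_similarity(app1, part1, app2, part2):
    """The same similarity written as a sum over all location pairs."""
    a1 = app1.reshape(-1, app1.shape[-1])
    a2 = app2.reshape(-1, app2.shape[-1])
    p1 = part1.reshape(-1, part1.shape[-1])
    p2 = part2.reshape(-1, part2.shape[-1])
    app_sim = a1 @ a2.T    # appearance similarity for every location pair
    part_sim = p1 @ p2.T   # part similarity for every location pair
    return float((app_sim * part_sim).sum() / (a1.shape[0] * a2.shape[0]))

# Random stand-ins for the outputs of the two network streams.
rng = np.random.default_rng(0)
app1, app2 = rng.normal(size=(2, 4, 3, 5))
part1, part2 = rng.normal(size=(2, 4, 3, 2))
f1 = bilinear_pool(app1, part1)
f2 = bilinear_pool(app2, part2)
print(np.isclose(f1 @ f2, pairwise_similarity(app1, part1, app2, part2)))
```

Under these assumptions, a pair of locations contributes strongly to the image similarity only when both its appearance descriptors and its part descriptors agree, which is one way to read the abstract's claim about handling body part misalignment.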
Hard-Aware Point-to-Set Deep Metric for Person Re-identification
Person re-identification (re-ID) is a highly challenging task due to large
variations of pose, viewpoint, illumination, and occlusion. Deep metric
learning provides a satisfactory solution to person re-ID by training a deep
network under supervision of metric loss, e.g., triplet loss. However, the
performance of deep metric learning is greatly limited by traditional sampling
methods. To solve this problem, we propose a Hard-Aware Point-to-Set (HAP2S)
loss with a soft hard-mining scheme. Based on the point-to-set triplet loss
framework, the HAP2S loss adaptively assigns greater weights to harder samples.
Several advantageous properties are observed when compared with other
state-of-the-art loss functions: 1) Accuracy: HAP2S loss consistently achieves
higher re-ID accuracies than other alternatives on three large-scale benchmark
datasets; 2) Robustness: HAP2S loss is more robust to outliers than other
losses; 3) Flexibility: HAP2S loss does not rely on a specific weight function,
i.e., different instantiations of HAP2S loss are equally effective; 4)
Generality: In addition to person re-ID, we apply the proposed method to
generic deep metric learning benchmarks including CUB-200-2011 and Cars196, and
also achieve state-of-the-art results.
Comment: Accepted to ECCV 201
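A point-to-set triplet loss with soft hard-mining, as summarized in the abstract above, can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the exponential weight function, the Euclidean distance, and the parameters `margin` and `sigma` are choices made here (the abstract itself notes that the loss does not rely on a specific weight function).

```python
import numpy as np

def soft_set_distance(distances, sigma, hard_is_far):
    """Weighted mean of point-to-point distances, emphasising hard samples.

    Assumed weighting: exponential in the distance. Hard positives are the
    far ones (hard_is_far=True); hard negatives are the near ones.
    """
    if hard_is_far:
        logits = (distances - distances.max()) / sigma
    else:
        logits = -(distances - distances.min()) / sigma
    weights = np.exp(logits)  # shifted so the largest weight is exactly 1
    return float(np.sum(weights * distances) / np.sum(weights))

def hap2s_style_loss(anchor, positives, negatives, margin=1.0, sigma=0.5):
    """Hinge loss on the gap between weighted positive and negative set distances."""
    d_pos = np.linalg.norm(positives - anchor, axis=1)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)
    set_pos = soft_set_distance(d_pos, sigma, hard_is_far=True)
    set_neg = soft_set_distance(d_neg, sigma, hard_is_far=False)
    return max(0.0, margin + set_pos - set_neg)

# Toy embeddings: positives at distances 1 and 3, negatives at 2 and 4.
anchor = np.array([0.0, 0.0])
positives = np.array([[1.0, 0.0], [0.0, 3.0]])
negatives = np.array([[2.0, 0.0], [0.0, 4.0]])
print(hap2s_style_loss(anchor, positives, negatives))
```

In this sketch a small `sigma` approaches hardest-sample mining (only the farthest positive and the nearest negative matter), while a large `sigma` approaches a plain average over each set, so the parameter interpolates between the two sampling extremes.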