3D Reconstruction, Weakly-Supervised Learning, and Supervised Learning Methods for 3D Human Pose Estimation
Thesis (Ph.D.) -- Seoul National University Graduate School : Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), 2019. 2. 곽노준.
Estimating human poses from images is one of the fundamental tasks in computer vision, and it underpins many applications such as action recognition, human-computer interaction, and virtual reality. In particular, estimating 3D human poses from 2D inputs is a challenging problem because it is inherently under-constrained. In addition, 3D ground truth data for human poses can be obtained only in limited, restricted environments. This dissertation studies 3D human pose estimation from several angles, depending on what kind of data is available. To this end, three methods for retrieving 3D human poses from 2D observations or from RGB images are proposed: 3D reconstruction, weakly-supervised learning, and supervised learning.
First, a non-rigid structure from motion (NRSfM) algorithm is proposed that reconstructs the 3D structures of non-rigid objects, such as human bodies, from 2D observations. In the proposed framework, named Procrustean Regression, the 3D shapes are regularized based on their aligned shapes. We show that the cost function of Procrustean Regression can be cast as an unconstrained problem or a problem with simple bound constraints, which can be solved efficiently by existing gradient descent solvers. The framework can be easily integrated with numerous existing models and assumptions, which makes it more practical for various real situations. The experimental results show that the proposed method is competitive with state-of-the-art methods for orthographic projection, with much lower time complexity and memory requirements, and outperforms existing methods for perspective projection.
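To make the notion of "aligned shapes" concrete, the following is a minimal Python sketch of orthogonal Procrustes alignment, the standard operation for rotating one centered 3D shape onto another. It is an illustration under stated assumptions, not the dissertation's algorithm: the function name `align_to_reference`, the 3 x P array layout, and the use of a single fixed reference shape are all hypothetical choices made here.

```python
import numpy as np

def align_to_reference(shape, reference):
    """Rotate a 3xP shape onto a 3xP reference (orthogonal Procrustes).

    Both inputs are centered first; the returned array is the centered,
    rotated shape. The array layout and names are assumptions for this sketch.
    """
    x = shape - shape.mean(axis=1, keepdims=True)
    y = reference - reference.mean(axis=1, keepdims=True)
    # The rotation minimizing ||R x - y||_F comes from the SVD of y x^T.
    u, _, vt = np.linalg.svd(y @ x.T)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return rotation @ x
```

A regularizer "based on aligned shapes" would then be evaluated on the outputs of such an alignment rather than on the raw shapes, so that rigid motion is not penalized as deformation.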
Second, a weakly-supervised learning method is presented that can learn 3D structures when only 2D ground truth data is available for training. Extending the Procrustean Regression framework, we propose the Procrustean Regression Network, a learning method that trains neural networks to learn 3D structures from training data with 2D ground truths. This is the first attempt to directly integrate an NRSfM algorithm into neural network training. It is also the first time that a cost function containing a low-rank function has been used to train neural networks that reconstruct 3D shapes. During the test phase, the 3D structures of human bodies are obtained via a feed-forward operation, which gives the framework much faster inference than the 3D reconstruction algorithms.
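The idea of training with only 2D ground truth plus a low-rank term can be sketched as a cost with two parts: a reprojection error against the 2D annotations and a low-rank penalty on the aligned 3D shapes. The Python sketch below is an assumption-laden illustration, not the cost function from the dissertation: it assumes orthographic projection, uses the nuclear norm as one common low-rank surrogate, and the name `weak_supervision_cost` and the `weight` parameter are hypothetical.

```python
import numpy as np

def weak_supervision_cost(shapes, observations, aligned, weight=0.1):
    """Illustrative cost: 2D reprojection error plus a low-rank penalty.

    shapes:       (F, 3, P) predicted 3D shapes for F frames
    observations: (F, 2, P) 2D ground truth keypoints
    aligned:      (F, 3, P) the predicted shapes after rigid alignment
    All names, shapes, and the choice of nuclear norm are assumptions.
    """
    # Orthographic projection keeps the x and y rows of each shape.
    reprojection = np.sum((shapes[:, :2, :] - observations) ** 2)
    # Stack each aligned shape as one row; a low-rank stack means the
    # shapes vary within a small subspace.
    stacked = aligned.reshape(aligned.shape[0], -1)
    low_rank = np.linalg.norm(stacked, ord="nuc")
    return reprojection + weight * low_rank
```

In a network setting, a cost of this kind would be evaluated on the network's predictions during training only; at test time a single forward pass produces the 3D shape, which is the source of the speed advantage described above.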
Third, a supervised learning method that infers 3D poses from 2D inputs using neural networks is proposed. The method exploits a relational unit that captures the relations between different body parts. Each pair of different body parts generates relational features, and the average of the features from all pairs is used for 3D pose estimation. We also propose a dropout method called relational dropout, which can be used in relational modules to provide robustness to occlusions. The experimental results show that the performance of the proposed algorithm does not degrade much when points are missing, while it maintains state-of-the-art performance when every point is visible.
Korean abstract (English translation):
Estimating human poses from RGB images is an important problem in computer vision and a foundation for many applications. It can serve as an enabling technology in a wide range of areas such as action recognition, human-computer interaction, virtual reality, and augmented reality. In particular, estimating 3D human poses from 2D inputs is known to be difficult because the problem admits infinitely many solutions. In addition, 3D ground truth data can be acquired only in restricted environments such as motion capture studios, so the amount of obtainable data is limited. This dissertation studies 3D human pose estimation from several angles, depending on the kind of training data available. Specifically, three methods that estimate or reconstruct 3D human poses from 2D observations or RGB images are proposed: 3D reconstruction, weakly-supervised learning, and supervised learning.
First, a non-rigid structure from motion (NRSfM) algorithm is proposed that reconstructs the 3D structure of non-rigid objects, such as human bodies, from 2D observations. In the proposed framework, named Procrustean regression, 3D shapes are regularized by a function of their aligned shapes. The cost function of Procrustean regression incorporates the constraints related to 3D shape alignment into the cost itself, so it can be optimized with gradient descent. The method can accommodate a variety of models and assumptions, which makes it a practical and flexible framework. Experiments show that it performs comparably to state-of-the-art methods while being superior to existing methods in time and space complexity.
The second method is a weakly-supervised learning method that reconstructs 3D structure from 2D inputs when only 2D training data is given. The proposed learning method, named Procrustean regression network, learns to estimate 3D poses from 2D human poses using neural networks or convolutional neural networks. By modifying the cost function of Procrustean regression to train neural networks, it is the first attempt to apply a cost function used in NRSfM to neural network training. It is also the first to use the low-rank function from that cost function in neural network training. For test data, 3D human poses are obtained by a feed-forward operation of the network, so 3D pose estimation is much faster than with 3D reconstruction methods.
Finally, a supervised learning method that estimates 3D human poses from 2D inputs using neural networks is proposed. The method uses relational modules to learn the relations between different body parts. Relational features are extracted for each pair of different body parts, and the average of all relational features is used for the final 3D pose estimation. A new training method called relational dropout is also proposed, which makes 3D pose estimation robust when some 2D observations are missing because of occlusion. Experiments demonstrate that the method estimates 3D poses effectively, without a large drop in performance, even when only part of the 2D observations is given.
Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Problem Definition
1.2 Motivation
1.3 Challenges
1.4 Contributions
1.4.1 3D Reconstruction of Human Bodies
1.4.2 Weakly-Supervised Learning for 3D HPE
1.4.3 Supervised Learning for 3D HPE
1.5 Outline
2 Related Works
2.1 2D Human Pose Estimation
2.2 3D Human Pose Estimation
2.3 Non-rigid Structure from Motion
2.4 Learning to Reconstruct 3D Structures via Neural Networks
3 3D Reconstruction of Human Bodies via Procrustean Regression
3.1 Formalization of NRSfM
3.2 Procrustean Regression
3.2.1 The Cost Function of Procrustean Regression
3.2.2 Derivatives of the Cost Function
3.2.3 Example Functions for f and g
3.2.4 Handling Missing Points
3.2.5 Optimization
3.2.6 Initialization
3.3 Experimental Results
3.3.1 Orthographic Projection
3.3.2 Perspective Projection
3.4 Discussion
3.5 Conclusion
4 Weakly-Supervised Learning of 3D Human Pose via Procrustean Regression Networks
4.1 The Cost Function for Procrustean Regression Network
4.2 Choosing f and g for Procrustean Regression Network
4.3 Implementation Details
4.4 Experimental Results
4.5 Conclusion
5 Supervised Learning of 3D Human Pose via Relational Networks
5.1 Relational Networks
5.2 Relational Networks for 3D HPE
5.3 Extensions to Multi-Frame Inputs
5.4 Relational Dropout
5.5 Implementation Details
5.6 Experimental Results
5.7 Conclusion
6 Concluding Remarks
6.1 Summary
6.2 Limitations
6.3 Future Directions
Abstract (In Korean)
Docto