
    ํ•™์Šต๋œ ๋ชจ๋ธ ๊ฐฑ์‹  ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•œ ๊ฐ•์ธํ•œ ๋ฌผ์ฒด ์ถ”์ 

    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2021. 2. ์ด๊ฒฝ๋ฌด.๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฌผ์ฒด ์ถ”์  ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๋ชจ๋ธ ๊ฐฑ์‹  ๊ธฐ๋ฒ•์˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด ๋ฌผ์ฒด ์ถ”์  ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์€ ๋ฌผ์ฒด ์ถ”์  ๋ฌธ์ œ๋ฅผ ๊ฐ์ฒด ๊ฒ€์ถœ์„ ํ†ตํ•œ ์ถ”์  ๋ฌธ์ œ (tracking-by-detection) ๋กœ ๊ฐ„์ฃผํ•˜์—ฌ ์™”์œผ๋ฉฐ, ์ด๋“ค์€ ํŠน์ • ๋ฌผ์ฒด๋ฅผ ๊ฒ€์ถœํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒ€์ถœ๊ธฐ ๋ชจ๋ธ์„ ์ฃผ์–ด์ง„ ๋น„๋””์˜ค์˜ ์ฒซ๋ฒˆ์งธ ํ”„๋ ˆ์ž„์—์„œ ํ•™์Šตํ•˜์—ฌ ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋น„๋””์˜ค์˜ ์ฐจํ›„ ํ”„๋ ˆ์ž„๋“ค์—์„œ ๋ชฉํ‘œ ๋ฌผ์ฒด๋ฅผ ๊ฒ€์ถœํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๋ฌผ์ฒด ์ถ”์  ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์—ฌ ์™”๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฌํ•œ ๋ชจ๋ธ์€ ๋ฌผ์ฒด์˜ ๋ณ€ํ˜•, ํฌ๊ธฐ ๋ณ€ํ™”, ๊ฐ€๋ ค์ง, ์กฐ๋ช… ๋ณ€ํ™”, ๋ฐฐ๊ฒฝ ๋ฌผ์ฒด์˜ ๋“ฑ์žฅ ๋“ฑ์˜ ๋‹ค์–‘ํ•œ ์ƒํ™ฉ๋ณ€ํ™”์™€ ๋ฌผ์ฒด์˜ ์™ธ์–‘๋ณ€ํ™”์— ๋”ฐ๋ผ ์ถ”์ ์— ์–ด๋ ค์›€์ด ์กด์žฌํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋“ค์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์กด ๋ฌผ์ฒด ์ถ”์  ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์€ ์ถ”์  ๋„์ค‘์— ๋ฌผ์ฒด์˜ ๋ณ€ํ™”ํ•œ ์™ธ์–‘๊ณผ ๋ฐฐ๊ฒฝ ๋ฌผ์ฒด๋“ค์— ๋Œ€ํ•œ ์ƒˆ๋กœ์šด ์ •๋ณด๋ฅผ ์ถ”์  ๊ณผ์ •์— ๋ฐ˜์˜ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ ๊ฐฑ์‹  ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•˜์—ฌ ์™”๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฌํ•œ ๊ธฐ๋ฒ•๋“ค์—์„œ ๋ชจ๋ธ ๊ฐฑ์‹  ๊ณผ์ •์€ ์†Œ์ˆ˜์˜ ํ•™์Šต ํ‘œ๋ณธ์„ ์‚ฌ์šฉํ•œ ์ตœ์ ํ™” ๋ฌธ์ œ์˜ ํ•ด๊ฒฐ์„ ํ†ตํ•ด ์ฃผ๋กœ ์ด๋ฃจ์–ด์ง€๋ฉฐ, ๊ฒฝํ—˜์ ์œผ๋กœ ์–ป์–ด์ง„ ์ •๊ทœํ™” ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋ธ์˜ ๊ณผ์ ํ•ฉ ๋ฌธ์ œ์™€ ์˜ค๋ฅ˜ ๋ˆ„์  ๋ฌธ์ œ๊ฐ€ ๋ฌผ์ฒด ์ถ”์  ๊ณผ์ •์—์„œ ์ง€์†๋˜๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค. ์ „์ˆ ํ•œ ๋ฌธ์ œ์ ๋“ค์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฌผ์ฒด ์ถ”์  ๋ฌธ์ œ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๋ชจ๋ธ ๊ฐฑ์‹  ๊ธฐ๋ฒ•์— ๋Œ€ํ•œ ์ƒˆ๋กœ์šด ์ ‘๊ทผ๋ฒ•๋“ค์„ ์ œ์‹œํ•œ๋‹ค. ์ด์— ๋Œ€ํ•ด ์„ธ ๊ฐ€์ง€์˜ ๋ชจ๋ธ ๊ฐฑ์‹  ๊ธฐ๋ฒ•๋“ค์„ ์ œ์•ˆํ•˜๋ฉฐ ๊ฐ๊ฐ์€: (1) ๊ฐ•ํ™”ํ•™์Šต ๊ธฐ๋ฒ•์— ๊ธฐ๋ฐ˜ํ•œ ํ‘œ๋ณธ ์„ ํƒ๊ธฐ๋ฒ•, (2) ๋ฉ”ํƒ€ํ•™์Šต์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ํ”ผ์ณ ๊ณต๊ฐ„์˜ ๊ฐฑ์‹ ๊ธฐ๋ฒ•, (3) ์ ์‘์  ์ปจํ‹ฐ๋‰ด์–ผ ๋ฉ”ํƒ€ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ๊ฐฑ์‹ ๊ธฐ๋ฒ•์ด๋‹ค. ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•๋ก ๋“ค์€ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง ๊ตฌ์กฐ์— ๊ธฐ๋ฐ˜ํ•œ ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ๋ฅผ ๋„์ž…ํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์žฅ๋ฉด๋ณ€ํ™”์™€ ์ƒํ™ฉ๋ณ€ํ™”์— ๋Œ€ํ•ด ํ•™์Šต ๊ณผ์ •์—์„œ์˜ ๊ณผ์ ํ•ฉ ๋ฌธ์ œ์™€ ์˜ค๋ฅ˜ ๋ˆ„์  ๋ฌธ์ œ๋ฅผ ์ค„์ด๊ณ ์ž ํ•˜์˜€์œผ๋ฉฐ, ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ๋Š” ๊ฒฝ๋Ÿ‰ํ™”๋œ ๊ตฌ์กฐ๋กœ ์„ค๊ณ„๋˜์–ด ์ „์ฒด ๋ฌผ์ฒด ์ถ”์  ํ”„๋ ˆ์ž„์›Œํฌ๊ฐ€ ์‹ค์‹œ๊ฐ„ ์†๋„๋กœ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•˜์˜€๋‹ค. ์ฒซ๋ฒˆ์งธ๋กœ, ์ •์ฑ… ๋„คํŠธ์›Œํฌ๋ฅผ ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ๋กœ ํ™œ์šฉํ•˜๋Š” ๊ฐ•ํ™”ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ํ‘œ๋ณธ ์„ ํƒ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ •์ฑ… ๋„คํŠธ์›Œํฌ๋Š” ์ฃผ์–ด์ง„ ์žฅ๋ฉด์—์„œ ๋ชฉํ‘œ ๋ฌผ์ฒด๋ฅผ ๊ฒ€์ถœํ•˜๊ธฐ ์œ„ํ•ด ์—ฌ๋Ÿฌ ํ‘œ๋ณธ ์ค‘ ์‚ฌ์šฉํ•˜๊ธฐ์— ๊ฐ€์žฅ ์ ํ•ฉํ•œ ํ‘œ๋ณธ์„ ์„ ํƒํ•˜๋Š” ์˜์‚ฌ๊ฒฐ์ •์„ ํ•™์Šตํ•œ๋‹ค. ๋‹ค์Œ์œผ๋กœ, ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ ๋„คํŠธ์›Œํฌ๋ฅผ ํ™œ์šฉํ•œ ๋ฉ”ํƒ€ํ•™์Šต ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•˜๋ฉฐ, ์—ฌ๊ธฐ์„œ ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ๋Š” ์†์‹คํ•จ์ˆ˜์˜ ๊ทธ๋ž˜๋””์–ธํŠธ ์ •๋ณด๋ฅผ ํ™œ์šฉํ•ด ๋ชฉํ‘œ ๋ฌผ์ฒด์— ํŠนํ™”๋œ ํ”ผ์ณ ๊ณต๊ฐ„์„ ๊ตฌ์ถ•ํ•œ๋‹ค. ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ ๋„คํŠธ์›Œํฌ๋Š” ๋ฌผ์ฒด ์ถ”์ ๊ธฐ์— ๋Œ€ํ•ด ์ ์‘์ ์ธ ๊ฐ€์ค‘์น˜์™€ ์ฑ„๋„ ์–ดํ…์…˜์˜ ํ˜•ํƒœ๋กœ ์ƒˆ๋กœ์šด ์ •๋ณด๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ์ปจํ‹ฐ๋‰ด์–ผ ๋ฉ”ํƒ€ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ๊ธฐ๋ฒ•์—์„œ๋Š” ์ดˆ๊ธฐ ๊ฐฑ์‹ ๊ณผ์ •๊ณผ ์˜จ๋ผ์ธ ๊ฐฑ์‹ ๊ณผ์ • ๋‘ ๊ฐ€์ง€ ๋ชจ๋‘๋ฅผ ์ ์‘ํ˜• ์ปจํ‹ฐ๋‰ด์–ผ ๋ฉ”ํƒ€ํ•™์Šต ํ”„๋ ˆ์ž„์›Œํฌ๋กœ ๋ชจ๋ธํ•œ๋‹ค. ๋ฉ”ํƒ€๋Ÿฌ๋„ˆ๋Š” ๋ฌผ์ฒด ์ถ”์ ๊ธฐ๊ฐ€ ์ƒˆ๋กœ์šด ํ•™์Šต ํ‘œ๋ณธ์„ ๋ฐฐ์šธ์ง€, ์•„๋‹ˆ๋ฉด ๊ธฐ์กด ์ง€์‹์„ ์œ ์ง€ํ• ์ง€๋ฅผ ์„ ํƒํ•  ์ˆ˜ ์žˆ๋„๋ก ์ ์‘์ ์œผ๋กœ ํ•™์Šต๊ณผ์ •์„ ์ œ์–ดํ•˜๋Š” ์—ญํ• ์„ ํ•™์Šตํ•œ๋‹ค. 
์ œ์•ˆํ•˜๋Š” ๊ธฐ๋ฒ•๋“ค์„ ๋ฌผ์ฒด ์ถ”์  ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์— ์ ์šฉํ•ด๋ณธ ๊ฒฐ๊ณผ ์œ ์˜๋ฏธํ•œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์–ป์„ ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉฐ, ๋ฉด๋ฐ€ํ•œ ์‹คํ—˜์  ๋ถ„์„๊ณผ ๊ตฌ์„ฑ์š”์†Œ๋ณ„ ๋ถ„์„์„ ํ†ตํ•ด ์œ ํšจ์„ฑ์„ ๊ฒ€์ฆํ•˜์˜€๋‹ค. ๋˜ํ•œ ์ €๋ช…ํ•˜๋ฉด์„œ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ๋ฌผ์ฒด ์ถ”์  ๋ฒค์น˜๋งˆํฌ๋“ค์„ ํ™œ์šฉํ•œ ๋น„๊ต์‹คํ—˜์„ ํ†ตํ•ด ๋‹ค๋ฅธ ์ตœ์‹  ๋ฌผ์ฒด์ถ”์  ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค๊ณผ ๋น„๊ตํ•ด์„œ๋„ ์‹ค์‹œ๊ฐ„ ์†๋„๋กœ ํšจ์œจ์ ์œผ๋กœ ๋™์ž‘ํ•˜๋ฉด์„œ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์คŒ์„ ํ™•์ธํ•˜์˜€๋‹ค.In this dissertation, we address the model adaptation problem of visual tracking algorithms. Conventional tracking algorithms regard the visual tracking problem as a tracking-by-detection problem, which can be solved by formulating a target-specific detection model at the initial frame of a given video, and evaluating the model for the subsequent video frames. However, various challenges are associated with the model due to changes in circumstances such as target deformation, scale change, occlusion, illumination change, background clutter, etc. To deal with the aforementioned challenges, conventional tracking algorithms incorporate a model adaptation strategy to provide the model with new information regarding the target appearance and background distractor objects. Nonetheless, since these approaches are often conducted on a handful of self-labeled training examples through solving an optimization task involving hand-crafted regularization schemes, the risk of overfitting and error accumulation persist throughout the course of the tracking process. In order to address the aforementioned problems, we introduce novel approaches to the model adaptation strategy for the visual tracking problem. Three types of model adaptation approaches are proposed, based on the following: (1) reinforcement learning based exemplar selection, (2) deep meta-learning based feature space update, (3) deep adaptive continual meta-learning based adaptation. The proposed approaches introduce deep neural network based meta-learners that can handle various scenes and circumstances with reduced overfitting and error accumulation, while the meta-learners are designed to be light-weight and can achieve real-time speeds for the overall visual tracking framework. First, we propose a deep reinforcement learning based exemplar selection method that incorporates a policy network for its meta-learner. The policy network is trained to make decisions on selecting the adequate target exemplar that can be used to locate the target given a scene. Next, a deep meta-learning based method, which utilizes a meta-learner network to construct the target-specific feature space using the loss gradient information, is proposed. The meta-learner network provides the tracker with new information in the form of adaptive weights and channel attention. Finally, a deep continual meta-learning based method simultaneously models the initial and online adaptations under the adaptive continual meta-learning framework. The meta-learner is trained to adaptively regulate the learning process where the tracker can choose between learning new examples and retaining the previous knowledge. Applying the proposed methods to visual tracking algorithms, significant performance gains are achieved and the effectiveness is validated by the extensive experimental evaluations and component-wise ablation analyses. 
Contents

1 Introduction 1
2 Model Selection by Reinforcement Learning 7
  2.1 Introduction 8
  2.2 Related Work 10
  2.3 Tracking with Reinforced Decisions 13
    2.3.1 Proposed Tracking Algorithm 13
    2.3.2 Reinforcement Learning Overview and Application to Visual Tracking 16
    2.3.3 Network Architectures 19
    2.3.4 Training the Policy Network 20
  2.4 Experiments 21
    2.4.1 Implementation Details 21
    2.4.2 Evaluation on OTB dataset 23
  2.5 Summary 31
3 Model Update by Meta-Learning 33
  3.1 Introduction 34
  3.2 Related Work 36
  3.3 Tracking with Meta-Learner 39
    3.3.1 Overview of Proposed Method 39
    3.3.2 Network Implementation and Training 44
  3.4 Experimental Results 47
    3.4.1 Evaluation Environment 47
    3.4.2 Experiments and Analysis 47
  3.5 Summary 57
4 Model Update by Continual Meta-Learning 59
  4.1 Introduction 61
  4.2 Related Work 63
  4.3 Tracking with Adaptive Continual Meta-Learner 67
    4.3.1 Meta-Training with Simulated Episodes 67
    4.3.2 Proposed Tracking Algorithm 73
    4.3.3 Baseline Tracking Algorithm: TACT 75
  4.4 Experiments 81
    4.4.1 Implementation Details 81
    4.4.2 Quantitative Evaluation 83
    4.4.3 Analysis 88
  4.5 Summary 93
5 Conclusion 95
  5.1 Summary and Contributions of the Dissertation 95
  5.2 Future Work 97
Bibliography 99
국문초록 (Abstract in Korean) 114