354 research outputs found

    Multi-body Non-rigid Structure-from-Motion

    Conventional structure-from-motion (SFM) research is primarily concerned with the 3D reconstruction of a single, rigidly moving object seen by a static camera, or a static and rigid scene observed by a moving camera; in both cases there is only one relative rigid motion involved. Recent progress has extended SFM to multi-body SFM (where there are multiple rigid relative motions in the scene) and to non-rigid SFM (where there is a single non-rigid, deformable object or scene). Along this line of thinking, there is an apparent gap: "multi-body non-rigid SFM", in which the task is to jointly reconstruct and segment the 3D structures of multiple non-rigid objects or deformable scenes from images. Such a multi-body non-rigid scenario is common in reality (e.g. two persons shaking hands, or a multi-person social event), and solving it is a natural next step in SFM research. By leveraging recent results in subspace clustering, this paper proposes, for the first time, an effective framework for multi-body NRSFM, which simultaneously reconstructs the 3D trajectories and segments each one into its respective low-dimensional subspace. Under our formulation, the 3D trajectories of each non-rigid structure can be well approximated by a sparse affine combination of other 3D trajectories from the same structure (self-expressiveness). We solve the resulting optimization with the alternating direction method of multipliers (ADMM). We demonstrate the efficacy of the proposed framework through extensive experiments on both synthetic and real data sequences. Our method clearly outperforms alternative methods, such as first clustering the 2D feature tracks into groups and then performing non-rigid reconstruction in each group, or first conducting 3D reconstruction under a single-subspace assumption and then clustering the 3D trajectories into groups. Comment: 21 pages, 16 figures
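    The self-expressiveness idea in this abstract can be illustrated with a small sketch. It is not the paper's method: the sparse affine formulation solved with ADMM is swapped for a ridge-regularised least-squares self-expression that has a closed-form solution. The function name `self_expressive_coefficients`, the parameter `lam`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def self_expressive_coefficients(X, lam=1e-2):
    """Solve min_C ||X - X C||_F^2 + lam * ||C||_F^2 in closed form.

    Columns of X are trajectories. This ridge variant is a stand-in for the
    sparse affine formulation in the abstract (an assumption of this sketch).
    """
    gram = X.T @ X
    n = gram.shape[0]
    # Setting the gradient to zero gives (X^T X + lam I) C = X^T X.
    return np.linalg.solve(gram + lam * np.eye(n), gram)

rng = np.random.default_rng(0)
# Toy data: two structures whose trajectories lie in orthogonal 2-D subspaces of R^6.
basis_a = np.eye(6)[:, 0:2]
basis_b = np.eye(6)[:, 2:4]
X = np.hstack([
    basis_a @ rng.standard_normal((2, 5)),  # trajectories 0..4
    basis_b @ rng.standard_normal((2, 5)),  # trajectories 5..9
])

C = self_expressive_coefficients(X)
affinity = np.abs(C) + np.abs(C).T  # symmetric affinity for a later clustering step
cross_block_max = affinity[:5, 5:].max()
within_block_max = affinity[:5, :5].max()
```

    In this noise-free toy setting the Gram matrix is block diagonal, so the coefficients linking the two structures vanish and a clustering step applied to `affinity` would separate the two groups.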

    A Generic Framework for Tracking Using Particle Filter With Dynamic Shape Prior

    ©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or distribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. DOI: 10.1109/TIP.2007.894244
    Tracking deforming objects involves estimating the global motion of the object and its local deformations as functions of time. Tracking algorithms using Kalman filters or particle filters (PFs) have been proposed for tracking such objects, but these have limitations due to the lack of dynamic shape information. In this paper, we propose a novel method based on employing a locally linear embedding in order to incorporate dynamic shape information into the particle filtering framework for tracking highly deformable objects in the presence of noise and clutter. The PF also models image statistics such as mean and variance of the given data which can be useful in obtaining proper separation of object and background.
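    For readers unfamiliar with the particle filtering framework this abstract builds on, here is a minimal sketch of one predict / weight / resample cycle of a generic bootstrap particle filter. It does not include the dynamic shape prior or the locally linear embedding from the paper. The 1-D random-walk state, the Gaussian likelihood, and the names `particle_filter_step`, `process_std` and `obs_std` are all assumptions of this sketch.

```python
import numpy as np

def particle_filter_step(particles, observation, rng, process_std=0.1, obs_std=0.5):
    """One predict / weight / resample cycle of a bootstrap particle filter.

    A 1-D random-walk state and a Gaussian likelihood are assumptions of this sketch.
    """
    # Predict: propagate each particle through the assumed random-walk dynamics.
    predicted = particles + rng.normal(0.0, process_std, size=particles.shape)
    # Weight: Gaussian likelihood of the observation given each particle
    # (computed in log space and shifted for numerical stability).
    log_w = -0.5 * ((observation - predicted) / obs_std) ** 2
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx], weights

rng = np.random.default_rng(0)
prior = rng.normal(0.0, 3.0, size=2000)   # broad initial belief about the state
posterior, weights = particle_filter_step(prior, observation=5.0, rng=rng)
estimate = posterior.mean()               # posterior mean moves toward the observation
```

    In a tracker of the kind described above, the state would be a shape or pose rather than a scalar, and the prediction step is where a learned shape model could enter.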

    A Framework for Image Segmentation Using Shape Models and Kernel Space Shape Priors

    ©2008 IEEE. DOI: 10.1109/TPAMI.2007.70774
    Segmentation involves separating an object from the background in a given image. The use of image information alone often leads to poor segmentation results due to the presence of noise, clutter or occlusion. The introduction of shape priors in the geometric active contour (GAC) framework has proved to be an effective way to ameliorate some of these problems. In this work, we propose a novel segmentation method combining image information with prior shape knowledge, using level sets. Following the work of Leventon et al., we revisit the use of PCA to introduce prior knowledge about shapes in a more robust manner. We utilize kernel PCA (KPCA) and show that this method outperforms linear PCA by allowing only those shapes that are close enough to the training data. In our segmentation framework, shape knowledge and image information are encoded into two energy functionals entirely described in terms of shapes. This consistent description makes it possible to take full advantage of the kernel PCA methodology and leads to promising segmentation results. In particular, our shape-driven segmentation technique allows for the simultaneous encoding of multiple types of shapes, and offers a convincing level of robustness with respect to noise, occlusions, or smearing.
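    The idea of measuring how close a shape is to the training data in kernel PCA feature space can be sketched as follows. This is a generic kernel PCA computation, not the paper's energy functional. The Gaussian kernel, the value of `gamma`, the random stand-in "shape vectors", and the function names are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_residuals(X, n_components, gamma=0.5):
    """Squared feature-space distance of each training vector from the
    subspace spanned by the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H                             # kernel of the centered features
    eigvals, eigvecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, U = eigvals[order], eigvecs[:, order]
    # Projection of centered feature i onto component k is sqrt(lam_k) * U[i, k].
    proj = U * np.sqrt(np.maximum(lam, 0.0))
    return np.diag(Kc) - (proj**2).sum(axis=1)

rng = np.random.default_rng(0)
shapes = rng.standard_normal((20, 8))          # 20 hypothetical shape vectors of length 8
res_few = kpca_residuals(shapes, n_components=3)
res_all = kpca_residuals(shapes, n_components=20)
```

    A small residual means the shape is well explained by the learned components. With every component retained the residual of a training shape is zero up to rounding, and it grows as components are discarded.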

    Free-hand sketch synthesis with deformable stroke models

    We present a generative model that automatically summarizes the stroke composition of free-hand sketches of a given category. When fit to a collection of sketches with similar poses, the model discovers and learns the structure and appearance of a set of coherent parts, with each part represented by a group of strokes. It represents both the consistent aspects (topology) and the diverse aspects (structure and appearance variations) of each sketch category. Key to the success of our model are insights learned from a comprehensive study of human stroke data. By fitting this model to images, we are able to synthesize visually similar and pleasant free-hand sketches.

    Combining local-physical and global-statistical models for sequential deformable shape from motion

    The final publication is available at link.springer.com
    In this paper, we simultaneously estimate camera pose and non-rigid 3D shape from a monocular video, using a sequential solution that combines local and global representations. We model the object as an ensemble of particles, each ruled by the linear equation of Newton's second law of motion. This dynamic model is incorporated into a bundle adjustment framework, in combination with simple regularization components that ensure temporal and spatial consistency. The resulting approach estimates shape and camera poses sequentially, while progressively learning a global low-rank model of the shape that is fed back into the optimization scheme, thus introducing global constraints. The overall combination of local (physical) and global (statistical) constraints yields a solution that is both efficient and robust to several artifacts such as noisy and missing data or sudden camera motions, without requiring any training data at all. Validation is done in a variety of real application domains, including articulated and non-rigid motion, both for continuous and discontinuous shapes. Our on-line methodology yields significantly more accurate reconstructions than competing sequential approaches, and is even comparable to the more computationally demanding batch methods. Peer Reviewed. Postprint (author's final draft)
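    To make the "linear equation of Newton's second law" concrete, here is a minimal sketch of a finite-difference particle model. The abstract does not give the discretisation, so the central-difference form below, the function name `predict_positions`, and the unit mass and time step are assumptions of this sketch rather than the paper's formulation.

```python
import numpy as np

def predict_positions(x_prev, x_curr, forces, mass=1.0, dt=1.0):
    """Finite-difference form of F = m * a for a set of particles.

    Approximating a_t by (x_{t+1} - 2 x_t + x_{t-1}) / dt^2 gives
    x_{t+1} = 2 x_t - x_{t-1} + (F / m) dt^2, which is linear in the positions.
    """
    return 2.0 * x_curr - x_prev + (forces / mass) * dt**2

# Three hypothetical particles in 3-D moving at constant velocity along x.
x_prev = np.zeros((3, 3))
x_curr = np.tile([1.0, 0.0, 0.0], (3, 1))
no_force = np.zeros((3, 3))
x_next = predict_positions(x_prev, x_curr, no_force)

# A unit force along z on every particle adds dt^2 / mass to the z coordinate.
push = np.tile([0.0, 0.0, 1.0], (3, 1))
x_pushed = predict_positions(x_prev, x_curr, push)
```

    Because the update is linear in the particle positions, a relation of this kind can enter a least-squares cost as one more residual term, which is consistent with the abstract's description of adding the dynamic model to bundle adjustment.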

    Facial Expression Recognition


    ImageNet Large Scale Visual Recognition Challenge

    The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements. Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references

    3D Reconstruction, Weakly-Supervised Learning, and Supervised Learning Methods for 3D Human Pose Estimation

    Doctoral dissertation, Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), February 2019. Nojun Kwak.
    Estimating human poses from images is one of the fundamental tasks in computer vision, with many applications such as action recognition, human-computer interaction, and virtual reality. In particular, estimating 3D human poses from 2D inputs is a challenging problem since it is inherently under-constrained. In addition, obtaining 3D ground truth data for human poses is only possible in limited, restricted environments. In this dissertation, 3D human pose estimation is studied from different aspects, focusing on the types of data that are available. To this end, three different methods to retrieve 3D human poses from 2D observations or from RGB images are proposed: 3D reconstruction, weakly-supervised learning, and supervised learning. First, a non-rigid structure from motion (NRSfM) algorithm that reconstructs 3D structures of non-rigid objects such as human bodies from 2D observations is proposed. In the proposed framework, named Procrustean Regression, the 3D shapes are regularized based on their aligned shapes. We show that the cost function of Procrustean Regression can be cast as an unconstrained problem or a problem with simple bound constraints, which can be efficiently solved by existing gradient descent solvers. This framework can be easily integrated with numerous existing models and assumptions, which makes it more practical for various real situations. The experimental results show that the proposed method gives results competitive with state-of-the-art methods for orthographic projection with much lower time complexity and memory requirements, and outperforms existing methods for perspective projection.
    Second, a weakly-supervised learning method that is capable of learning 3D structures when only 2D ground truth data is available as a training set is presented. Extending the Procrustean Regression framework, we suggest the Procrustean Regression Network, a learning method that trains neural networks to learn 3D structures using training data with 2D ground truths. This is the first attempt to directly integrate an NRSfM algorithm into neural network training. A cost function that contains a low-rank function is also used for the first time as a cost function of neural networks that reconstruct 3D shapes. During the test phase, 3D structures of human bodies can be obtained via a feed-forward operation, which gives the framework much faster inference than 3D reconstruction algorithms. Third, a supervised learning method that infers 3D poses from 2D inputs using neural networks is suggested. The method exploits a relational unit which captures the relations between different body parts. In the method, each pair of different body parts generates relational features, and the average of the features from all the pairs is used for 3D pose estimation. We also suggest a dropout method called relational dropout, which can be used in relational modules to provide robustness to occlusions. The experimental results validate that the performance of the proposed algorithm does not degrade much when missing points exist, while maintaining state-of-the-art performance when every point is visible.
    Abstract (translated from Korean): Estimating human poses from RGB images is important in computer vision and is a basic technology for many applications. Human pose estimation can serve as an enabling technology in a wide range of fields such as action recognition, human-computer interaction, virtual reality, and augmented reality. In particular, estimating 3D human poses from 2D inputs is known to be difficult because the problem admits infinitely many solutions. In addition, 3D ground truth data can only be acquired in restricted environments such as motion capture studios, so the amount of obtainable data is limited. This dissertation studies 3D human pose estimation from several angles, depending on the kind of training data available. Specifically, three methods are presented for estimating and reconstructing 3D human poses from 2D observations or RGB images: 3D reconstruction, weakly-supervised learning, and supervised learning. First, a non-rigid structure from motion algorithm is proposed that reconstructs the 3D structure of non-rigid objects such as human bodies from 2D observations. In the proposed framework, named Procrustean regression, 3D shapes are regularized as a function of their aligned shapes. The cost function of Procrustean regression incorporates the constraints related to 3D shape alignment, so it can be optimized with gradient descent. The method can accommodate a variety of models and assumptions, making it a practical and flexible framework. Experiments show that it performs comparably to state-of-the-art methods while being superior to existing methods in time and space complexity. The second method is a weakly-supervised learning method that reconstructs 3D structure from 2D inputs when only 2D training data is given. The proposed learning method, named Procrustean regression network, trains a neural network or convolutional neural network to estimate 3D poses from 2D human poses. This method, which trains the network with a modified version of the Procrustean regression cost function, is the first attempt to apply a cost function used in non-rigid structure from motion to neural network training. It is also the first to use a low-rank function in the cost function for neural network training. For test data, 3D human poses are obtained by a feed-forward operation of the network, which allows much faster 3D pose estimation than 3D reconstruction methods. Finally, a supervised learning method that estimates 3D human poses from 2D inputs using neural networks is presented. The method uses relational modules to learn the relations between different body parts. Relational features are extracted for each pair of different parts, and the average of all relational features is used for the final 3D pose estimation. A new training method called relational dropout is also presented, giving a 3D pose estimation method that operates robustly when some 2D observations are missing because of occlusion. Experiments demonstrate that the method estimates 3D poses effectively, without a large drop in performance, even when only part of the 2D observations is given.
    Contents:
    Abstract
    Contents
    List of Tables
    List of Figures
    1 Introduction
        1.1 Problem Definition
        1.2 Motivation
        1.3 Challenges
        1.4 Contributions
            1.4.1 3D Reconstruction of Human Bodies
            1.4.2 Weakly-Supervised Learning for 3D HPE
            1.4.3 Supervised Learning for 3D HPE
        1.5 Outline
    2 Related Works
        2.1 2D Human Pose Estimation
        2.2 3D Human Pose Estimation
        2.3 Non-rigid Structure from Motion
        2.4 Learning to Reconstruct 3D Structures via Neural Networks
    3 3D Reconstruction of Human Bodies via Procrustean Regression
        3.1 Formalization of NRSfM
        3.2 Procrustean Regression
            3.2.1 The Cost Function of Procrustean Regression
            3.2.2 Derivatives of the Cost Function
            3.2.3 Example Functions for f and g
            3.2.4 Handling Missing Points
            3.2.5 Optimization
            3.2.6 Initialization
        3.3 Experimental Results
            3.3.1 Orthographic Projection
            3.3.2 Perspective Projection
        3.4 Discussion
        3.5 Conclusion
    4 Weakly-Supervised Learning of 3D Human Pose via Procrustean Regression Networks
        4.1 The Cost Function for Procrustean Regression Network
        4.2 Choosing f and g for Procrustean Regression Network
        4.3 Implementation Details
        4.4 Experimental Results
        4.5 Conclusion
    5 Supervised Learning of 3D Human Pose via Relational Networks
        5.1 Relational Networks
        5.2 Relational Networks for 3D HPE
        5.3 Extensions to Multi-Frame Inputs
        5.4 Relational Dropout
        5.5 Implementation Details
        5.6 Experimental Results
        5.7 Conclusion
    6 Concluding Remarks
        6.1 Summary
        6.2 Limitations
        6.3 Future Directions
    Abstract (In Korean)
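    The pairwise averaging performed by the relational unit in this dissertation's abstract can be sketched as below. The abstract states only that each pair of body parts generates relational features and that the features are averaged over all pairs. The single shared ReLU layer, the random weights, the group and feature sizes, and the name `relational_features` are hypothetical stand-ins for illustration, and relational dropout is not modelled.

```python
import itertools
import numpy as np

def relational_features(groups, W, b):
    """Average pairwise features over all pairs of body-part groups.

    groups : (G, D) array, one feature vector per body part
    W, b   : weights of one shared ReLU layer, a hypothetical stand-in
             for the learned pairwise module
    """
    feats = []
    for i, j in itertools.combinations(range(len(groups)), 2):
        pair = np.concatenate([groups[i], groups[j]])  # features of parts i and j
        feats.append(np.maximum(W @ pair + b, 0.0))    # shared pairwise map
    return np.mean(feats, axis=0)                      # average over all pairs

rng = np.random.default_rng(0)
groups = rng.standard_normal((5, 4))   # 5 hypothetical body-part groups, 4 features each
W = rng.standard_normal((16, 8))       # maps a concatenated pair (8 values) to 16 features
b = np.zeros(16)
pooled = relational_features(groups, W, b)
```

    Averaging keeps the pooled vector the same size regardless of how many pairs contribute, so a downstream regressor sees a fixed-length input.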