
    Complexity Analysis and Efficient Measurement Selection Primitives for High-Rate Graph SLAM

    Sparsity has been widely recognized as crucial for efficient optimization in graph-based SLAM. Because the sparsity and structure of the SLAM graph reflect the set of incorporated measurements, many methods for sparsification have been proposed in hopes of reducing computation. These methods often focus narrowly on reducing edge count without regard for structure at a global level. Such structurally-naive techniques can fail to produce significant computational savings, even after aggressive pruning. In contrast, simple heuristics such as measurement decimation and keyframing are known empirically to produce significant computation reductions. To demonstrate why, we propose a quantitative metric called elimination complexity (EC) that bridges the existing analytic gap between graph structure and computation. EC quantifies the complexity of the primary computational bottleneck: the factorization step of a Gauss-Newton iteration. Using this metric, we show rigorously that decimation and keyframing impose favorable global structures and therefore achieve computation reductions on the order of $r^2/9$ and $r^3$, respectively, where $r$ is the pruning rate. We additionally present numerical results showing EC provides a good approximation of computation in both batch and incremental (iSAM2) optimization and demonstrate that pruning methods promoting globally-efficient structure outperform those that do not. Comment: Pre-print accepted to ICRA 201
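
    As a rough illustration of why factorization cost depends on global graph structure, the toy sketch below simulates variable elimination on a measurement graph and charges a cubic cost per eliminated clique. The function name `elimination_cost` and the cubic cost model are assumptions for illustration, not the paper's EC metric.

```python
# Toy proxy for factorization cost: simulate variable elimination on an
# undirected graph and, for each eliminated variable, charge the cube of its
# current neighborhood size (the dense clique it induces). This is standard
# symbolic-elimination bookkeeping, not the paper's elimination complexity.

def elimination_cost(adj, order):
    """adj: dict node -> set of neighbors; order: elimination order."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    total = 0
    for v in order:
        nbrs = adj.pop(v)
        total += len(nbrs) ** 3      # toy cubic cost of one dense elimination step
        for u in nbrs:               # connect remaining neighbors (fill-in)
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return total

if __name__ == "__main__":
    # A short pose chain 0-1-2-3-4 with one loop closure 0-4.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
    adj = {i: set() for i in range(5)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    print(elimination_cost(adj, order=[0, 1, 2, 3, 4]))
```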

    Affine Approximation for Direct Batch Recovery of Euclidean Motion From Sparse Data

    We present a batch method for recovering Euclidean camera motion from sparse image data. The main purpose of the algorithm is to recover the motion parameters using as much of the available information and as few computational steps as possible. The algorithm thus places itself in the gap between factorisation schemes, which make use of all available information in the initial recovery step, and sequential approaches, which are able to handle sparseness in the image data. Euclidean camera matrices are approximated via the affine camera model, thus making the recovery direct in the sense that no intermediate projective reconstruction is made. Using a little-known closure constraint, the FA-closure, we are able to formulate the camera coefficients linearly in the entries of the affine fundamental matrices. The novelty of the presented work is twofold: firstly, the presented formulation allows not only for particularly good conditioning of the estimation of the initial motion parameters but also for an unprecedented diversity in the choice of possible regularisation terms. Secondly, the new autocalibration scheme presented here is in practice guaranteed to yield a least squares estimate of the calibration parameters. As a by-product, the affine camera model is rehabilitated as a useful model for most cameras and scene configurations, e.g. wide-angle lenses observing a scene at close range. Experiments on real and synthetic data demonstrate the ability to reconstruct scenes which are very problematic for previous structure from motion techniques due to local ambiguities and error accumulation.
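
    Since the method builds on affine fundamental matrices between view pairs, a minimal sketch of how such a matrix can be estimated from correspondences may help. It uses the standard linear affine epipolar constraint, not the FA-closure formulation itself, and the function name `affine_fundamental` is hypothetical.

```python
import numpy as np

# Sketch: estimate an affine fundamental matrix F_A from point correspondences
# via the linear affine epipolar constraint  a*x2 + b*y2 + c*x1 + d*y1 + e = 0.
# Point normalisation and outlier handling are omitted for brevity.
def affine_fundamental(pts1, pts2):
    """pts1, pts2: (N, 2) arrays of corresponding image points, N >= 4."""
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    A = np.column_stack([x2, y2, x1, y1, np.ones(len(pts1))])
    # The constraint vector (a, b, c, d, e) is the right singular vector of A
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    a, b, c, d, e = vt[-1]
    # Assemble the 3x3 affine fundamental matrix (upper-left 2x2 block is zero),
    # so that [x2, y2, 1] @ F_A @ [x1, y1, 1]^T = 0.
    return np.array([[0.0, 0.0, a],
                     [0.0, 0.0, b],
                     [c,   d,   e]])
```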

    Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging

    Large-scale $\ell_1$-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to scale up the optimization in parallel: parallel solvers run multiple cores on a shared-memory system or in a distributed environment to speed up the computation, but their practical usage is limited by the huge dimensionality of the feature space and by synchronization problems. In this dissertation, I carry out research along this direction, with a particular focus on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For supervised learning, I first propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreading environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is stored in a distributed manner across different machines. The proposed model is then further extended to the study of genetic risk factors for Alzheimer's Disease (AD) across different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For the unsupervised learning problem, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), that scales up the optimization of dictionary learning and sparse coding problems. A common issue in medical imaging research is that the longitudinal features of patients across different time points are best studied together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method, learning the different tasks simultaneously and utilizing shared and individual dictionaries to encode both consistent and changing imaging features. Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201
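
    To make the $\ell_1$-regularized objective concrete, here is a minimal sequential coordinate-descent sketch for the Lasso problem 0.5*||Xw - y||^2 + lam*||w||_1. It is a textbook soft-thresholding update, not the dissertation's asynchronous or distributed solver, and the names `lasso_cd` and `soft_threshold` are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t*|.|: shrink z toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    # Cyclic coordinate descent on 0.5*||Xw - y||^2 + lam*||w||_1.
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)              # per-coordinate curvature ||x_j||^2
    for _ in range(n_iters):
        for j in range(d):
            r_j = y - X @ w + X[:, j] * w[j]   # residual with coordinate j removed
            w[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    w_true = np.zeros(10)
    w_true[:3] = [2.0, -1.0, 0.5]
    y = X @ w_true + 0.01 * rng.standard_normal(50)
    print(np.round(lasso_cd(X, y, lam=0.1), 2))
```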

    Real-Time Sequential Non-Rigid Structure from Motion Using a Single Camera

    Applications whose operation relies on accurate localisation and reconstruction within a real 3D environment have attracted great interest in recent years, both from the research community and from industry. These applications range from augmented reality to robotics, simulation, video games, and more. Depending on the application and on the level of detail of the reconstruction, different devices are used, some of them specific, more complex and expensive, such as stereo cameras, colour-and-depth (RGBD) cameras based on structured light or Time of Flight (ToF), as well as laser scanners and other more advanced sensors. For simple applications, commodity devices such as smartphones are sufficient: by applying computer vision techniques, 3D models of the environment can be obtained so that, in the case of augmented reality, augmented information can be displayed at the selected location. In robotics, simultaneously localising the camera and generating a 3D map of the environment is a fundamental task for achieving autonomous navigation. This problem is known in the state of the art as Simultaneous Localization And Mapping (SLAM) or Structure from Motion (SfM). For these techniques to apply, the object must not change its shape over time; the reconstruction is then unique up to a scale factor in monocular capture without an external reference. If the rigidity condition does not hold, the shape of the object changes over time, and the problem becomes equivalent to performing one reconstruction per frame, which cannot be done directly, since different shapes combined with different camera poses can produce similar projections. This is why the reconstruction of deformable objects is still an area under development. SfM methods have been adapted by applying physical models and temporal, spatial, geometric or other constraints to reduce the ambiguity of the solutions, giving rise to the techniques known as Non-Rigid SfM (NRSfM). This thesis proposes to start from a rigid reconstruction technique that is well known in the state of the art, PTAM (Parallel Tracking and Mapping), and to adapt it to include NRSfM techniques based on a linear basis model, so as to estimate the deformations of the modelled object dynamically and to apply temporal and spatial constraints that improve the reconstructions, while also adapting to changes in deformation that appear in the sequence. To do this, each of its execution threads must be modified to process non-rigid data. The tracking thread already performed tracking based on a 3D point map provided a priori; the most important modification here is the integration of a linear deformation model so that the deformation of the object is computed in real time, assuming the basic deformation shapes to be fixed. The camera pose computation is based on the rigid estimation system, so the pose and the deformation coefficients are estimated alternately using the E-M (Expectation-Maximization) algorithm. Temporal and shape constraints are also imposed to restrict the ambiguities inherent in the solutions and to improve the quality of the 3D estimation. As for the thread that manages the map, it is updated over time so that it can improve the deformation bases when they can no longer explain the shapes observed in the current images. To this end, the rigid-model optimisation included in this thread is replaced by an exhaustive NRSfM processing method that refines the bases according to the images reported by the tracking thread as having large reconstruction error. With this, the model adapts to new deformations, allowing the system to evolve and remain stable in the long term. Unlike a large part of the methods in the literature, the proposed system handles perspective projection natively, minimising the ambiguity and object-distance problems present in orthographic projection. The proposed system handles hundreds of points and is designed to meet real-time constraints for deployment on systems with limited hardware resources.
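
    As a rough sketch of the alternation described above, the deformation coefficients can be solved linearly once the camera pose is fixed. The snippet assumes an orthographic-style projection, fixed basis shapes `B`, and a known rotation `R` purely for brevity (the thesis itself uses a perspective model); the name `deformation_coefficients` is illustrative.

```python
import numpy as np

# Toy sketch: given K fixed basis shapes B[k] (each 3 x P), the first two rows
# of a camera rotation R (an orthographic-style projection for brevity), and
# observed 2D points W (2 x P), solve for the deformation coefficients c that
# minimise || W - sum_k c[k] * (R[:2] @ B[k]) ||^2 by linear least squares.
def deformation_coefficients(W, R, B):
    A = np.stack([(R[:2] @ Bk).ravel() for Bk in B], axis=1)  # (2P, K)
    c, *_ = np.linalg.lstsq(A, W.ravel(), rcond=None)
    return c
```

    In the full alternation, the pose would then be re-estimated with the shape held fixed at sum_k c[k]*B[k], and the two steps repeated until convergence.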

    Robust and large-scale quasiconvex programming in structure-from-motion

    Structure-from-Motion (SfM) is a cornerstone of computer vision. Briefly speaking, SfM is the task of simultaneously estimating the poses of the cameras behind a set of images of a scene, and the 3D coordinates of the points in the scene. Often, the optimisation problems that underpin SfM do not have closed-form solutions, and finding solutions via numerical schemes is necessary. An objective function, which measures the discrepancy of a geometric object (e.g., camera poses, rotations, 3D coordinates) with a set of image measurements, is to be minimised. Each image measurement gives rise to an error function. For example, the reprojection error, which measures the distance between an observed image point and the projection of a 3D point onto the image, is a commonly used error function. An influential optimisation paradigm in SfM is the $\ell_\infty$ paradigm, where the objective function takes the form of the maximum of all individual error functions (e.g. individual reprojection errors of scene points). The benefit of the $\ell_\infty$ paradigm is that the objective function of many SfM optimisation problems becomes quasiconvex, hence there is a unique minimum of the objective function. The task of formulating and minimising quasiconvex objective functions is called quasiconvex programming. Although tremendous progress in SfM techniques under the $\ell_\infty$ paradigm has been made, there are still unsatisfactorily solved problems, specifically, problems associated with large-scale input data and outliers in the data. This thesis describes novel techniques to tackle these problems. A major weakness of the $\ell_\infty$ paradigm is its susceptibility to outliers. This thesis improves the robustness of $\ell_\infty$ solutions against outliers by employing the least median of squares (LMS) criterion, which amounts to minimising the median error. In the context of triangulation, this thesis proposes a locally convergent robust algorithm underpinned by a novel quasiconvex plane sweep technique. Imposing the LMS criterion achieves significant outlier tolerance, and, at the same time, some properties of quasiconvexity greatly simplify the process of solving the LMS problem. Approximation is a commonly used technique to tackle large-scale input data. This thesis introduces the coreset technique to quasiconvex programming problems. The coreset technique aims to find a representative subset of the input data, such that solving the same problem on the subset yields a solution that is within a known bound of the optimal solution on the complete input set. In particular, this thesis develops a coreset approximation algorithm to handle large-scale triangulation tasks. Another technique to handle large-scale input data is to break the optimisation into multiple smaller sub-problems. Such a decomposition usually speeds up the overall optimisation process and alleviates the limitation on memory. This thesis develops a large-scale optimisation algorithm for the known rotation problem (KRot). The proposed method decomposes the original quasiconvex programming problem with potentially hundreds of thousands of parameters into multiple sub-problems with only three parameters each. An efficient solver based on a novel minimum enclosing ball technique is proposed to solve the sub-problems. Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Computer Science, 201
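
    Quasiconvex max-of-errors objectives of this kind are typically minimised by bisection on the maximum error level, with a convex feasibility test at each step. The sketch below shows that generic strategy; the oracle `feasible` is a stand-in assumption (in SfM it would be, e.g., an SOCP or LP feasibility problem over camera and point parameters), not the thesis' plane sweep, coreset, or minimum enclosing ball algorithms.

```python
# Bisection on the error level gamma for a quasiconvex max-of-errors objective:
# the sublevel set {x : max_i f_i(x) <= gamma} is convex, so a single convex
# feasibility query per step suffices.
def minimize_quasiconvex(feasible, lo=0.0, hi=1e3, tol=1e-6):
    while hi - lo > tol:
        gamma = 0.5 * (lo + hi)
        if feasible(gamma):
            hi = gamma   # some solution attains max error <= gamma
        else:
            lo = gamma
    return hi

if __name__ == "__main__":
    # Toy 1-D example: residuals |x - a_i|; the optimal max error is half the
    # spread of the data, here 2.5.
    data = [1.0, 2.0, 6.0]
    feasible = lambda g: max(data) - g <= min(data) + g  # interval intersection
    print(round(minimize_quasiconvex(feasible), 4))
```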

    Robust Estimation of Motion Parameters and Scene Geometry : Minimal Solvers and Convexification of Regularisers for Low-Rank Approximation

    In the dawning age of autonomous driving, accurate and robust tracking of vehicles is an essential component. This is inextricably linked with the problem of Simultaneous Localisation and Mapping (SLAM), in which one tries to determine the position of a vehicle relative to its surroundings without prior knowledge of them. The more you know about the object you wish to track, through sensors or mechanical construction, the more likely you are to get good positioning estimates. In the first part of this thesis, we explore new ways of improving positioning for vehicles travelling on a planar surface. This is done in several different ways: first, we generalise the work done for monocular vision to include two cameras; then we propose ways of speeding up the estimation with polynomial solvers; and we develop an auto-calibration method to cope with radially distorted images, without requiring pre-calibration procedures. We continue by investigating the case of constrained motion, this time using auxiliary data from inertial measurement units (IMUs) to improve the positioning of unmanned aerial vehicles (UAVs). The proposed methods improve the state of the art for partially calibrated cases (with unknown focal length) for indoor navigation. Furthermore, we propose the first real-time-compatible minimal solver for simultaneous estimation of the radial distortion profile, focal length, and motion parameters while utilising the IMU data. In the third and final part of this thesis, we develop a bilinear framework for low-rank regularisation, with global optimality guarantees under certain conditions. We also show equivalence between the linear and the bilinear frameworks, in the sense that the objectives are equal. This enables users of the alternating direction method of multipliers (ADMM), or other subgradient or splitting methods, to transition to the new framework while enjoying the benefits of second-order methods. Furthermore, we propose a novel regulariser fusing two popular methods. In this way we combine the best of both worlds, encouraging bias reduction while enforcing low-rank solutions.
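
    One standard building block behind low-rank regularisation, shown here only to make the idea concrete, is singular value thresholding, the proximal operator of the nuclear norm. It is a textbook operator used inside ADMM-style splitting methods, not the bilinear framework proposed in the thesis, and the name `svt` is illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values of M
    by tau (the proximal operator of tau times the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_rank = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
    noisy = low_rank + 0.1 * rng.standard_normal((20, 15))
    denoised = svt(noisy, tau=1.0)
    print(np.linalg.matrix_rank(denoised, tol=1e-6))  # small singular values are zeroed
```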

    3D Reconstruction, Weakly-Supervised Learning, and Supervised Learning Methods for 3D Human Pose Estimation

    Thesis (Ph.D.) -- Graduate School of Convergence Science and Technology, Seoul National University, February 2019. Advisor: Nojun Kwak. Estimating human poses from images is one of the fundamental tasks in computer vision, enabling applications such as action recognition, human-computer interaction, and virtual reality. In particular, estimating 3D human poses from 2D inputs is a challenging problem since it is inherently under-constrained. In addition, obtaining 3D ground-truth data for human poses is only possible under limited and restricted environments. In this dissertation, 3D human pose estimation is studied from different aspects, focusing on the various types of data that may be available. To this end, three different methods to retrieve 3D human poses from 2D observations or from RGB images---algorithms of 3D reconstruction, weakly-supervised learning, and supervised learning---are proposed. First, a non-rigid structure from motion (NRSfM) algorithm is proposed that reconstructs the 3D structures of non-rigid objects such as human bodies from 2D observations. In the proposed framework, named Procrustean Regression, the 3D shapes are regularized based on their aligned shapes. We show that the cost function of Procrustean Regression can be cast into an unconstrained problem or a problem with simple bound constraints, which can be efficiently solved by existing gradient descent solvers. This framework can be easily integrated with numerous existing models and assumptions, which makes it practical for various real situations. The experimental results show that the proposed method gives results competitive with state-of-the-art methods for orthographic projection, with much lower time complexity and memory requirements, and outperforms the existing methods for perspective projection. Second, a weakly-supervised learning method is presented that is capable of learning 3D structures when only 2D ground-truth data is available as a training set. Extending the Procrustean Regression framework, we suggest the Procrustean Regression Network, a learning method that trains neural networks to learn 3D structures using training data with 2D ground truths. This is the first attempt to directly integrate an NRSfM algorithm into neural network training, and a cost function containing a low-rank term is used for the first time to train neural networks that reconstruct 3D shapes. During the test phase, the 3D structures of human bodies can be obtained via a feed-forward operation, which gives the framework much faster inference than 3D reconstruction algorithms. Third, a supervised learning method is suggested that infers 3D poses from 2D inputs using neural networks. The method exploits a relational unit that captures the relations between different body parts. Each pair of different body parts generates relational features, and the average of the features from all pairs is used for 3D pose estimation. We also suggest a dropout method called relational dropout, which can be used in relational modules to impose robustness to occlusions. The experimental results validate that the performance of the proposed algorithm does not degrade much when missing points exist, while maintaining state-of-the-art performance when every point is visible.
์‚ฌ๋žŒ ์ž์„ธ ์ถ”์ •์€ ๋™์ž‘ ์ธ์‹, ์ธ๊ฐ„-์ปดํ“จํ„ฐ ์ƒํ˜ธ์ž‘์šฉ, ๊ฐ€์ƒ ํ˜„์‹ค, ์ฆ๊ฐ• ํ˜„์‹ค ๋“ฑ ๊ด‘๋ฒ”์œ„ํ•œ ๋ถ„์•ผ์—์„œ ๊ธฐ๋ฐ˜ ๊ธฐ์ˆ ๋กœ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. ํŠนํžˆ, 2์ฐจ์› ์ž…๋ ฅ์œผ๋กœ๋ถ€ํ„ฐ 3์ฐจ์› ์‚ฌ๋žŒ ์ž์„ธ๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ฌธ์ œ๋Š” ๋ฌด์ˆ˜ํžˆ ๋งŽ์€ ํ•ด๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” ๋ฌธ์ œ์ด๊ธฐ ๋•Œ๋ฌธ์— ํ’€๊ธฐ ์–ด๋ ค์šด ๋ฌธ์ œ๋กœ ์•Œ๋ ค์ ธ ์žˆ๋‹ค. ๋˜ํ•œ, 3์ฐจ์› ์‹ค์ œ ๋ฐ์ดํ„ฐ์˜ ์Šต๋“์€ ๋ชจ์…˜์บก์ฒ˜ ์ŠคํŠœ๋””์˜ค ๋“ฑ ์ œํ•œ๋œ ํ™˜๊ฒฝํ•˜์—์„œ๋งŒ ๊ฐ€๋Šฅํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์–ป์„ ์ˆ˜ ์žˆ๋Š” ๋ฐ์ดํ„ฐ์˜ ์–‘์ด ํ•œ์ •์ ์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š”, ์–ป์„ ์ˆ˜ ์žˆ๋Š” ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ์ข…๋ฅ˜์— ๋”ฐ๋ผ ์—ฌ๋Ÿฌ ๋ฐฉ๋ฉด์œผ๋กœ 3์ฐจ์› ์‚ฌ๋žŒ ์ž์„ธ๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์—ฐ๊ตฌํ•˜์˜€๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ, 2์ฐจ์› ๊ด€์ธก๊ฐ’ ๋˜๋Š” RGB ์˜์ƒ์„ ๋ฐ”ํƒ•์œผ๋กœ 3์ฐจ์› ์‚ฌ๋žŒ ์ž์„ธ๋ฅผ ์ถ”์ •, ๋ณต์›ํ•˜๋Š” ์„ธ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•--3์ฐจ์› ๋ณต์›, ์•ฝ์ง€๋„ํ•™์Šต, ์ง€๋„ํ•™์Šต--์„ ์ œ์‹œํ•˜์˜€๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ, ์‚ฌ๋žŒ์˜ ์‹ ์ฒด์™€ ๊ฐ™์ด ๋น„์ •ํ˜• ๊ฐ์ฒด์˜ 2์ฐจ์› ๊ด€์ธก๊ฐ’์œผ๋กœ๋ถ€ํ„ฐ 3์ฐจ์› ๊ตฌ์กฐ๋ฅผ ๋ณต์›ํ•˜๋Š” ๋น„์ •ํ˜• ์›€์ง์ž„ ๊ธฐ๋ฐ˜ ๊ตฌ์กฐ (Non-rigid structure from motion) ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ํ”„๋กœํฌ๋ฃจ์Šคํ…Œ์Šค ํšŒ๊ท€ (Procrustean regression)์œผ๋กœ ๋ช…๋ช…ํ•œ ์ œ์•ˆ๋œ ํ”„๋ ˆ์ž„์›Œํฌ์—์„œ, 3์ฐจ์› ํ˜•ํƒœ๋“ค์€ ๊ทธ๋“ค์˜ ์ •๋ ฌ๋œ ํ˜•ํƒœ์— ๋Œ€ํ•œ ํ•จ์ˆ˜๋กœ ์ •๊ทœํ™”๋œ๋‹ค. ์ œ์•ˆ๋œ ํ”„๋กœํฌ๋ฃจ์Šคํ…Œ์Šค ํšŒ๊ท€์˜ ๋น„์šฉ ํ•จ์ˆ˜๋Š” 3์ฐจ์› ํ˜•ํƒœ ์ •๋ ฌ๊ณผ ๊ด€๋ จ๋œ ์ œ์•ฝ์„ ๋น„์šฉ ํ•จ์ˆ˜์— ํฌํ•จ์‹œ์ผœ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์„ ์ด์šฉํ•œ ์ตœ์ ํ™”๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ๋‹ค์–‘ํ•œ ๋ชจ๋ธ๊ณผ ๊ฐ€์ •์„ ํฌํ•จ์‹œํ‚ฌ ์ˆ˜ ์žˆ์–ด ์‹ค์šฉ์ ์ด๊ณ  ์œ ์—ฐํ•œ ํ”„๋ ˆ์ž„์›Œํฌ์ด๋‹ค. ๋‹ค์–‘ํ•œ ์‹คํ—˜์„ ํ†ตํ•ด ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ์„ธ๊ณ„ ์ตœ๊ณ  ์ˆ˜์ค€์˜ ๋ฐฉ๋ฒ•๋“ค๊ณผ ๋น„๊ตํ•ด ์œ ์‚ฌํ•œ ์„ฑ๋Šฅ์„ ๋ณด์ด๋ฉด์„œ, ๋™์‹œ์— ์‹œ๊ฐ„, ๊ณต๊ฐ„ ๋ณต์žก๋„ ๋ฉด์—์„œ ๊ธฐ์กด ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ์šฐ์ˆ˜ํ•จ์„ ๋ณด์˜€๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€, 2์ฐจ์› ํ•™์Šต ๋ฐ์ดํ„ฐ๋งŒ ์ฃผ์–ด์กŒ์„ ๋•Œ 2์ฐจ์› ์ž…๋ ฅ์—์„œ 3์ฐจ์› ๊ตฌ์กฐ๋ฅผ ๋ณต์›ํ•˜๋Š” ์•ฝ์ง€๋„ํ•™์Šต ๋ฐฉ๋ฒ•์ด๋‹ค. ํ”„๋กœํฌ๋ฃจ์Šคํ…Œ์Šค ํšŒ๊ท€ ์‹ ๊ฒฝ๋ง (Procrustean regression network)๋กœ ๋ช…๋ช…ํ•œ ์ œ์•ˆ๋œ ํ•™์Šต ๋ฐฉ๋ฒ•์€ ์‹ ๊ฒฝ๋ง ๋˜๋Š” ์ปจ๋ณผ๋ฃจ์…˜ ์‹ ๊ฒฝ๋ง์„ ํ†ตํ•ด ์‚ฌ๋žŒ์˜ 2์ฐจ์› ์ž์„ธ๋กœ๋ถ€ํ„ฐ 3์ฐจ์› ์ž์„ธ๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ•™์Šตํ•œ๋‹ค. ํ”„๋กœํฌ๋ฃจ์Šคํ…Œ์Šค ํšŒ๊ท€์— ์‚ฌ์šฉ๋œ ๋น„์šฉ ํ•จ์ˆ˜๋ฅผ ์ˆ˜์ •ํ•˜์—ฌ ์‹ ๊ฒฝ๋ง์„ ํ•™์Šต์‹œํ‚ค๋Š” ๋ณธ ๋ฐฉ๋ฒ•์€, ๋น„์ •ํ˜• ์›€์ง์ž„ ๊ธฐ๋ฐ˜ ๊ตฌ์กฐ์— ์‚ฌ์šฉ๋œ ๋น„์šฉ ํ•จ์ˆ˜๋ฅผ ์‹ ๊ฒฝ๋ง ํ•™์Šต์— ์ ์šฉํ•œ ์ตœ์ดˆ์˜ ์‹œ๋„์ด๋‹ค. ๋˜ํ•œ ๋น„์šฉํ•จ์ˆ˜์— ์‚ฌ์šฉ๋œ ์ €๊ณ„์ˆ˜ ํ•จ์ˆ˜ (low-rank function)๋ฅผ ์‹ ๊ฒฝ๋ง ํ•™์Šต์— ์ฒ˜์Œ์œผ๋กœ ์‚ฌ์šฉํ•˜์˜€๋‹ค. ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ 3์ฐจ์› ์‚ฌ๋žŒ ์ž์„ธ๋Š” ์‹ ๊ฒฝ๋ง์˜ ์ „๋ฐฉ์ „๋‹ฌ(feed forward)์—ฐ์‚ฐ์— ์˜ํ•ด ์–ป์–ด์ง€๋ฏ€๋กœ, 3์ฐจ์› ๋ณต์› ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ํ›จ์”ฌ ๋น ๋ฅธ 3์ฐจ์› ์ž์„ธ ์ถ”์ •์ด ๊ฐ€๋Šฅํ•˜๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ์‹ ๊ฒฝ๋ง์„ ์ด์šฉํ•ด 2์ฐจ์› ์ž…๋ ฅ์œผ๋กœ๋ถ€ํ„ฐ 3์ฐจ์› ์‚ฌ๋žŒ ์ž์„ธ๋ฅผ ์ถ”์ •ํ•˜๋Š” ์ง€๋„ํ•™์Šต ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•˜์˜€๋‹ค. ๋ณธ ๋ฐฉ๋ฒ•์€ ๊ด€๊ณ„ ์‹ ๊ฒฝ๋ง ๋ชจ๋“ˆ(relational modules)์„ ํ™œ์šฉํ•ด ์‹ ์ฒด์˜ ๋‹ค๋ฅธ ๋ถ€์œ„๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ํ•™์Šตํ•œ๋‹ค. ์„œ๋กœ ๋‹ค๋ฅธ ๋ถ€์œ„์˜ ์Œ๋งˆ๋‹ค ๊ด€๊ณ„ ํŠน์ง•์„ ์ถ”์ถœํ•ด ๋ชจ๋“  ๊ด€๊ณ„ ํŠน์ง•์˜ ํ‰๊ท ์„ ์ตœ์ข… 3์ฐจ์› ์ž์„ธ ์ถ”์ •์— ์‚ฌ์šฉํ•œ๋‹ค. ๋˜ํ•œ ๊ด€๊ณ„ํ˜• ๋“œ๋ž์•„์›ƒ(relational dropout)์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ํ•™์Šต ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•ด ๊ฐ€๋ ค์ง์— ์˜ํ•ด ๋‚˜ํƒ€๋‚˜์ง€ ์•Š์€ 2์ฐจ์› ๊ด€์ธก๊ฐ’์ด ์žˆ๋Š” ์ƒํ™ฉ์—์„œ, ๊ฐ•์ธํ•˜๊ฒŒ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ๋Š” 3์ฐจ์› ์ž์„ธ ์ถ”์ • ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•˜์˜€๋‹ค. 
์‹คํ—˜์„ ํ†ตํ•ด ํ•ด๋‹น ๋ฐฉ๋ฒ•์ด 2์ฐจ์› ๊ด€์ธก๊ฐ’์ด ์ผ๋ถ€๋งŒ ์ฃผ์–ด์ง„ ์ƒํ™ฉ์—์„œ๋„ ํฐ ์„ฑ๋Šฅ ํ•˜๋ฝ์ด ์—†์ด ํšจ๊ณผ์ ์œผ๋กœ 3์ฐจ์› ์ž์„ธ๋ฅผ ์ถ”์ •ํ•จ์„ ์ฆ๋ช…ํ•˜์˜€๋‹ค.Abstract i Contents iii List of Tables vi List of Figures viii 1 Introduction 1 1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4.1 3D Reconstruction of Human Bodies . . . . . . . . . . 9 1.4.2 Weakly-Supervised Learning for 3D HPE . . . . . . . . 11 1.4.3 Supervised Learning for 3D HPE . . . . . . . . . . . . 11 1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2 Related Works 14 2.1 2D Human Pose Estimation . . . . . . . . . . . . . . . . . . . . 14 2.2 3D Human Pose Estimation . . . . . . . . . . . . . . . . . . . . 16 2.3 Non-rigid Structure from Motion . . . . . . . . . . . . . . . . . 18 2.4 Learning to Reconstruct 3D Structures via Neural Networks . . 23 3 3D Reconstruction of Human Bodies via Procrustean Regression 25 3.1 Formalization of NRSfM . . . . . . . . . . . . . . . . . . . . . 27 3.2 Procrustean Regression . . . . . . . . . . . . . . . . . . . . . . 28 3.2.1 The Cost Function of Procrustean Regression . . . . . . 29 3.2.2 Derivatives of the Cost Function . . . . . . . . . . . . . 32 3.2.3 Example Functions for f and g . . . . . . . . . . . . . . 38 3.2.4 Handling Missing Points . . . . . . . . . . . . . . . . . 43 3.2.5 Optimization . . . . . . . . . . . . . . . . . . . . . . . 44 3.2.6 Initialization . . . . . . . . . . . . . . . . . . . . . . . 44 3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 45 3.3.1 Orthographic Projection . . . . . . . . . . . . . . . . . 46 3.3.2 Perspective Projection . . . . . . . . . . . . . . . . . . 56 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4 Weakly-Supervised Learning of 3D Human Pose via Procrustean Regression Networks 69 4.1 The Cost Function for Procrustean Regression Network . . . . . 70 4.2 Choosing f and g for Procrustean Regression Network . . . . . 74 4.3 Implementation Details . . . . . . . . . . . . . . . . . . . . . . 75 4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 77 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 5 Supervised Learning of 3D Human Pose via Relational Networks 86 5.1 Relational Networks . . . . . . . . . . . . . . . . . . . . . . . 88 5.2 Relational Networks for 3D HPE . . . . . . . . . . . . . . . . . 88 5.3 Extensions to Multi-Frame Inputs . . . . . . . . . . . . . . . . 91 5.4 Relational Dropout . . . . . . . . . . . . . . . . . . . . . . . . 93 5.5 Implementation Details . . . . . . . . . . . . . . . . . . . . . . 94 5.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 95 5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 6 Concluding Remarks 105 6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 6.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 6.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . 108 Abstract (In Korean) 128Docto