Complexity Analysis and Efficient Measurement Selection Primitives for High-Rate Graph SLAM
Sparsity has been widely recognized as crucial for efficient optimization in
graph-based SLAM. Because the sparsity and structure of the SLAM graph reflect
the set of incorporated measurements, many methods for sparsification have been
proposed in hopes of reducing computation. These methods often focus narrowly
on reducing edge count without regard for structure at a global level. Such
structurally-naive techniques can fail to produce significant computational
savings, even after aggressive pruning. In contrast, simple heuristics such as
measurement decimation and keyframing are known empirically to produce
significant computation reductions. To demonstrate why, we propose a
quantitative metric called elimination complexity (EC) that bridges the
existing analytic gap between graph structure and computation. EC quantifies
the complexity of the primary computational bottleneck: the factorization step
of a Gauss-Newton iteration. Using this metric, we show rigorously that
decimation and keyframing impose favorable global structures and therefore
achieve computation reductions on the order of … and …, respectively,
where … is the pruning rate. We additionally present numerical results
showing EC provides a good approximation of computation in both batch and
incremental (iSAM2) optimization and demonstrate that pruning methods promoting
globally-efficient structure outperform those that do not. Comment: Pre-print accepted to ICRA 201
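The link between graph structure and factorization cost can be made concrete with symbolic variable elimination. The sketch below does not reproduce the paper's exact definition of EC. Instead, it uses an assumed, commonly used proxy: eliminating a variable that still has d neighbors costs d², and those neighbors then become fully connected (fill-in). The function name `elimination_cost` is hypothetical.

```python
from itertools import combinations

def elimination_cost(edges, order):
    """Symbolic variable elimination on an undirected graph skeleton.

    Assumed cost proxy (not the paper's EC definition): eliminating a
    variable with d remaining neighbors costs d**2, and its neighbors
    become fully connected (fill-in)."""
    adj = {v: set() for v in order}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    cost = 0
    for v in order:
        nbrs = adj.pop(v)
        cost += len(nbrs) ** 2
        for u in nbrs:
            adj[u].discard(v)                 # v leaves the graph
        for a, b in combinations(nbrs, 2):    # fill-in among v's neighbors
            adj[a].add(b)
            adj[b].add(a)
    return cost

star = [(0, 1), (0, 2), (0, 3)]
print(elimination_cost(star, [0, 1, 2, 3]))  # hub first: 9 + 4 + 1 + 0 = 14
print(elimination_cost(star, [1, 2, 3, 0]))  # leaves first: 1 + 1 + 1 + 0 = 3
```

Even on this four-node graph, the cost depends strongly on structure and ordering. This is the kind of global effect that edge counting alone cannot capture.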
Affine Approximation for Direct Batch Recovery of Euclidean Motion From Sparse Data
We present a batch method for recovering Euclidean camera motion from sparse image data. The main purpose of the algorithm is to recover the motion parameters using as much of the available information and as few computational steps as possible. The algorithm thus places itself in the gap between factorisation schemes, which make use of all available information in the initial recovery step, and sequential approaches, which are able to handle sparseness in the image data. Euclidean camera matrices are approximated via the affine camera model, making the recovery direct in the sense that no intermediate projective reconstruction is made. Using a little-known closure constraint, the FA-closure, we are able to formulate the camera coefficients linearly in the entries of the affine fundamental matrices. The novelty of the presented work is twofold. First, the presented formulation allows particularly good conditioning of the estimation of the initial motion parameters, as well as an unprecedented diversity in the choice of possible regularisation terms. Second, the new autocalibration scheme presented here is in practice guaranteed to yield a least-squares estimate of the calibration parameters. As a by-product, the affine camera model is rehabilitated as a useful model for most cameras and scene configurations, e.g. wide-angle lenses observing a scene at close range. Experiments on real and synthetic data demonstrate the ability to reconstruct scenes that are very problematic for previous structure-from-motion techniques due to local ambiguities and error accumulation.
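A small synthetic experiment shows why the affine camera model keeps recovery linear. Under x = A X + t with no perspective division, subtracting each view's centroid removes t. The stacked measurement matrix then has rank at most 3, which is the property that factorisation schemes exploit. This is an illustrative sketch with hypothetical names and random data. It does not implement the FA-closure constraint or the paper's autocalibration.

```python
import numpy as np

def affine_measurement_matrix(n_views=5, n_points=20, seed=0):
    """Stack image points from n_views synthetic affine cameras into a 2F x P matrix."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(3, n_points))           # 3D scene points
    rows = []
    for _ in range(n_views):
        A = rng.normal(size=(2, 3))              # linear part of an affine camera
        t = rng.normal(size=(2, 1))              # image translation
        rows.append(A @ X + t)                   # x = A X + t, no perspective division
    return np.vstack(rows)

W = affine_measurement_matrix()
W_centered = W - W.mean(axis=1, keepdims=True)   # subtracting centroids removes t
print(np.linalg.matrix_rank(W_centered))         # at most 3 (3 for generic data)
```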
Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging
Large-scale ℓ1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive numbers of samples with high-dimensional features. One popular and promising strategy is to scale up the optimization in parallel. Parallel solvers run multiple cores on a shared-memory system or in a distributed environment to speed up the computation, but their practical use is limited by the huge dimension of the feature space and by synchronization problems.
In this dissertation, I pursue this direction, with a particular focus on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For supervised learning, I first propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreaded environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is stored in a distributed fashion across different machines. The proposed model is then extended to the study of genetic risk factors for Alzheimer's Disease (AD) across different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For the unsupervised learning problem, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), that scales up the optimization of dictionary learning and sparse coding problems. A common observation in medical imaging research is that the longitudinal features of patients at different time points are best studied together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method that learns the different tasks simultaneously and uses shared and individual dictionaries to encode both consistent and changing imaging features.
Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201
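For reference, the ℓ1-regularized least-squares objective 0.5·‖y − Xw‖² + λ‖w‖₁ can be minimized with plain cyclic coordinate descent and soft-thresholding. The sketch below is a serial, single-machine version written for illustration, and its function names are hypothetical. It does not reproduce the dissertation's asynchronous multithreaded solver, its distributed framework, or SCC.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam*|.|: shrink z toward zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for 0.5*||y - X w||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    r = y.astype(float).copy()            # residual y - X w (w starts at 0)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0:
                continue                  # an all-zero column never enters
            r += X[:, j] * w[j]           # remove coordinate j's contribution
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * w[j]           # add it back with the updated value
    return w

w = lasso_cd(np.eye(3), np.array([3.0, -0.5, 1.0]), lam=1.0)
print(w)  # small coefficients are set exactly to zero
```

With an orthonormal design, each coordinate's solution is just the soft-thresholded correlation. This makes the sparsity-inducing effect of λ easy to see.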
Real-Time Sequential Non-Rigid Structure from Motion Using a Single Camera
In recent years, applications that rely on accurate localization and reconstruction within a real 3D environment have attracted great interest from both the research community and industry. These applications range from augmented reality to robotics, simulation, and video games. Depending on the application and on the level of detail required in the reconstruction, various devices are used. Some are specialized, more complex, and expensive, such as stereo cameras, camera-plus-depth (RGBD) sensors based on structured light or Time of Flight (ToF), and laser scanners and other more advanced devices. For simple applications, everyday devices such as smartphones are sufficient: by applying computer vision techniques, 3D models of the environment can be obtained in order, in the case of augmented reality, to display augmented information at a selected location.
In robotics, simultaneously localizing and building a 3D map of the environment is a fundamental task for achieving autonomous navigation. In the state of the art, this problem is known as Simultaneous Localization And Mapping (SLAM) or Structure from Motion (SfM). Applying these techniques requires that the object not change its shape over time. With monocular capture and no reference, the reconstruction is unique only up to a scale factor. When the rigidity condition does not hold, the object's shape changes over time. The problem then becomes equivalent to a per-frame reconstruction, which cannot be done directly, because different shapes combined with different camera poses can produce similar projections. This is why the reconstruction of deformable objects is still a developing field.
SfM methods have been adapted by applying physical models and temporal, spatial, geometric, or other constraints to reduce the ambiguity of the solutions, giving rise to the techniques known as Non-Rigid SfM (NRSfM).
This thesis proposes starting from PTAM (Parallel Tracking and Mapping), a rigid reconstruction technique well known in the state of the art, and adapting it to include NRSfM techniques. These are based on a linear basis model, which is used to estimate the deformations of the dynamically modeled object. Temporal and spatial constraints are applied to improve the reconstructions, and the system adapts to changes in deformation that appear in the sequence. To achieve this, each of PTAM's execution threads must be modified to process non-rigid data.
The tracking thread already performed tracking based on a 3D point map provided a priori. The most important modification here is the integration of a linear deformation model, so that the object's deformation is computed in real time while the basis shapes of the deformation are assumed fixed. The camera pose computation is based on the rigid estimation system, so the pose and the deformation coefficients are estimated alternately using the E-M (Expectation-Maximization) algorithm. In addition, temporal and shape constraints are imposed to restrict the ambiguities inherent in the solutions and to improve the quality of the 3D estimate.
The thread that manages the map is updated over time so that it can improve the deformation bases when they cannot explain the shapes seen in the current images. To do so, the rigid-model optimization originally included in this thread is replaced by an exhaustive NRSfM processing method. This method improves the bases using the images that the tracking thread reports as having a large reconstruction error. As a result, the model adapts to new deformations, allowing the system to evolve and remain stable in the long term.
Unlike many methods in the literature, the proposed system handles perspective projection natively, minimizing the ambiguity and object-distance problems present under orthographic projection. The proposed system handles hundreds of points and is designed to meet real-time constraints, for use in systems with limited hardware resources.
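In an alternating scheme like the one described, the coefficient step reduces to linear least squares once the camera is held fixed. The minimal sketch below makes several assumptions. It uses orthographic projection with known rotation rows `R` and translation `t`, which is a simplification that the thesis itself avoids by working with perspective projection. It also assumes a mean shape `S0` plus fixed deformation bases `B`. All names are hypothetical, and the pose step and the temporal and shape priors are omitted.

```python
import numpy as np

def estimate_deformation_coeffs(w, R, t, S0, B):
    """Least-squares deformation coefficients for a linear-basis shape model
    S = S0 + sum_k c_k B[k], observed by a fixed orthographic camera.

    w: (2, P) observed image points; R: (2, 3) rotation rows; t: (2,) translation
    S0: (3, P) mean shape; B: (K, 3, P) deformation bases (held fixed)."""
    K = B.shape[0]
    # Image-space residual left after projecting the mean shape
    residual = (w - (R @ S0 + t[:, None])).ravel()
    # Column k holds the projected contribution of basis k
    A = np.stack([(R @ B[k]).ravel() for k in range(K)], axis=1)
    c, *_ = np.linalg.lstsq(A, residual, rcond=None)
    return c

rng = np.random.default_rng(1)
S0, B = rng.normal(size=(3, 10)), rng.normal(size=(2, 3, 10))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R, t = Q[:2], np.array([0.3, -0.2])
c_true = np.array([0.5, -1.2])
w = R @ (S0 + np.tensordot(c_true, B, axes=1)) + t[:, None]
print(estimate_deformation_coeffs(w, R, t, S0, B))  # recovers c_true on noise-free data
```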
Robust and large-scale quasiconvex programming in structure-from-motion
Structure-from-Motion (SfM) is a cornerstone of computer vision. Briefly speaking,
SfM is the task of simultaneously estimating the poses of the cameras behind a set of images of a
scene, and the 3D coordinates of the points in the scene.
Often, the optimisation problems that underpin SfM do not have closed-form solutions, and finding
solutions via numerical schemes is necessary. An objective function, which measures the discrepancy
of a geometric object (e.g., camera poses, rotations, 3D coordinates) with a set of image
measurements, is to be minimised. Each image measurement gives rise to an error function. For
example, the reprojection error, which measures the distance between an observed image point and
the projection of a 3D point onto the image, is a commonly used error function.
An influential optimisation paradigm in SfM is the L∞ paradigm, where the objective function takes
the form of the maximum of all individual error functions (e.g. individual reprojection errors of
scene points). The benefit of the L∞ paradigm is that the objective function of many SfM
optimisation problems becomes quasiconvex, hence there is a unique minimum in the objective
function. The task of formulating and minimising quasiconvex objective functions is called
quasiconvex programming.
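The practical payoff of quasiconvexity is that every sublevel set {θ : f(θ) ≤ γ} is convex. The minimax problem can therefore be solved by bisection on γ, using a convex feasibility test at each step. The minimal sketch below uses a deliberately simple 1-D problem: estimating a location θ from scalar measurements x_i with error |x_i − θ|. It is not the SOCP/LP feasibility test used for real SfM problems, and its function names are hypothetical.

```python
def feasible_point(xs, gamma):
    """Return some theta with |x - theta| <= gamma for all x, or None.

    The sublevel set is the intersection of the intervals [x - gamma, x + gamma],
    which is itself an interval (convex)."""
    lo = max(x - gamma for x in xs)
    hi = min(x + gamma for x in xs)
    return lo if lo <= hi else None

def linf_minimize(xs, tol=1e-9):
    """Minimize max_i |x_i - theta| by bisection on the error level gamma."""
    lo, hi = 0.0, max(xs) - min(xs)    # gamma = max - min is always feasible
    theta = feasible_point(xs, hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        point = feasible_point(xs, mid)
        if point is None:
            lo = mid                   # level mid infeasible: optimum lies above
        else:
            hi, theta = mid, point     # feasible: tighten the upper bound
    return theta, hi

print(linf_minimize([1.0, 2.0, 3.0, 100.0]))  # approximately (50.5, 49.5)
```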
Although tremendous progress has been made in SfM techniques under the L∞ paradigm, some problems
remain unsatisfactorily solved, specifically those associated with large-scale input data and
outliers in the data. This thesis describes novel techniques to
tackle these problems.
A major weakness of the L∞ paradigm is its susceptibility to outliers. This thesis improves the
robustness of L∞ solutions against outliers by employing the least median of squares (LMS)
criterion, which amounts to minimising the median error. In the context of triangulation, this
thesis proposes a locally convergent robust algorithm underpinned by a novel quasiconvex plane
sweep technique. Imposing the LMS criterion achieves significant outlier tolerance, and, at the
same time, some properties of quasiconvexity greatly simplify the process of solving the LMS
problem.
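The outlier sensitivity of the L∞ criterion, and the tolerance that LMS provides, can be seen in the same 1-D location setting. Here the L∞ minimizer is the midrange. For the LMS estimate, we assume Rousseeuw's convention that the 1-D minimizer is the midpoint of the shortest window covering h = ⌊n/2⌋ + 1 sorted samples. This is an illustration only, not the thesis's plane-sweep triangulation algorithm, and both function names are hypothetical.

```python
def linf_location(xs):
    """Minimizer of max_i |x_i - theta|: the midrange."""
    return 0.5 * (min(xs) + max(xs))

def lms_location(xs):
    """Least-median-of-squares location in 1-D: midpoint of the shortest
    window covering h = n // 2 + 1 sorted samples (Rousseeuw's convention)."""
    s = sorted(xs)
    h = len(s) // 2 + 1
    start = min(range(len(s) - h + 1), key=lambda i: s[i + h - 1] - s[i])
    return 0.5 * (s[start] + s[start + h - 1])

data = [1.0, 2.0, 3.0, 100.0]                     # one gross outlier
print(linf_location(data), lms_location(data))   # 50.5 2.0
```

A single outlier drags the L∞ estimate halfway to itself, whereas the LMS estimate stays with the majority of the data.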
Approximation is a commonly used technique to tackle large-scale input data. This thesis introduces
the coreset technique to quasiconvex programming problems. The coreset technique aims to find a
representative subset of the input data, such that solving the same problem on the subset yields a
solution that is within a known bound of the optimal solution on the complete input set. In
particular, this thesis develops a coreset-based approximation algorithm to handle large-scale
triangulation tasks.
Another technique to handle large-scale input data is to break the optimisation into multiple
smaller sub-problems. Such a decomposition usually speeds up the overall optimisation process,
and alleviates memory limitations. This thesis develops a large-scale optimisation algorithm
for the known rotation problem (KRot). The proposed method decomposes the original quasiconvex
programming problem with potentially hundreds of thousands of parameters into multiple sub-problems
with only three parameters each. An efficient solver based on a novel minimum enclosing ball
technique is proposed to solve the sub-problems. Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Computer Science, 201
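Bădoiu and Clarkson's iteration for the minimum enclosing ball illustrates both ideas in this abstract, coresets and minimum enclosing balls. Each step moves the center a 1/(i+1) fraction of the way toward the current farthest point. After ⌈1/ε²⌉ steps, the returned radius is within about a factor (1 + ε) of optimal, and the farthest points visited form a small coreset. The sketch below is that textbook algorithm, not the thesis's own minimum enclosing ball solver or its KRot decomposition. The name `approx_meb` is hypothetical.

```python
import math

def approx_meb(points, eps=0.1):
    """Badoiu-Clarkson iteration for an approximate minimum enclosing ball.

    Returns (center, radius, coreset_indices). After ceil(1/eps**2) steps the
    radius is within about a factor (1 + eps) of the optimum, and the visited
    farthest points form a small coreset."""
    center = list(points[0])
    core = {0}
    steps = math.ceil(1 / eps ** 2)
    for i in range(1, steps + 1):
        # Farthest input point from the current center
        far = max(range(len(points)), key=lambda k: math.dist(center, points[k]))
        core.add(far)
        # Move the center a 1/(i+1) fraction of the way toward it
        center = [c + (p - c) / (i + 1) for c, p in zip(center, points[far])]
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(core)

square = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
center, radius, core = approx_meb(square)
print(radius)  # close to the optimal radius sqrt(2)
```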
Robust Estimation of Motion Parameters and Scene Geometry: Minimal Solvers and Convexification of Regularisers for Low-Rank Approximation
In the dawning age of autonomous driving, accurate and robust tracking of vehicles is essential. This is inextricably linked with the problem of Simultaneous Localisation and Mapping (SLAM), in which one tries to determine the position of a vehicle relative to its surroundings without prior knowledge of them. The more you know about the object you wish to track (through sensors or mechanical construction), the more likely you are to get good positioning estimates. In the first part of this thesis, we explore new ways of improving positioning for vehicles travelling on a planar surface. We do this in several ways: we generalise the work done for monocular vision to include two cameras, we propose ways of speeding up the estimation with polynomial solvers, and we develop an auto-calibration method that copes with radially distorted images without requiring a pre-calibration procedure.
We continue to investigate the case of constrained motion, this time using auxiliary data from inertial measurement units (IMUs) to improve positioning of unmanned aerial vehicles (UAVs). The proposed methods improve the state of the art for partially calibrated cases (with unknown focal length) in indoor navigation. Furthermore, we propose the first real-time-compatible minimal solver for simultaneous estimation of the radial distortion profile, focal length, and motion parameters using the IMU data.
In the third and final part of this thesis, we develop a bilinear framework for low-rank regularisation, with global optimality guarantees under certain conditions. We also show equivalence between the linear and the bilinear frameworks, in the sense that their objectives are equal. This enables users of the alternating direction method of multipliers (ADMM), or of other subgradient or splitting methods, to transition to the new framework while enjoying the benefits of second-order methods.
Furthermore, we propose a novel regulariser that fuses two popular methods. In this way, we combine the best of both worlds, encouraging bias reduction while enforcing low-rank solutions.
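A standard identity helps explain how a bilinear (factored) formulation can match a linear one. The nuclear norm satisfies ‖X‖* = min over X = B Cᵀ of (‖B‖²_F + ‖C‖²_F)/2, and the minimum is attained by balanced SVD factors. The sketch below checks this identity numerically. It also shows singular value thresholding, the proximal step that ADMM-style solvers typically use for nuclear-norm terms. It illustrates the general idea only; the thesis's framework, its optimality conditions, and the fused regulariser are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of singular values."""
    return np.linalg.svd(X, compute_uv=False).sum()

def balanced_factors(X):
    """Factor X = B @ C.T with B = U sqrt(S) and C = V sqrt(S)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s)
    return U * root, Vt.T * root      # scale columns by sqrt of singular values

def svt(X, tau):
    """Proximal operator of tau*||X||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

X = np.random.default_rng(0).normal(size=(5, 4))
B, C = balanced_factors(X)
bilinear = 0.5 * (np.linalg.norm(B) ** 2 + np.linalg.norm(C) ** 2)
print(np.isclose(bilinear, nuclear_norm(X)))   # True: the objectives coincide
```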
3D Reconstruction, Weakly-Supervised Learning, and Supervised Learning Methods for 3D Human Pose Estimation
Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), February 2019. Kwak, Nojun.
Estimating human poses from images is one of the fundamental tasks in computer vision and underpins many applications, such as action recognition, human-computer interaction, and virtual reality. In particular, estimating 3D human poses from 2D inputs is challenging because the problem is inherently under-constrained. In addition, 3D ground-truth data for human poses can be obtained only in limited, restricted environments. In this dissertation, 3D human pose estimation is studied from several angles, depending on the types of data available. To this end, three methods that retrieve 3D human poses from 2D observations or from RGB images are proposed: a 3D reconstruction algorithm, a weakly-supervised learning method, and a supervised learning method.
First, a non-rigid structure from motion (NRSfM) algorithm is proposed that reconstructs the 3D structures of non-rigid objects, such as human bodies, from 2D observations. In the proposed framework, named Procrustean Regression, the 3D shapes are regularized based on their aligned shapes. We show that the cost function of Procrustean Regression can be cast into an unconstrained problem or a problem with simple bound constraints, both of which can be solved efficiently by existing gradient descent solvers. The framework can easily be integrated with numerous existing models and assumptions, which makes it practical for various real situations. The experimental results show that, for orthographic projection, the proposed method achieves results competitive with state-of-the-art methods with much lower time complexity and memory requirements, and that it outperforms existing methods for perspective projection.
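The idea of regularizing shapes after alignment can be illustrated with two standard ingredients. The first is orthogonal Procrustes (Kabsch) alignment via an SVD. The second is the nuclear norm of the stacked aligned shapes, used as a low-rank measure. This is a hypothetical sketch that assumes each shape is a 3 × P array and that a reference shape is given. The dissertation's actual choice of f and g, its derivatives, and its handling of missing points are not reproduced.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Rotation R (3x3, det = +1) minimizing ||R @ X - Y||_F for centered 3xP shapes."""
    U, _, Vt = np.linalg.svd(Y @ X.T)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # exclude reflections
    return U @ D @ Vt

def aligned_low_rank_cost(shapes, reference):
    """Align each centered shape to a centered reference; return the nuclear norm
    of the stacked aligned shapes (one flattened shape per row) and the stack."""
    rows = []
    for S in shapes:
        Sc = S - S.mean(axis=1, keepdims=True)
        R = procrustes_rotation(Sc, reference)
        rows.append((R @ Sc).ravel())
    M = np.stack(rows)
    return np.linalg.svd(M, compute_uv=False).sum(), M

rng = np.random.default_rng(2)
base = rng.normal(size=(3, 8))
base -= base.mean(axis=1, keepdims=True)
shapes = []
for _ in range(4):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1                  # make Q a proper rotation
    shapes.append(Q @ base + rng.normal(size=(3, 1)))
cost, M = aligned_low_rank_cost(shapes, base)
print(np.linalg.matrix_rank(M, tol=1e-8))  # rotated copies of one shape align to rank 1
```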
Second, a weakly-supervised learning method is presented that can learn 3D structures when only 2D ground-truth data are available for training. Extending the Procrustean Regression framework, we propose the Procrustean Regression Network, a method that trains neural networks to learn 3D structures from training data with 2D ground truths. This is the first attempt to directly integrate an NRSfM algorithm into neural network training. It is also the first use of a cost function containing a low-rank term to train neural networks that reconstruct 3D shapes. At test time, the 3D structures of human bodies are obtained with a single feed-forward pass, which gives the framework a much faster inference time than 3D reconstruction algorithms.
Third, a supervised learning method that infers 3D poses from 2D inputs using neural networks is proposed. The method exploits a relational unit that captures the relations between different body parts. Each pair of different body parts generates relational features, and the average of the features over all pairs is used for 3D pose estimation. We also propose a dropout method called relational dropout, which can be used in relational modules to make them robust to occlusions. The experimental results show that the performance of the proposed algorithm degrades only slightly when points are missing, while it maintains state-of-the-art performance when every point is visible.
Human pose estimation from RGB images is an important problem in computer vision and a basic technology for many applications. It can serve as a foundational technology in a wide range of fields, such as action recognition, human-computer interaction, virtual reality, and augmented reality. In particular, estimating 3D human poses from 2D inputs is known to be difficult because the problem can have infinitely many solutions. Moreover, because 3D ground-truth data can be acquired only in restricted environments such as motion-capture studios, the amount of available data is limited. This dissertation studies 3D human pose estimation from several angles, depending on the types of training data available. Specifically, three methods for estimating and reconstructing 3D human poses from 2D observations or RGB images are proposed: 3D reconstruction, weakly-supervised learning, and supervised learning.
First, a non-rigid structure from motion algorithm is proposed that reconstructs the 3D structure of non-rigid objects, such as the human body, from their 2D observations. In this framework, named Procrustean regression, 3D shapes are regularized by a function of their aligned shapes. The cost function of the proposed Procrustean regression incorporates the constraints related to 3D shape alignment, so that it can be optimized by gradient descent. The proposed method can incorporate various models and assumptions, which makes it a practical and flexible framework. Various experiments show that the proposed method performs comparably to state-of-the-art methods while outperforming existing methods in time and space complexity.
The second proposed method is a weakly-supervised learning method that reconstructs 3D structures from 2D inputs when only 2D training data are given. This learning method, named the Procrustean regression network, learns to estimate 3D human poses from 2D poses with a neural network or a convolutional neural network. It trains neural networks with a modified version of the cost function used in Procrustean regression, and is thus the first attempt to apply a non-rigid structure from motion cost function to neural network training. It is also the first to use the low-rank function in that cost function for neural network training. Because the 3D human poses for test data are obtained by a feed-forward pass of the network, 3D pose estimation is much faster than with 3D reconstruction methods.
Finally, a supervised learning method is proposed that uses neural networks to estimate 3D human poses from 2D inputs. The method uses relational modules to learn the relations between different body parts. Relational features are extracted for each pair of different parts, and the average of all relational features is used for the final 3D pose estimation. In addition, a new training method called relational dropout is proposed, yielding a 3D pose estimation method that works robustly when some 2D observations are missing due to occlusion. Experiments demonstrate that the method estimates 3D poses effectively, without a large drop in performance, even when only some of the 2D observations are given.
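The relational unit described in both versions of the abstract computes features for every pair of body parts and then averages them. It can be sketched in a few lines of NumPy under several assumptions. The pair function is a hypothetical single linear layer with ReLU (`W`, `b`), applied to unordered pairs. Relational dropout is modeled simply as excluding every pair that touches a dropped part. The dissertation's actual network architecture and dropout details are not reproduced.

```python
import numpy as np
from itertools import combinations

def relational_features(parts, W, b, keep=None):
    """Average pairwise relational features over all unordered pairs of body parts.

    parts: (J, D) per-part input features.
    W: (H, 2*D) and b: (H,) parametrize a hypothetical shared pair function
        g(p_i, p_j) = relu(W @ [p_i; p_j] + b).
    keep: optional boolean mask of length J; pairs touching a dropped part
        are excluded (a simplified stand-in for relational dropout)."""
    J = parts.shape[0]
    if keep is None:
        keep = np.ones(J, dtype=bool)
    feats = [
        np.maximum(W @ np.concatenate([parts[i], parts[j]]) + b, 0.0)
        for i, j in combinations(range(J), 2)
        if keep[i] and keep[j]
    ]
    if not feats:                      # every pair was dropped
        return np.zeros(W.shape[0])
    return np.mean(feats, axis=0)      # average over the surviving pairs

rng = np.random.default_rng(3)
parts, W, b = rng.normal(size=(3, 4)), rng.normal(size=(5, 8)), rng.normal(size=5)
print(relational_features(parts, W, b, keep=np.array([True, True, False])))
```

Because the output is an average over pairs, dropping a part changes which pairs contribute but not the feature size. This is what lets the downstream regressor tolerate missing points.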
Abstract
Contents
List of Tables
List of Figures
1 Introduction
1.1 Problem Definition
1.2 Motivation
1.3 Challenges
1.4 Contributions
1.4.1 3D Reconstruction of Human Bodies
1.4.2 Weakly-Supervised Learning for 3D HPE
1.4.3 Supervised Learning for 3D HPE
1.5 Outline
2 Related Works
2.1 2D Human Pose Estimation
2.2 3D Human Pose Estimation
2.3 Non-rigid Structure from Motion
2.4 Learning to Reconstruct 3D Structures via Neural Networks
3 3D Reconstruction of Human Bodies via Procrustean Regression
3.1 Formalization of NRSfM
3.2 Procrustean Regression
3.2.1 The Cost Function of Procrustean Regression
3.2.2 Derivatives of the Cost Function
3.2.3 Example Functions for f and g
3.2.4 Handling Missing Points
3.2.5 Optimization
3.2.6 Initialization
3.3 Experimental Results
3.3.1 Orthographic Projection
3.3.2 Perspective Projection
3.4 Discussion
3.5 Conclusion
4 Weakly-Supervised Learning of 3D Human Pose via Procrustean Regression Networks
4.1 The Cost Function for Procrustean Regression Network
4.2 Choosing f and g for Procrustean Regression Network
4.3 Implementation Details
4.4 Experimental Results
4.5 Conclusion
5 Supervised Learning of 3D Human Pose via Relational Networks
5.1 Relational Networks
5.2 Relational Networks for 3D HPE
5.3 Extensions to Multi-Frame Inputs
5.4 Relational Dropout
5.5 Implementation Details
5.6 Experimental Results
5.7 Conclusion
6 Concluding Remarks
6.1 Summary
6.2 Limitations
6.3 Future Directions
Abstract (In Korean)
Docto