On Motion Parameterizations in Image Sequences from Fixed Viewpoints
This dissertation addresses the problem of parameterizing object motion within a set of images taken with a stationary camera. We develop data-driven methods across all image scales: characterizing motion observed at the scale of individual pixels, along extended structures such as roads, and across whole-image deformations such as lungs deforming over time. The primary contributions include: a) fundamental studies of the relationship between spatio-temporal image derivatives accumulated at a pixel and the object motions at that pixel; b) data-driven approaches to parameterize breath motion and reconstruct lung CT data volumes; and c) defining, and offering initial results for, a new class of Partially Unsupervised Manifold Learning (PUML) problems, which often arise in medical imagery. Specifically, we create energy functions that measure how consistent a given velocity vector is with observed spatio-temporal image derivatives. These energy functions are used to fit parametric snake models to roads using velocity constraints. We create an automatic, data-driven technique for finding the breath phase of lung CT scans that can replace the external belt measurements currently in clinical use. This approach is extended to automatically create a full deformation model of a CT lung volume during breathing, or of heart MRI during breathing and heartbeat. Additionally, motivated by real use cases, we address a scenario in which a dataset is collected along with meta-data that describes some, but not all, aspects of the dataset. We create an embedding that displays the remaining variability in a dataset after accounting for the variability related to the meta-data.
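The pixel-scale energy described in this abstract is closely related to the classical brightness-constancy constraint. As a minimal sketch (not the dissertation's exact formulation; function names and the least-squares solver are illustrative), a candidate velocity can be scored against derivatives accumulated at a pixel, and the minimizing velocity recovered in closed form:

```python
import numpy as np

def motion_energy(Ix, Iy, It, v):
    """Score how consistent a candidate velocity v = (vx, vy) is with
    accumulated spatio-temporal derivatives, using the sum of squared
    brightness-constancy residuals Ix*vx + Iy*vy + It."""
    vx, vy = v
    residual = Ix * vx + Iy * vy + It
    return float(np.sum(residual ** 2))

def best_velocity(Ix, Iy, It):
    """Velocity minimizing the energy above: the least-squares solution
    of the accumulated constraints (the classic Lucas-Kanade step)."""
    A = np.stack([Ix, Iy], axis=1)   # one constraint row per sample
    b = -It
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v
```

When the accumulated derivatives are exactly explained by one velocity, the energy at that velocity is zero and any other velocity scores strictly higher.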
Robust Motion In-betweening
In this work we present a novel, robust transition generation technique that
can serve as a new tool for 3D animators, based on adversarial recurrent neural
networks. The system synthesizes high-quality motions that use
temporally-sparse keyframes as animation constraints. This is reminiscent of
the job of in-betweening in traditional animation pipelines, in which an
animator draws motion frames between provided keyframes. We first show that a
state-of-the-art motion prediction model cannot be easily converted into a
robust transition generator when only adding conditioning information about
future keyframes. To solve this problem, we then propose two novel additive
embedding modifiers that are applied at each timestep to latent representations
encoded inside the network's architecture. One modifier is a time-to-arrival
embedding that allows variations of the transition length with a single model.
The other is a scheduled target noise vector that allows the system to be
robust to target distortions and to sample different transitions given fixed
keyframes. To qualitatively evaluate our method, we present a custom
MotionBuilder plugin that uses our trained model to perform in-betweening in
production scenarios. To quantitatively evaluate performance on transitions and
generalizations to longer time horizons, we present well-defined in-betweening
benchmarks on a subset of the widely used Human3.6M dataset and on LaFAN1, a
novel high quality motion capture dataset that is more appropriate for
transition generation. We are releasing this new dataset along with this work,
with accompanying code for reproducing our baseline results.
Comment: Published at SIGGRAPH 202
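The time-to-arrival embedding described above can be pictured as a positional encoding indexed not by the current frame, but by the number of frames remaining until the target keyframe, so one model handles variable transition lengths. The following is a hedged sketch in that spirit; the dimensions, scale constant, and exact formulation are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def time_to_arrival_embedding(tta, dim=8, max_tta=1000.0):
    """Sinusoidal embedding of the time-to-arrival `tta` (frames left
    until the target keyframe). `dim` and `max_tta` are illustrative
    choices, mirroring standard positional-encoding frequencies."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (max_tta ** (2 * i / dim))   # geometric frequency ladder
    angles = tta * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

# Applied additively to a latent state at each timestep of a
# transition of length T (pseudocode): z_t = z_t + time_to_arrival_embedding(T - t)
```

Because the embedding changes smoothly as the remaining time shrinks, the network receives a continuous countdown signal rather than a hard transition-length switch.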
Mining Spatial-Temporal Patterns and Structural Sparsity for Human Motion Data Denoising
Motion capture is an important technique with a wide range of applications in areas such as computer vision, computer animation, film production, and medical rehabilitation. Even with professional motion capture systems, the acquired raw data mostly contain inevitable noise and outliers. Numerous methods have been developed to denoise the data, yet the problem remains challenging due to the high complexity of human motion and the diversity of real-life situations. In this paper, we propose a data-driven, robust human motion denoising approach that mines the spatial-temporal patterns and the structural sparsity embedded in motion data. We first replace the commonly used whole-pose model with a much finer-grained partlet model as the feature representation, to exploit the abundant local similarities in body part posture and movement. Then, a robust dictionary learning algorithm is proposed to learn multiple compact and representative motion dictionaries from the training data in parallel. Finally, we reformulate human motion denoising as a robust structured sparse coding problem in which both the noise distribution and the temporal smoothness of human motion are jointly taken into account. Compared with several state-of-the-art motion denoising methods on both synthetic and real noisy motion data, our method consistently yields better performance, and its outputs are much more stable than those of the others. In addition, it is much easier to set up the training dataset for our method than for other data-driven methods.
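The sparse-coding step at the core of this approach can be illustrated in a heavily simplified form. The sketch below is plain ISTA over a fixed dictionary, assuming an unstructured L1 penalty; it omits the paper's partlet decomposition, robust noise model, and temporal terms, and all names are illustrative:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_denoise(y, D, lam=0.1, n_iter=200):
    """Denoise a signal y by finding a sparse code `a` minimizing
    0.5*||y - D a||^2 + lam*||a||_1 via ISTA, then reconstructing
    D a as the cleaned signal. D is a learned dictionary (columns
    are atoms); here it is taken as given."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)          # gradient of the data-fit term
        a = soft_threshold(a - grad / L, lam / L)
    return D @ a, a
```

The reconstruction `D @ a` discards components of `y` that cannot be explained sparsely by the dictionary, which is what suppresses noise and outliers.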
Learning to Transform Time Series with a Few Examples
We describe a semi-supervised regression algorithm that learns to transform one time series into another given examples of the transformation. This algorithm is applied to tracking, where a time series of observations from sensors is transformed into a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. The algorithm searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and lends itself to a closed-form solution. It is closely related to nonlinear system identification and manifold learning techniques. We demonstrate our algorithm on the tasks of tracking RFID tags from signal-strength measurements and recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences. For these tasks, our algorithm requires significantly fewer examples than fully supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account.
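The idea of fitting a few labeled examples while forcing the output series to evolve smoothly admits a closed-form solution in the linear special case. The sketch below is an assumption-laden simplification, not the paper's method: a linear map, a first-difference smoothness prior as the "dynamics", and an illustrative ridge term for numerical stability:

```python
import numpy as np

def learn_transform(X, labeled_idx, y_labeled, mu=1.0, eps=1e-6):
    """Fit w so that w.x matches the few labeled examples while the
    output series (w.x_t over all t) changes slowly between steps.
    Solves min_w ||X_l w - y_l||^2 + mu*||diff(X w)||^2 + eps*||w||^2
    in closed form via the normal equations."""
    Xl = X[labeled_idx]
    Dm = X[1:] - X[:-1]                   # first-difference dynamics prior
    A = Xl.T @ Xl + mu * (Dm.T @ Dm) + eps * np.eye(X.shape[1])
    b = Xl.T @ y_labeled
    w = np.linalg.solve(A, b)
    return X @ w, w
```

Even with only the first and last frames labeled, the smoothness term propagates the supervision across the unlabeled interior of the series.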