16,393 research outputs found
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201
Biplane Fluoroscopy for Hindfoot Motion Analysis during Gait: A Model-based Evaluation
The purpose of this study was to quantify the accuracy and precision of a biplane fluoroscopy system for model-based tracking of in vivo hindfoot motion during over-ground gait. Gait was simulated by manually manipulating a cadaver foot specimen through a biplane fluoroscopy system attached to a walkway. Three 1.6-mm diameter steel beads were implanted into the specimen to provide marker-based tracking measurements for comparison to model-based tracking. A CT scan was acquired to define a gold standard of implanted bead positions and to create 3D models for model-based tracking. Static and dynamic trials manipulating the specimen through the capture volume were performed. Marker-based tracking error was calculated relative to the gold standard implanted bead positions. The bias, precision, and root-mean-squared (RMS) error of model-based tracking was calculated relative to the marker-based measurements. The overall RMS error of the model-based tracking method averaged 0.43 ± 0.22 mm and 0.66 ± 0.43° for static and 0.59 ± 0.10 mm and 0.71 ± 0.12° for dynamic trials. The model-based tracking approach represents a non-invasive technique for accurately measuring dynamic hindfoot joint motion during in vivo, weight bearing conditions. The model-based tracking method is recommended for application on the basis of the study results
Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series
The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework with competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance.National Science Foundation (IIS 0308213, IIS 0329009, CNS 0202067
Autonomous Tissue Scanning under Free-Form Motion for Intraoperative Tissue Characterisation
In Minimally Invasive Surgery (MIS), tissue scanning with imaging probes is
required for subsurface visualisation to characterise the state of the tissue.
However, scanning of large tissue surfaces in the presence of deformation is a
challenging task for the surgeon. Recently, robot-assisted local tissue
scanning has been investigated for motion stabilisation of imaging probes to
facilitate the capturing of good quality images and reduce the surgeon's
cognitive load. Nonetheless, these approaches require the tissue surface to be
static or deform with periodic motion. To eliminate these assumptions, we
propose a visual servoing framework for autonomous tissue scanning, able to
deal with free-form tissue deformation. The 3D structure of the surgical scene
is recovered and a feature-based method is proposed to estimate the motion of
the tissue in real-time. A desired scanning trajectory is manually defined on a
reference frame and continuously updated using projective geometry to follow
the tissue motion and control the movement of the robotic arm. The advantage of
the proposed method is that it does not require the learning of the tissue
motion prior to scanning and can deal with free-form deformation. We deployed
this framework on the da Vinci surgical robot using the da Vinci Research Kit
(dVRK) for Ultrasound tissue scanning. Since the framework does not rely on
information from the Ultrasound data, it can be easily extended to other
probe-based imaging modalities.Comment: 7 pages, 5 figures, ICRA 202
LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that
reconstructs dense, space-time coherent deforming geometry of entire humans in
general everyday clothing from just a single RGB video. We propose a novel
two-stage analysis-by-synthesis optimization whose formulation and
implementation are designed for high performance. In the first stage, a skinned
template model is jointly fitted to background subtracted input video, 2D and
3D skeleton joint positions found using a deep neural network, and a set of
sparse facial landmark detections. In the second stage, dense non-rigid 3D
deformations of skin and even loose apparel are captured based on a novel
real-time capable algorithm for non-rigid tracking using dense photometric and
silhouette constraints. Our novel energy formulation leverages automatically
identified material regions on the template to model the differing non-rigid
deformation behavior of skin and apparel. The two resulting non-linear
optimization problems per-frame are solved with specially-tailored
data-parallel Gauss-Newton solvers. In order to achieve real-time performance
of over 25Hz, we design a pipelined parallel architecture using the CPU and two
commodity GPUs. Our method is the first real-time monocular approach for
full-body performance capture. Our method yields comparable accuracy with
off-line performance capture techniques, while being orders of magnitude
faster
FlightGoggles: A Modular Framework for Photorealistic Camera, Exteroceptive Sensor, and Dynamics Simulation
FlightGoggles is a photorealistic sensor simulator for perception-driven
robotic vehicles. The key contributions of FlightGoggles are twofold. First,
FlightGoggles provides photorealistic exteroceptive sensor simulation using
graphics assets generated with photogrammetry. Second, it provides the ability
to combine (i) synthetic exteroceptive measurements generated in silico in real
time and (ii) vehicle dynamics and proprioceptive measurements generated in
motio by vehicle(s) in a motion-capture facility. FlightGoggles is capable of
simulating a virtual-reality environment around autonomous vehicle(s). While a
vehicle is in flight in the FlightGoggles virtual reality environment,
exteroceptive sensors are rendered synthetically in real time while all complex
extrinsic dynamics are generated organically through the natural interactions
of the vehicle. The FlightGoggles framework allows for researchers to
accelerate development by circumventing the need to estimate complex and
hard-to-model interactions such as aerodynamics, motor mechanics, battery
electrochemistry, and behavior of other agents. The ability to perform
vehicle-in-the-loop experiments with photorealistic exteroceptive sensor
simulation facilitates novel research directions involving, e.g., fast and
agile autonomous flight in obstacle-rich environments, safe human interaction,
and flexible sensor selection. FlightGoggles has been utilized as the main test
for selecting nine teams that will advance in the AlphaPilot autonomous drone
racing challenge. We survey approaches and results from the top AlphaPilot
teams, which may be of independent interest.Comment: Initial version appeared at IROS 2019. Supplementary material can be
found at https://flightgoggles.mit.edu. Revision includes description of new
FlightGoggles features, such as a photogrammetric model of the MIT Stata
Center, new rendering settings, and a Python AP
- …