16 research outputs found

    SMASH: Data-driven Reconstruction of Physically Valid Collisions.

    Collision sequences are commonly used in games and entertainment to add drama and excitement. Authoring even two-body collisions in the real world can be difficult, as one has to get the timing and the object trajectories correctly synchronized. After trial-and-error iterations, when objects can actually be made to collide, they are still difficult to acquire in 3D. In contrast, synthetically generating plausible collisions is difficult as it requires adjusting different collision parameters (e.g., object mass ratio, coefficient of restitution, etc.) and appropriate initial parameters. We present SMASH to directly ‘read off’ appropriate collision parameters simply based on input video recordings. Specifically, we describe how to use the laws of rigid body collision to regularize the problem of lifting 2D annotated poses to 3D reconstructions of collision sequences. The reconstructed sequences can then be modified and combined to easily author novel and plausible collision sequences. We demonstrate the system on various complex collision sequences.
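
    The collision laws the abstract refers to can be made concrete with the standard one-dimensional rigid-body impact model. The sketch below (plain Python, not the authors' solver) shows how post-impact velocities follow from the mass ratio and the coefficient of restitution; constraints of this kind are what allow a small set of parameters to regularize the reconstruction.

        def post_impact_velocities(m1, m2, u1, u2, e):
            """Post-impact velocities of two bodies colliding along one axis.

            m1, m2 : masses; u1, u2 : pre-impact velocities;
            e      : coefficient of restitution (0 = perfectly plastic, 1 = elastic).
            Derived from conservation of momentum together with the restitution
            law (v2 - v1) = -e * (u2 - u1).
            """
            total = m1 + m2
            v1 = (m1 * u1 + m2 * u2 + m2 * e * (u2 - u1)) / total
            v2 = (m1 * u1 + m2 * u2 + m1 * e * (u1 - u2)) / total
            return v1, v2

        # Example: equal masses, head-on, slightly inelastic impact.
        print(post_impact_velocities(1.0, 1.0, 2.0, -1.0, e=0.9))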

    iMAPPER: Interaction-guided Scene Mapping from Monocular Videos

    Next generation smart and augmented reality systems demand a computational understanding of monocular footage that captures humans in physical spaces, to reveal plausible object arrangements and human-object interactions. Despite recent advances in both scene layout and human motion analysis, the above setting remains challenging to analyze due to the frequent occlusions between objects and moving humans. We observe that object arrangements and human actions are often strongly correlated, and this correlation can be used to help recover from these occlusions. We present iMapper, a data-driven method to identify such human-object interactions and utilize them to infer layouts of occluded objects. Starting from a monocular video with detected 2D human joint positions that are potentially noisy and occluded, we first introduce the notion of interaction-saliency: space-time snapshots where informative human-object interactions happen. Then, we propose a global optimization to retrieve and fit interactions from a database to the detected salient interactions in order to best explain the input video. We extensively evaluate the approach, both quantitatively against manually annotated ground truth and through a user study, and demonstrate that iMapper produces plausible scene layouts for scenes with medium to heavy occlusion. Code and data are available on the project page.
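
    As a rough illustration of the retrieve-and-fit idea (not the paper's actual global optimization, which couples all snapshots jointly), one can score every database interaction against each detected salient snapshot and keep the candidate that best explains the observed 2D joints. The helpers fit_to_snapshot and reprojection_error below are hypothetical placeholders for the real alignment and evidence terms.

        def explain_video(salient_snapshots, interaction_db,
                          fit_to_snapshot, reprojection_error):
            """Greedy sketch of retrieve-and-fit scene mapping.

            For each space-time snapshot with an informative human-object
            interaction, try every exemplar in the database, align it to the
            observed (noisy, occluded) 2D joints, and keep the lowest-error fit.
            """
            layout = []
            for snapshot in salient_snapshots:
                best_cost, best_fit = float("inf"), None
                for exemplar in interaction_db:
                    placed = fit_to_snapshot(exemplar, snapshot)   # align exemplar to 2D evidence
                    cost = reprojection_error(placed, snapshot)    # how well it explains the video
                    if cost < best_cost:
                        best_cost, best_fit = cost, placed
                layout.append(best_fit)
            return layout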

    Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

    Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment. The highest-scoring methods are 'structure-based', and need the query camera's intrinsics as an input to the model, with careful geometric optimization. When intrinsics are absent, methods vie for accuracy by making various other assumptions. This yields fairly good localization scores, but the models are 'narrow' in some way, e.g., requiring costly test-time computations, or depth sensors, or multiple query frames. In contrast, our proposed method makes few special assumptions, and is fairly lightweight in training and testing. Our pose regression network learns from only relative poses of training scenes. For inference, it builds a graph connecting the query image to training counterparts and uses a graph neural network (GNN) with image representations on nodes and image-pair representations on edges. By efficiently passing messages between them, both representation types are refined to produce a consistent camera pose estimate. We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks. Our relative pose regression method matches the accuracy of absolute pose regression networks, while retaining the relative-pose models' test-time speed and ability to generalize to non-training scenes.
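
    The node/edge message-passing pattern described above can be sketched as follows in PyTorch; the feature width, the two-layer MLPs, and the mean aggregation are illustrative assumptions rather than the paper's architecture, and the image encoders and pose head are omitted.

        import torch
        import torch.nn as nn

        class RelPoseGNNLayer(nn.Module):
            """One round of message passing over an image graph: nodes carry
            per-image features, edges carry per-image-pair features, and both
            are refined jointly."""

            def __init__(self, dim=256):
                super().__init__()
                self.edge_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                              nn.Linear(dim, dim))
                self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                              nn.Linear(dim, dim))

            def forward(self, node_feats, edge_feats, edges):
                # edges: LongTensor of shape (E, 2) with (source, target) node indices.
                src, dst = edges[:, 0], edges[:, 1]
                # Update each edge from its endpoint nodes and its current feature.
                edge_in = torch.cat([node_feats[src], node_feats[dst], edge_feats], dim=-1)
                edge_feats = edge_feats + self.edge_mlp(edge_in)
                # Aggregate incoming edge messages per node (mean over incident edges).
                agg = torch.zeros_like(node_feats).index_add_(0, dst, edge_feats)
                counts = node_feats.new_zeros((node_feats.size(0), 1)).index_add_(
                    0, dst, edge_feats.new_ones((edges.size(0), 1)))
                node_in = torch.cat([node_feats, agg / counts.clamp(min=1.0)], dim=-1)
                node_feats = node_feats + self.node_mlp(node_in)
                return node_feats, edge_feats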

    Unsupervised Intuitive Physics from Visual Observations

    While learning models of intuitive physics is an increasingly active area of research, current approaches still fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test time. Some authors have relaxed such requirements by supplementing the model with a handcrafted physical simulator. Still, the resulting methods are unable to automatically learn new complex environments and to understand physical interactions within them. In this work, we demonstrate for the first time learning such predictors directly from raw visual observations and without relying on simulators. We do so in two steps: first, we learn to track mechanically-salient objects in videos using causality and equivariance, two unsupervised learning principles that do not require auto-encoding. Second, we demonstrate that the extracted positions are sufficient to successfully train visual motion predictors that can take the underlying environment into account. We validate our predictors on synthetic datasets; then, we introduce a new dataset, ROLL4REAL, consisting of real objects rolling on complex terrains (pool table, elliptical bowl, and random height-field). We show that in all such cases it is possible to learn reliable extrapolators of the object trajectories from raw videos alone, without any form of external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.
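
    The equivariance principle mentioned in the first step can be turned into a concrete training signal: the tracker's prediction for a spatially shifted frame should equal the shifted prediction for the original frame. The PyTorch snippet below is only a schematic version of that idea; the tracker network, the fixed shift, and the normalized-coordinate convention are assumptions, not the paper's exact formulation.

        import torch
        import torch.nn.functional as F

        def equivariance_loss(tracker, frames, dx=0.1, dy=-0.05):
            """Schematic equivariance objective for an unsupervised tracker.

            tracker : network mapping frames (B, C, H, W) to 2D positions (B, 2)
                      in normalized image coordinates; dx, dy : shift as a fraction
                      of image size (randomized in practice, fixed here for brevity).
            """
            # torch.roll wraps around the image border, so this is only a rough
            # stand-in for a true translation of the scene.
            shifted = torch.roll(frames,
                                 shifts=(int(dy * frames.shape[-2]), int(dx * frames.shape[-1])),
                                 dims=(-2, -1))
            pos = tracker(frames)                     # positions on original frames
            pos_shifted = tracker(shifted)            # positions on shifted frames
            target = pos + pos.new_tensor([dx, dy])   # where the object should now appear
            return F.mse_loss(pos_shifted, target)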

    String-Actuated Curved Folded Surfaces

    Curved folded surfaces, given their ability to produce elegant freeform shapes by folding flat sheets etched with curved creases, hold a special place in computational origami. Artists and designers have proposed a wide variety of fold patterns to create a range of interesting surfaces. The creative process, design, and fabrication are usually concerned only with the static surface that emerges once folding has completed. Folding such patterns, however, is difficult, as multiple creases have to be folded simultaneously to obtain a properly folded target shape. We introduce string-actuated curved folded surfaces that can be shaped by pulling a network of strings, thus vastly simplifying the process of creating such surfaces and making the folding motion an integral part of the design. Technically, we solve the problem of which surface points to string together and how to actuate them by locally expressing a desired folding path in the space of isometric shape deformations in terms of novel string actuation modes. We demonstrate the validity of our approach by computing string actuation networks for a range of well-known crease patterns and testing their effectiveness on physical prototypes. All the examples in this article can be downloaded for personal use from http://geometry.cs.ucl.ac.uk/projects/2017/string-actuated/
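
    In spirit, deciding how much to pull each string so that the surface follows a desired folding path is a small linear solve at every time step: given how each string pull moves the surface within the space of isometric deformations, combine the pulls to match the next step of the path. The NumPy sketch below illustrates only that least-squares idea; the modes matrix is an assumed parameterization, and the paper derives its own actuation modes and constraints.

        import numpy as np

        def solve_actuation(modes, delta_shape, damping=1e-6):
            """Least-squares sketch: combine string-actuation modes to follow a path.

            modes       : (dim_shape, num_strings) matrix whose columns describe how
                          pulling each string moves the surface in deformation space.
            delta_shape : (dim_shape,) desired step along the folding path.
            Returns per-string actuation amplitudes for this step.
            """
            # Damped (regularized) normal equations keep the solve stable when
            # the modes are nearly redundant.
            lhs = modes.T @ modes + damping * np.eye(modes.shape[1])
            rhs = modes.T @ delta_shape
            return np.linalg.solve(lhs, rhs)

        # Toy example: 3 strings steering a 5-dimensional deformation space.
        rng = np.random.default_rng(0)
        amplitudes = solve_actuation(rng.normal(size=(5, 3)), rng.normal(size=5))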

    GRAB: A Dataset of Whole-Body Human Grasping of Objects

    Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While "grasping" is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of "whole-body grasps". Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and object. This is a unique dataset that goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task. We illustrate the practical value of GRAB with an example application; we train GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes. The dataset and code are available for research purposes at https://grab.is.tue.mpg.de (ECCV 2020).
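
    GrabNet is described only as a conditional generative network that predicts hand grasps for unseen object shapes. The PyTorch skeleton below shows the general shape of such a model as a toy conditional VAE with assumed feature sizes; GrabNet's actual object encoding, hand parameterization, and architecture differ.

        import torch
        import torch.nn as nn

        class ToyConditionalGraspVAE(nn.Module):
            """Toy conditional VAE: sample a hand-pose vector given an object code."""

            def __init__(self, obj_dim=1024, hand_dim=63, latent_dim=16):
                super().__init__()
                self.latent_dim = latent_dim
                self.enc = nn.Sequential(nn.Linear(obj_dim + hand_dim, 256), nn.ReLU(),
                                         nn.Linear(256, 2 * latent_dim))
                self.dec = nn.Sequential(nn.Linear(obj_dim + latent_dim, 256), nn.ReLU(),
                                         nn.Linear(256, hand_dim))

            def forward(self, obj_code, hand_pose):
                # Encode (object, grasp) into a latent Gaussian; reparameterize; decode.
                mu, logvar = self.enc(torch.cat([obj_code, hand_pose], -1)).chunk(2, dim=-1)
                z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
                return self.dec(torch.cat([obj_code, z], -1)), mu, logvar

            @torch.no_grad()
            def sample(self, obj_code):
                # Test time: draw a latent and decode a grasp for a (possibly unseen) object.
                z = torch.randn(obj_code.size(0), self.latent_dim, device=obj_code.device)
                return self.dec(torch.cat([obj_code, z], -1))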

    Taking visual motion prediction to new heightfields

    While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and estimating the associated parameters. In order to leverage the approximation capabilities of artificial intelligence techniques in such physics-related contexts, researchers have handcrafted relevant states and then used neural networks to learn the state transitions, using simulation runs as training data. Unfortunately, such approaches are unsuited for modeling complex real-world scenarios, where manually authoring relevant state spaces tends to be tedious and challenging. In this work, we investigate whether neural networks can implicitly learn physical states of real-world mechanical processes based only on visual data, while internally modeling non-homogeneous environments, and in the process enable long-term physical extrapolation. We develop a recurrent neural network architecture for this task and also characterize the resulting uncertainties in the form of evolving variance estimates. We evaluate our setup by extrapolating the motion of rolling ball(s) on bowls of varying shape and orientation, and on arbitrary heightfields, using only images as input. We report significant improvements over existing image-based methods, both in terms of the accuracy of predictions and the complexity of scenarios, and report competitive performance with approaches that, unlike us, assume access to internal physical states.
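
    One common way to obtain the evolving variance estimates mentioned above (shown here as an illustrative PyTorch sketch, not the paper's network) is to let a recurrent model emit a mean and a log-variance per time step and train it with a Gaussian negative log-likelihood.

        import torch
        import torch.nn as nn

        class RecurrentStatePredictor(nn.Module):
            """LSTM that predicts a 2D position plus a per-step variance estimate."""

            def __init__(self, feat_dim=128, hidden=256):
                super().__init__()
                self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 4)   # (mean_x, mean_y, logvar_x, logvar_y)

            def forward(self, frame_features):
                # frame_features: (batch, time, feat_dim) visual features per frame.
                h, _ = self.rnn(frame_features)
                out = self.head(h)
                return out[..., :2], out[..., 2:]  # mean, log-variance

        def gaussian_nll(mean, logvar, target):
            # Negative log-likelihood of target positions under a diagonal Gaussian;
            # the predicted log-variance lets uncertainty grow over the extrapolation.
            return 0.5 * (logvar + (target - mean) ** 2 / logvar.exp()).mean()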

    Unsupervised intuitive physics from visual observations

    While learning models of intuitive physics is an active area of research, current approaches fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test time. Some approaches sidestep these requirements by building models on top of handcrafted physical simulators. In both cases, however, methods cannot automatically learn new physical environments and their laws as humans do. In this work, we successfully demonstrate, for the first time, learning unsupervised predictors of physical states, such as the position of objects in an environment, directly from raw visual observations and without relying on simulators. We do so in two steps: (i) we learn to track dynamically-salient objects in videos using causality and equivariance, two non-generative unsupervised learning principles that do not require manual or external supervision; (ii) we demonstrate that the extracted positions are sufficient to successfully train visual motion predictors that can take the underlying environment into account. We validate our predictors on synthetic datasets; then, we introduce a new dataset, Roll4Real, consisting of real objects rolling on complex terrains (pool table, elliptical bowl, and random height-field). We show that it is possible to learn reliable object trajectory extrapolators from raw videos alone, without any external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.
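
    Complementing the equivariance sketch given for the earlier listing of this work (step (i)), the second step, training a motion predictor on the tracker's outputs, might look like the illustrative loop below; the predictor interface, window sizes, and data layout are assumptions rather than the paper's code, and the real predictors additionally condition on an image of the environment.

        import torch.nn.functional as F

        def train_extrapolator(predictor, optimizer, tracked_positions,
                               context=4, horizon=8):
            """One pass of trajectory-extrapolation training on tracker outputs.

            tracked_positions : iterable of (T, 2) tensors produced by the
                                unsupervised tracker (no ground-truth states used).
            The predictor sees `context` past positions (flattened) and is assumed
            to output the next `horizon` positions, also flattened.
            """
            for traj in tracked_positions:
                for t in range(context, traj.size(0) - horizon):
                    past = traj[t - context:t].reshape(1, -1)      # context window
                    future = traj[t:t + horizon].reshape(1, -1)    # extrapolation targets
                    loss = F.mse_loss(predictor(past), future)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()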
