Reconstruction Bottlenecks in Object-Centric Generative Models
A range of methods with suitable inductive biases exist to learn
interpretable object-centric representations of images without supervision.
However, these are largely restricted to visually simple images; robust object
discovery in real-world sensory datasets remains elusive. To increase the
understanding of such inductive biases, we empirically investigate the role of
"reconstruction bottlenecks" for scene decomposition in GENESIS, a recent
VAE-based model. We show that such bottlenecks determine reconstruction and
segmentation quality and critically influence model behaviour.
Comment: 10 pages, 7 figures, Workshop on Object-Oriented Learning at ICML 2020.
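As a rough illustration of what a "reconstruction bottleneck" means in this family of models, the sketch below shows a minimal VAE in which the latent dimensionality and a KL weight jointly set the bottleneck. All names and the beta-style weighting are illustrative assumptions, not the paper's actual GENESIS setup.

```python
# Hypothetical sketch: the "bottleneck" of a toy image VAE is set by the
# latent size and the KL weight; shrinking latent_dim or raising beta forces
# the model to discard image detail. Not the paper's actual architecture.
import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    def __init__(self, image_dim=64 * 64, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(image_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, image_dim))

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterise
        return self.dec(z), mu, log_var

def elbo_loss(x, recon, mu, log_var, beta=1.0):
    # beta > 1 tightens the information bottleneck; beta < 1 loosens it.
    recon_err = ((x - recon) ** 2).sum(dim=-1)
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(dim=-1)
    return (recon_err + beta * kl).mean()
```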
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
Generative latent-variable models are emerging as promising tools in robotics
and reinforcement learning. Yet, even though tasks in these domains typically
involve distinct objects, most state-of-the-art generative models do not
explicitly capture the compositional nature of visual scenes. Two recent
exceptions, MONet and IODINE, decompose scenes into objects in an unsupervised
fashion. Their underlying generative processes, however, do not account for
component interactions. Hence, neither of them allows for principled sampling
of novel scenes. Here we present GENESIS, the first object-centric generative
model of 3D visual scenes capable of both decomposing and generating scenes by
capturing relationships between scene components. GENESIS parameterises a
spatial Gaussian mixture model (GMM) over images, which is decoded from a set of object-centric latent
variables that are either inferred sequentially in an amortised fashion or
sampled from an autoregressive prior. We train GENESIS on several publicly
available datasets and evaluate its performance on scene generation,
decomposition, and semi-supervised learning.
Comment: Published at the International Conference on Learning Representations (ICLR) 2020.
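To make the spatial-GMM likelihood concrete, here is a minimal sketch of how per-pixel mixing weights (from decoded masks) and per-component appearances can combine into an image likelihood. Tensor names, shapes, and the fixed component scale are assumptions for illustration, not GENESIS's actual code.

```python
# Hypothetical sketch of a spatial GMM image likelihood: each pixel is a
# mixture over K components, with mixing weights from decoded masks and
# means from decoded component appearances.
import torch

def spatial_gmm_log_likelihood(x, mask_logits, means, sigma=0.7):
    # x:           (B, 3, H, W) image
    # mask_logits: (B, K, H, W) unnormalised per-pixel mixing logits
    # means:       (B, K, 3, H, W) per-component RGB means
    log_pi = torch.log_softmax(mask_logits, dim=1)          # (B, K, H, W)
    comp = torch.distributions.Normal(means, sigma)
    log_p = comp.log_prob(x.unsqueeze(1)).sum(dim=2)        # (B, K, H, W)
    # Mixture: log sum_k pi_k * N(x | mu_k, sigma) per pixel, summed over pixels.
    return torch.logsumexp(log_pi + log_p, dim=1).sum(dim=(1, 2))
```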
RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces
We present RELATE, a model that learns to generate physically plausible
scenes and videos of multiple interacting objects. Similar to other generative
approaches, RELATE is trained end-to-end on raw, unlabeled data. RELATE
combines an object-centric GAN formulation with a model that explicitly
accounts for correlations between individual objects. This allows the model to
generate realistic scenes and videos from a physically-interpretable
parameterization. Furthermore, we show that modeling the object correlation is
necessary to learn to disentangle object positions and identity. We find that
RELATE is also amenable to physically realistic scene editing and that it
significantly outperforms prior art in object-centric scene generation in both
synthetic (CLEVR, ShapeStacks) and real-world data (cars). In addition, in
contrast to state-of-the-art methods in object-centric generative modeling,
RELATE also extends naturally to dynamic scenes and generates videos of high
visual fidelity. Source code, datasets and more results are available at
http://geometry.cs.ucl.ac.uk/projects/2020/relate/
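As a hedged sketch of the object-correlation idea, the snippet below corrects independently sampled object positions using pairwise interactions before they would be passed to a renderer. The `PairwiseCorrection` module and its message-passing form are hypothetical stand-ins, not RELATE's actual architecture.

```python
# Hypothetical sketch of an object-correlation step: each object's position
# latent is corrected based on all other objects, so sampled scenes can
# respect pairwise structure (e.g. non-overlap).
import torch
import torch.nn as nn

class PairwiseCorrection(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.pair_net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, dim))

    def forward(self, pos):  # pos: (B, K, dim) independently sampled positions
        B, K, D = pos.shape
        a = pos.unsqueeze(2).expand(B, K, K, D)   # object i, repeated over j
        b = pos.unsqueeze(1).expand(B, K, K, D)   # object j, repeated over i
        msg = self.pair_net(torch.cat([a, b], dim=-1)).sum(dim=2)
        # Remove each object's self-interaction term from the sum.
        msg = msg - self.pair_net(torch.cat([pos, pos], dim=-1))
        return pos + msg  # corrected positions, fed to the renderer
```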
Next Steps: Learning a Disentangled Gait Representation for Versatile Quadruped Locomotion
Quadruped locomotion is rapidly maturing to a degree where robots now routinely traverse a variety of unstructured terrains. However, while gaits can typically be varied by selecting from a range of pre-computed styles, current planners are unable to vary key gait parameters continuously while the robot is in motion. The on-the-fly synthesis of gaits with unexpected operational characteristics, or even the blending of dynamic manoeuvres, lies beyond the capabilities of the current state of the art. In this work we address this limitation by learning a latent space capturing the key stance phases of a particular gait, via a generative model trained on a single trot style. This encourages disentanglement such that applying a drive signal to a single dimension of the latent state induces holistic plans synthesising a continuous variety of trot styles. In fact, properties of this drive signal map directly to gait parameters such as cadence, footstep height and full stance duration. The use of a generative model facilitates the detection and mitigation of disturbances to provide a versatile and robust planning framework. We evaluate our approach on a real ANYmal quadruped robot and demonstrate that our method achieves a continuous blend of dynamic trot styles whilst being robust and reactive to external perturbations.
APEX: Unsupervised, Object-Centric Scene Segmentation and Tracking for Robot Manipulation
Recent advances in unsupervised learning for object detection, segmentation, and tracking hold significant promise for applications in robotics. A common approach is to frame these tasks as inference in probabilistic latent-variable models. In this paper, however, we show that the current state-of-the-art struggles with visually complex scenes such as those typically encountered in robot manipulation tasks. We propose APEX, a new latent-variable model which is able to segment and track objects in more realistic scenes featuring objects that vary widely in size and texture, including the robot arm itself. This is achieved by a principled mask normalisation algorithm and a high-resolution scene encoder. To evaluate our approach, we present results on the real-world Sketchy dataset. This dataset, however, does not contain the ground-truth masks and object IDs needed for quantitative evaluation. We therefore introduce the Panda Pushing Dataset (P2D), which shows a Panda arm interacting with objects on a table in simulation and includes ground-truth segmentation masks and object IDs for tracking. In both cases, APEX comprehensively outperforms the current state-of-the-art in unsupervised object segmentation and tracking. We demonstrate the efficacy of our segmentations for robot skill execution on an object arrangement task, where we also achieve the best or comparable performance among all baselines.
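To illustrate the role of mask normalisation, here is a generic softmax-based sketch that turns K per-object mask scores into a valid per-pixel partition of the image. This is an illustrative variant, not necessarily APEX's exact normalisation scheme.

```python
# Hypothetical sketch: per-pixel normalisation of K object masks so they form
# a valid partition of the image (the masks sum to one at every pixel).
import numpy as np

def normalise_masks(mask_logits):
    # mask_logits: (K, H, W) unnormalised per-object mask scores
    m = mask_logits - mask_logits.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(m)
    return e / e.sum(axis=0, keepdims=True)  # (K, H, W), sums to 1 per pixel

masks = normalise_masks(np.random.randn(5, 64, 64))
assert np.allclose(masks.sum(axis=0), 1.0)
```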
Reaching Through Latent Space: From Joint Statistics to Path Planning in Manipulation
We present a novel approach to path planning for robotic manipulators, in which paths are produced via iterative optimisation in the latent space of a generative model of robot poses. Constraints are incorporated through the use of constraint-satisfaction classifiers operating on the same space. Optimisation leverages gradients through our learned models, which provides a simple way to combine goal-reaching objectives with constraint satisfaction, even in the presence of otherwise non-differentiable constraints. Our models are trained in a task-agnostic manner on randomly sampled robot poses. In baseline comparisons against a number of widely used planners, we achieve commensurate performance in terms of task success, planning time and path length, and perform successful path planning with obstacle avoidance on a real 7-DoF robot arm.
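A minimal sketch of the latent-space optimisation loop described above, assuming a trained pose `decoder`, a differentiable forward-kinematics function `fk`, and a constraint classifier `collision_clf` that outputs a satisfaction probability. All of these names are hypothetical stand-ins, not the paper's actual API.

```python
# Hypothetical sketch: plan by gradient descent in a pose model's latent space,
# combining a goal-reaching term with a learned constraint classifier. The
# classifier makes an otherwise non-differentiable constraint differentiable.
import torch

def plan_in_latent_space(decoder, collision_clf, fk, goal, z_dim=8, steps=200):
    z = torch.zeros(z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.05)
    for _ in range(steps):
        opt.zero_grad()
        q = decoder(z)                         # latent -> joint configuration
        reach = ((fk(q) - goal) ** 2).sum()    # end-effector distance to goal
        feasible = collision_clf(z)            # P(constraint satisfied | z), in (0, 1]
        loss = reach - torch.log(feasible + 1e-6)
        loss.backward()
        opt.step()
    return decoder(z).detach()
```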
VAE-Loco: Versatile Quadruped Locomotion by Learning a Disentangled Gait Representation
Quadruped locomotion is rapidly maturing to a degree where robots now
routinely traverse a variety of unstructured terrains. However, while gaits can typically be varied by selecting from a range of pre-computed styles, current planners are unable to vary key gait parameters continuously while the robot is in motion. The on-the-fly synthesis of gaits with unexpected operational characteristics, or even the blending of dynamic manoeuvres, lies beyond the capabilities of the current state of the art. In this work we address this
limitation by learning a latent space capturing the key stance phases
constituting a particular gait. This is achieved via a generative model trained
on a single trot style, which encourages disentanglement such that application
of a drive signal to a single dimension of the latent state induces holistic
plans synthesising a continuous variety of trot styles. We demonstrate that
specific properties of the drive signal map directly to gait parameters such as
cadence, footstep height and full stance duration. Due to the nature of our approach, these synthesised gaits are continuously variable online during robot
operation and robustly capture a richness of movement significantly exceeding
the relatively narrow behaviour seen during training. In addition, the use of a
generative model facilitates the detection and mitigation of disturbances to
provide a versatile and robust planning framework. We evaluate our approach on
two versions of the real ANYmal quadruped robot and demonstrate that our method achieves a continuous blend of dynamic trot styles whilst being robust and reactive to external perturbations.
Comment: 15 pages, 13 figures, 1 table, submitted to IEEE Transactions on Robotics (T-RO). arXiv admin note: substantial text overlap with arXiv:2112.0480
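As a rough sketch of the drive-signal mechanism, the snippet below oscillates a single latent dimension of a trained trajectory decoder. The `decoder`, the chosen dimension index, and the sinusoidal drive are illustrative assumptions, with frequency and amplitude standing in for gait parameters such as cadence and footstep height.

```python
# Hypothetical sketch of the drive-signal idea: driving one latent dimension
# of a learned generative model yields a continuous family of gait plans.
# `decoder` is a stand-in for the trained model mapping latents to plans.
import numpy as np

def synthesise_gait(decoder, z_dim=16, drive_dim=0, cadence_hz=2.0,
                    amplitude=1.5, duration_s=3.0, dt=0.02):
    t = np.arange(0.0, duration_s, dt)
    z = np.zeros((len(t), z_dim))
    # Sinusoidal drive on a single dimension; the rest of the latent is fixed.
    z[:, drive_dim] = amplitude * np.sin(2.0 * np.pi * cadence_hz * t)
    return np.stack([decoder(z_k) for z_k in z])  # one holistic plan per step
```

Because the drive is applied online, changing its frequency or amplitude mid-run corresponds to continuously re-parameterising the gait while the robot is in motion, which is the behaviour the abstract describes.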