Learning Real-world Autonomous Navigation by Self-Supervised Environment Synthesis
Machine learning approaches have recently enabled autonomous navigation for mobile robots in a data-driven manner. Since most existing learning-based navigation systems are trained with data generated in artificially created training environments, robots deployed at scale in the real world will inevitably encounter unseen scenarios that fall outside the training distribution and therefore suffer poor real-world performance. On the other hand, training directly in the real world is generally unsafe and inefficient. To address this issue, we introduce Self-supervised Environment Synthesis (SES), in which autonomous mobile robots, after real-world deployments that meet safety and efficiency requirements, use their deployment experience to reconstruct navigation scenarios and synthesize representative training environments in simulation. Training in these synthesized environments improves future real-world performance. The effectiveness of SES at synthesizing representative simulation environments and improving real-world navigation performance is evaluated via a large-scale deployment in a high-fidelity, realistic simulator and a small-scale deployment on a physical robot.
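The deploy-reconstruct-synthesize-retrain cycle described above can be summarized in a short sketch. The following Python snippet is purely illustrative: all function names and the perturbation-based synthesis are hypothetical stand-ins, not the authors' actual pipeline.

```python
# Minimal sketch of an SES-style cycle. Everything here is a hypothetical
# stand-in for the paper's pipeline, shown only to fix the overall structure.
import numpy as np

def reconstruct_scenarios(deployment_logs):
    """Recover navigation scenarios from logged deployment experience
    (stand-in: extract the recorded obstacle layouts)."""
    return [log["scan"] for log in deployment_logs]

def synthesize_environments(scenarios, n_envs=10):
    """Synthesize representative simulation environments from real scenarios
    (stand-in: resample recorded layouts with small perturbations)."""
    rng = np.random.default_rng(0)
    picks = rng.choice(len(scenarios), size=n_envs)
    return [scenarios[i] + rng.normal(0.0, 0.05, size=np.shape(scenarios[i]))
            for i in picks]

def ses_iteration(policy, deployment_logs, train_fn):
    """One SES cycle: reconstruct scenarios, synthesize environments,
    retrain in simulation, and return the improved policy."""
    scenarios = reconstruct_scenarios(deployment_logs)
    sim_envs = synthesize_environments(scenarios)
    return train_fn(policy, sim_envs)  # policy for the next deployment
```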
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation
Deep reinforcement learning (RL) has brought many successes to autonomous robot navigation. However, important limitations still prevent real-world use of RL-based navigation systems: for example, most learning approaches lack safety guarantees, and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques that tackle these challenges in general, the lack of an open-source benchmark and of reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose which learning methods to use for their mobile robots, and for learning researchers to identify the current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata for applying deep RL approaches to autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. We then explore four major classes of learning techniques, each aimed at achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source large-scale navigation benchmark and in real-world environments, we perform a comprehensive study aimed at establishing to what extent these techniques can achieve the desiderata for RL-based navigation systems.
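As a concrete illustration of one of the four technique classes, here is a minimal sketch of domain randomization (D4). It assumes a generic environment exposing reset()/step() and a hypothetical set_params hook on the simulator; it is not the benchmark's actual code.

```python
# Illustrative domain-randomization wrapper: environment parameters are
# resampled at every reset, so the policy trains on a distribution of
# environments rather than a single one (targets desideratum D4).
import random

class DomainRandomizationWrapper:
    def __init__(self, env, param_ranges):
        self.env = env
        self.param_ranges = param_ranges  # e.g. {"friction": (0.5, 1.5)}

    def reset(self):
        sampled = {name: random.uniform(lo, hi)
                   for name, (lo, hi) in self.param_ranges.items()}
        self.env.set_params(sampled)  # assumed hook on the simulator
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```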
Machine Learning Methods for Local Motion Planning: A Study of End-to-End vs. Parameter Learning
This conference paper was featured in the October 2021 Good Systems Network Digest (Office of the VP for Research).
Deep Generative Models on 3D Representations: A Survey
Generative models, an important family of statistical models, aim to learn the observed data distribution and generate new instances from it. Along with the rise of neural networks, deep generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), have made tremendous progress in 2D image synthesis. Recently, researchers have shifted their attention from the 2D space to the 3D space, since 3D data better aligns with our physical world and hence holds great practical potential. However, unlike a 2D image, which naturally has an efficient representation (i.e., the pixel grid), representing 3D data poses far greater challenges. Concretely, an ideal 3D representation should be expressive enough to model shapes and appearances in detail, and efficient enough to model high-resolution data quickly and with low memory cost. However, existing 3D representations, such as point clouds, meshes, and the recent neural fields, usually fail to meet these requirements simultaneously. In this survey, we thoroughly review the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspectives of both algorithms and, more importantly, representations. We hope that our discussion helps the community track the evolution of this field and sparks innovative ideas to advance this challenging task.
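To make the representation trade-off concrete, below is a minimal sketch of a neural field: a coordinate MLP that maps a 3D point to density and color and can be queried at arbitrary resolution. It is purely illustrative; real 3D generative models add positional encodings, latent conditioning, and volume rendering.

```python
# A toy neural field: unlike a pixel/voxel grid, the representation is a
# continuous function of 3D coordinates, queryable at any resolution.
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (density, r, g, b)
        )

    def forward(self, xyz):                # xyz: (N, 3) query points
        out = self.mlp(xyz)
        density = torch.relu(out[:, :1])   # non-negative density
        color = torch.sigmoid(out[:, 1:])  # colors in [0, 1]
        return density, color

# Continuous query at arbitrary sample locations -- no fixed grid:
field = NeuralField()
density, color = field(torch.rand(1024, 3))
```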
Neuromorphic Incremental on-chip Learning with Hebbian Weight Consolidation
As next-generation implantable brain-machine interfaces (BMIs) become pervasive on edge devices, the ability to incrementally learn new tasks in a biologically plausible way is urgently needed for neuromorphic chips. Due to the inherent characteristics of their structure, spiking neural networks are naturally well-suited for BMI chips. Here we propose Hebbian Weight Consolidation (HWC), together with an on-chip learning framework (MLoC). HWC selectively masks modifications to the synapses that serve previous tasks, so that new knowledge from subsequent tasks can be stored while old knowledge is preserved. Leveraging the bio-plasticity of dendritic spines, the intrinsic self-organizing nature of Hebbian Weight Consolidation aligns naturally with the incremental learning paradigm, facilitating robust learning outcomes. By reading out spikes layer by layer and performing back-propagation on an external micro-controller unit, MLoC can efficiently accomplish on-chip learning. Experiments show that our HWC algorithm outperforms the lower bound without incremental learning by up to 23.19%, particularly in the more challenging monkey behavior decoding scenarios. Taking on-chip computation on the Synsense Speck 2e chip into account, our proposed algorithm still exhibits an improvement of 11.06%. This study demonstrates the feasibility of employing incremental learning for high-performance neural signal decoding in next-generation brain-machine interfaces.
Comment: 12 pages, 6 figures
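The core masking idea can be sketched as follows. The importance measure below is a placeholder (the paper derives importance from Hebbian activity statistics), and the function names are hypothetical.

```python
# Illustrative weight-consolidation masking: synapses judged important for
# previous tasks have their updates zeroed, so later tasks cannot
# overwrite them while the remaining synapses absorb new knowledge.
import torch

def consolidation_mask(importance, keep_fraction=0.2):
    """Mask = 0 for the top `keep_fraction` most important synapses,
    1 elsewhere. `importance` is a per-weight score (placeholder here)."""
    k = int(importance.numel() * keep_fraction)
    thresh = importance.flatten().kthvalue(importance.numel() - k).values
    return (importance <= thresh).float()

def masked_update(weight, grad, mask, lr=1e-3):
    """Apply a gradient step only to unmasked (non-consolidated) synapses."""
    weight.data -= lr * grad * mask
```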
Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator
3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes. A popular solution is to adopt a generative adversarial network (GAN) and replace the generator with a 3D renderer, commonly using volume rendering with a neural radiance field (NeRF). Despite the advances in synthesis quality, existing methods fail to obtain even moderate 3D shapes. We argue that, considering the two-player game in the GAN formulation, making only the generator 3D-aware is not enough. In other words, displacing the generative mechanism only offers the capability, but not the guarantee, of producing 3D-aware images, because the generator's supervision primarily comes from the discriminator. To address this issue, we propose GeoD, which learns a geometry-aware discriminator to improve 3D-aware GANs. Concretely, besides differentiating real and fake samples in the 2D image space, the discriminator is additionally asked to derive geometry information from its inputs, which is then applied as guidance for the generator. Such a simple yet effective design facilitates learning substantially more accurate 3D shapes. Extensive experiments on various generator architectures and training datasets verify the superiority of GeoD over state-of-the-art alternatives. Moreover, our approach serves as a general framework, such that a more capable discriminator (i.e., one with a third task of novel view synthesis beyond domain classification and geometry extraction) can further assist the generator toward better multi-view consistency.
Comment: Accepted by NeurIPS 2022. Project page: https://vivianszf.github.io/geo
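A minimal sketch of the GeoD idea, under assumed shapes and a hypothetical depth-based geometry head: the discriminator predicts geometry alongside realness, and the generator is penalized when its rendered geometry disagrees with that estimate. This is illustrative, not the paper's implementation.

```python
# Illustrative geometry-aware discriminator: one head for real/fake
# classification, an auxiliary head that estimates geometry (here, a
# coarse depth map) used to guide the 3D-aware generator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometryAwareDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.realness = nn.Linear(64, 1)       # standard GAN head
        self.depth_head = nn.Conv2d(64, 1, 1)  # auxiliary geometry head

    def forward(self, img):
        feat = self.backbone(img)
        logit = self.realness(feat.mean(dim=(2, 3)))
        depth = self.depth_head(feat)          # coarse depth estimate
        return logit, depth

def geometry_guidance_loss(rendered_depth, disc_depth):
    """Generator-side loss: match the renderer's depth to the
    discriminator's geometry estimate (resized to the same scale)."""
    target = F.interpolate(disc_depth, size=rendered_depth.shape[-2:])
    return F.l1_loss(rendered_depth, target)
```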
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in 30 seconds on a single A100 GPU. We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without access to 3D assets. We demonstrate state-of-the-art results on the single-image reconstruction problem, where probabilistic modeling of unseen object parts is required to generate diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results that outperform previous 3D diffusion models. Our project website is at https://justimyhxu.github.io/projects/dmv3d/.
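A minimal sketch of the denoising step described above, where 3D reconstruction plus rendering plays the role of the diffusion denoiser. ReconstructionModel, render, and the x0-parameterized noise recovery are illustrative stand-ins for the paper's transformer-based triplane-NeRF reconstructor, assuming standard DDPM notation.

```python
# One illustrative denoising step: reconstruct a 3D representation from
# noisy multi-view images, re-render the views as the clean (x0)
# prediction, and recover the implied noise from the DDPM relation
# x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps.
import torch

def denoise_step(recon_model, render, noisy_views, cameras,
                 alpha_bar_t: torch.Tensor):
    triplanes = recon_model(noisy_views, cameras)  # images -> triplane NeRF
    x0_pred = render(triplanes, cameras)           # re-render input views
    eps = (noisy_views - alpha_bar_t.sqrt() * x0_pred) \
          / (1 - alpha_bar_t).sqrt()
    return x0_pred, eps
```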
Gaussian Shell Maps for Efficient 3D Human Generation
Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives through an articulable multi-shell-based scaffold. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The shells represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells, whose attributes are encoded in the texture features. These Gaussians are rendered efficiently and differentiably. The ability to articulate the shells is important during GAN training and, at inference time, for deforming a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality, multi-view consistent renderings at native resolution. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion.
Comment: Project page: https://rameenabdal.github.io/GaussianShellMaps
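A minimal sketch of the shell construction and attribute lookup described above, with hypothetical names and shapes; the paper's actual implementation differs in details such as Gaussian placement and rendering.

```python
# Illustrative Gaussian Shell Maps setup: shells are inflated/deflated
# copies of a template surface, and a generated texture stack supplies
# per-Gaussian attributes (one texture slice per shell).
import torch

def build_shells(template_verts, template_normals,
                 offsets=(-0.02, 0.0, 0.02)):
    """Offset the template surface along vertex normals to form shells."""
    return [template_verts + d * template_normals for d in offsets]

def gaussians_from_texture(shells, texture_stack):
    """Read per-Gaussian attributes from CNN texture features;
    each `tex` is a (V, C) feature slice for one shell (C >= 7 assumed)."""
    gaussians = []
    for verts, tex in zip(shells, texture_stack):
        gaussians.append({
            "mean": verts,                          # Gaussian centers on shell
            "color": torch.sigmoid(tex[:, 0:3]),    # RGB in [0, 1]
            "opacity": torch.sigmoid(tex[:, 3:4]),  # alpha in [0, 1]
            "scale": torch.exp(tex[:, 4:7]),        # positive scales
        })
    return gaussians
```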