Incorporating Recurrent Reinforcement Learning into Model Predictive Control for Adaptive Control in Autonomous Driving
Model Predictive Control (MPC) is attracting tremendous attention in
autonomous driving as a powerful control technique. The success of an MPC
controller strongly depends on an accurate internal dynamics model. However,
the static parameters, usually learned by system identification, often fail to
adapt to both internal and external perturbations in real-world scenarios. In
this paper, we (1) reformulate the problem as a Partially Observed
Markov Decision Process (POMDP) that absorbs the uncertainties into
observations and preserves the Markov property in hidden states; (2) learn a
recurrent policy that continually adapts the parameters of the dynamics model via
Recurrent Reinforcement Learning (RRL) for optimal and adaptive control; and
(3) evaluate the proposed algorithm (referred to as ) in the
CARLA simulator, where it exhibits robust behaviours under a wide range of
perturbations.
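To make the adaptive-MPC loop concrete, here is a minimal Python sketch under heavy simplifying assumptions. The vehicle is replaced by a hypothetical 1D speed model `v' = v + dt*(u - c*v)` with an unknown drag `c`, the MPC is a brute-force search over constant controls, and the learned RRL policy is replaced by a hand-written recursive estimator (`RecurrentAdapter`) whose hidden state is the current drag estimate. None of these names or parameters come from the paper; the sketch only shows how an adapter that updates model parameters online sits inside a receding-horizon controller.

```python
import numpy as np

U_GRID = np.linspace(-5.0, 5.0, 201)  # candidate constant controls (assumed actuator range)

def predict_cost(v0, u, c, v_ref, dt, horizon):
    """Roll the model v' = v + dt*(u - c*v) forward and sum the squared tracking error."""
    v, cost = v0, 0.0
    for _ in range(horizon):
        v = v + dt * (u - c * v)
        cost += (v - v_ref) ** 2
    return cost

def mpc_action(v0, c_model, v_ref, dt=0.1, horizon=10):
    """Receding-horizon step: pick the best constant control; only its first step is applied."""
    costs = [predict_cost(v0, u, c_model, v_ref, dt, horizon) for u in U_GRID]
    return float(U_GRID[int(np.argmin(costs))])

class RecurrentAdapter:
    """Hypothetical stand-in for the learned recurrent policy: its hidden state is the
    drag estimate, nudged toward the value implied by each observed transition."""
    def __init__(self, c_init, alpha=0.3):
        self.c_hat = c_init
        self.alpha = alpha

    def step(self, v, u, v_next, dt):
        if abs(v) > 1e-3:  # drag is unobservable at zero speed
            c_obs = (u - (v_next - v) / dt) / v
            self.c_hat += self.alpha * (c_obs - self.c_hat)
        return self.c_hat

def run(c_true=0.8, c_init=0.2, v_ref=2.0, steps=60, dt=0.1):
    """Closed loop: MPC uses the adapter's current parameter; the 'real' plant uses c_true."""
    adapter, v = RecurrentAdapter(c_init), 0.0
    for _ in range(steps):
        u = mpc_action(v, adapter.c_hat, v_ref, dt)
        v_next = v + dt * (u - c_true * v)
        adapter.step(v, u, v_next, dt)
        v = v_next
    return v, adapter.c_hat
```

Because the simulated transitions here are noise-free, the drag estimate in `run()` converges geometrically to the true value, after which the controller tracks the reference with a correct model. A learned recurrent policy would take the adapter's place when the mapping from observation history to model parameters is too complex to write by hand.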
NeRRF: 3D Reconstruction and View Synthesis for Transparent and Specular Objects with Neural Refractive-Reflective Fields
Neural radiance fields (NeRF) have revolutionized the field of image-based
view synthesis. However, NeRF uses straight rays and fails to deal with
complicated light path changes caused by refraction and reflection. This
prevents NeRF from successfully synthesizing transparent or specular objects,
which are ubiquitous in real-world robotics and AR/VR applications. In this
paper, we introduce the refractive-reflective field. Taking the object
silhouette as input, we first utilize marching tetrahedra with a progressive
encoding to reconstruct the geometry of non-Lambertian objects and then model
refraction and reflection effects of the object in a unified framework using
Fresnel terms. Meanwhile, to achieve efficient and effective anti-aliasing, we
propose a virtual cone supersampling technique. We benchmark our method on
different shapes, backgrounds and Fresnel terms on both real-world and
synthetic datasets. We also qualitatively and quantitatively benchmark the
rendering results of various editing applications, including material editing,
object replacement/insertion, and environment illumination estimation. Code
and data are publicly available at https://github.com/dawning77/NeRRF
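The reflection/refraction split described above rests on standard optics: Snell's law gives the refracted direction and the Fresnel equations give how much light is reflected. The Python sketch below uses the textbook formulas for unpolarized light; it is not the authors' renderer, and `trace` is a hypothetical callback standing in for querying the radiance field along a bent ray.

```python
import numpy as np

def reflect(d, n):
    """Mirror the incident unit direction d about the unit normal n."""
    return d - 2.0 * np.dot(d, n) * n

def refract(d, n, eta):
    """Snell's law for unit vectors. n faces the incident side (dot(n, d) < 0),
    eta = n1 / n2. Returns None on total internal reflection."""
    cos_i = -np.dot(n, d)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

def fresnel_reflectance(cos_i, n1, n2):
    """Exact Fresnel reflectance for unpolarized light (mean of s- and p-polarized terms)."""
    sin2_t = (n1 / n2) ** 2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return 1.0  # total internal reflection: everything is reflected
    cos_t = np.sqrt(1.0 - sin2_t)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i)
    return 0.5 * (r_s**2 + r_p**2)

def shade(d, n, n1, n2, trace):
    """Blend radiance along the reflected and refracted rays by the Fresnel weight.
    `trace` is a hypothetical callback mapping a ray direction to radiance."""
    r = fresnel_reflectance(-np.dot(n, d), n1, n2)
    t = refract(d, n, n1 / n2)
    c_refr = trace(t) if t is not None else 0.0
    return r * trace(reflect(d, n)) + (1.0 - r) * c_refr
```

At normal incidence from air (n = 1.0) into glass (n = 1.5), `fresnel_reflectance(1.0, 1.0, 1.5)` gives 0.04, i.e. about 4% of the light is reflected and the rest is refracted.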
DPF: Learning Dense Prediction Fields with Weak Supervision
Nowadays, many visual scene understanding problems are addressed by dense
prediction networks. But pixel-wise dense annotations are very expensive (e.g.,
for scene parsing) or impossible (e.g., for intrinsic image decomposition),
motivating us to leverage cheap point-level weak supervision. However, existing
point-supervised methods still use the same architecture designed for full
supervision. In stark contrast to them, we propose a new paradigm that makes
predictions for point coordinate queries, as inspired by the recent success of
implicit representations, like distance or radiance fields. As such, the method
is named dense prediction fields (DPFs). DPFs generate expressive
intermediate features for continuous sub-pixel locations, thus allowing outputs
of an arbitrary resolution. DPFs are naturally compatible with point-level
supervision. We showcase the effectiveness of DPFs using two substantially
different tasks: high-level semantic parsing and low-level intrinsic image
decomposition. In these two cases, supervision comes in the form of
single-point semantic category and two-point relative reflectance,
respectively. On three large-scale public benchmarks, PASCALContext, ADE20K,
and IIW, DPFs set new state-of-the-art performance on all of them by
significant margins.
Code is available at https://github.com/cxx226/DPF
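The idea of making predictions at continuous point coordinates can be sketched with bilinear interpolation of a feature map followed by a per-point head, with a loss computed only on the annotated points. This is a minimal Python illustration under assumed names (`point_logits`, a linear head, a fixed feature map); the actual DPF architecture and its two-point relative-reflectance loss are not reproduced here.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Interpolate an (H, W, C) feature map at N continuous (x, y) pixel coordinates -> (N, C)."""
    h, w, _ = feat.shape
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def point_logits(feat, xy, head_w, head_b):
    """Hypothetical per-point head: a linear layer on the interpolated sub-pixel features."""
    return bilinear_sample(feat, xy) @ head_w + head_b

def point_cross_entropy(logits, labels):
    """Mean cross-entropy over the annotated points only (single-point supervision)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(labels)), labels].mean())
```

Because queries are arbitrary real-valued coordinates, the same head can be evaluated on a dense grid of any resolution at inference time, while training touches only the few labelled points.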
ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation
We present ASSIST, an object-wise neural radiance field as a panoptic
representation for compositional and realistic simulation. Central to our
approach is a novel scene node data structure that stores the information of
each object in a unified fashion, allowing online interaction in both intra-
and cross-scene settings. By incorporating a differentiable neural network
along with the associated bounding box and semantic features, the proposed
structure enables user-friendly interaction on independent objects to scale
up novel view simulation. Objects in the scene can be queried, added,
duplicated, deleted, transformed, or swapped simply through mouse/keyboard
controls or language instructions. Experiments demonstrate the efficacy of the
proposed method: realistic simulation can be scaled up through interactive
editing and compositional rendering, with color images, depth images, and
panoptic segmentation masks generated in a 3D-consistent manner.
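A scene node, as described above, bundles an object's network with its bounding box and semantic features, so editing operations reduce to bookkeeping on a collection of nodes. The following Python sketch is one hypothetical layout of such a structure; the field names, the 4x4 object-to-world pose, and the plain callable standing in for each object's neural field are assumptions, not ASSIST's implementation.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable

import numpy as np

@dataclass(eq=False)
class SceneNode:
    node_id: int
    label: str                                     # semantic class, e.g. "chair"
    bbox: np.ndarray                               # (2, 3) min/max corners in the object frame
    semantic: np.ndarray                           # semantic feature vector
    field_fn: Callable[[np.ndarray], np.ndarray]   # per-object field, queried in object coords
    pose: np.ndarray = field(default_factory=lambda: np.eye(4))  # object-to-world transform

class Scene:
    def __init__(self):
        self.nodes = {}
        self._ids = itertools.count()

    def add(self, label, bbox, semantic, field_fn, pose=None):
        nid = next(self._ids)
        node = SceneNode(nid, label, np.array(bbox, dtype=float),
                         np.array(semantic, dtype=float), field_fn)
        if pose is not None:
            node.pose = np.array(pose, dtype=float)  # copy so nodes never share a pose
        self.nodes[nid] = node
        return nid

    def query(self, label):
        return [n.node_id for n in self.nodes.values() if n.label == label]

    def delete(self, nid):
        del self.nodes[nid]

    def translate(self, nid, offset):
        self.nodes[nid].pose[:3, 3] += np.asarray(offset, dtype=float)

    def duplicate(self, nid, offset):
        src = self.nodes[nid]
        # The copy reuses the same field_fn (shared network), with its own pose.
        new_id = self.add(src.label, src.bbox, src.semantic, src.field_fn, src.pose)
        self.translate(new_id, offset)
        return new_id

    def swap(self, a, b):
        # Exchange the objects' networks while each node keeps its pose.
        na, nb = self.nodes[a], self.nodes[b]
        na.field_fn, nb.field_fn = nb.field_fn, na.field_fn

    def query_field(self, nid, x_world):
        # Map a world point into the object frame before evaluating the object's field.
        node = self.nodes[nid]
        x_h = np.append(np.asarray(x_world, dtype=float), 1.0)
        return node.field_fn((np.linalg.inv(node.pose) @ x_h)[:3])
```

Duplicating a node reuses the same `field_fn`, so a copied object costs no extra network weights, and `query_field` evaluates each object in its own frame, which is what lets independently edited objects be composed at render time.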
LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF
Neural Radiance Fields (NeRFs) have achieved great success in representing
complex 3D scenes with high-resolution details and efficient memory usage.
Nevertheless, current NeRF-based pose estimators lack an initial pose
prediction and are prone to local optima during optimization. In this paper, we
present LATITUDE: Global Localization with Truncated Dynamic Low-pass Filter,
which introduces a two-stage localization mechanism in city-scale NeRF. In the
place recognition stage, we train a regressor on images generated from
trained NeRFs, which provides an initial value for global localization. In the pose
optimization stage, we minimize the residual between the observed image and the
rendered image by directly optimizing the pose on the tangent plane. To avoid
convergence to a local optimum, we introduce a Truncated Dynamic Low-pass Filter
(TDLF) for coarse-to-fine pose registration. We evaluate our method on both
synthetic and real-world data and show its potential applications for
high-precision navigation in large-scale city scenes. Code and data will be
publicly available at https://github.com/jike5/LATITUDE.
Comment: 7 pages, 6 figures, submitted to ICRA 202
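Coarse-to-fine registration with a low-pass filter is commonly implemented by windowing the frequency bands of NeRF's positional encoding, so that high frequencies are switched on gradually as optimization proceeds. The Python sketch below uses a BARF-style cosine window as an assumption; the exact truncation rule and schedule of the TDLF are defined in the paper and may differ.

```python
import numpy as np

def freq_weights(alpha, num_bands):
    """Cosine window over frequency bands: band k is off for alpha <= k,
    fully on for alpha >= k + 1, and ramps smoothly in between."""
    k = np.arange(num_bands)
    t = np.clip(alpha - k, 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * t))

def positional_encoding(x, alpha, num_bands):
    """Encode (N, D) inputs with sin/cos at frequencies 2^k * pi, attenuated by the window.
    Output shape is (N, D + 2 * D * num_bands); the raw input is kept as-is."""
    w = freq_weights(alpha, num_bands)                                  # (L,)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi                       # (L,)
    ang = x[:, :, None] * freqs                                         # (N, D, L)
    enc = np.concatenate([w * np.sin(ang), w * np.cos(ang)], axis=-1)   # (N, D, 2L)
    return np.concatenate([x, enc.reshape(x.shape[0], -1)], axis=1)
```

Sweeping `alpha` from 0 to `num_bands` over the pose-optimization iterations starts from a smooth, low-frequency photometric objective that is less prone to local optima, then progressively restores fine detail for precise registration.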
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
Solving real-world complex tasks using reinforcement learning (RL) without
high-fidelity simulation environments or large amounts of offline data can be
quite challenging. Online RL agents trained in imperfect simulation
environments can suffer from severe sim-to-real issues. Although offline RL
approaches bypass the need for simulators, they often impose demanding requirements on
the size and quality of the offline datasets. The recently emerged hybrid
offline-and-online RL provides an attractive framework that enables joint use
of limited offline data and an imperfect simulator for transferable policy
learning. In this paper, we develop a new algorithm, called H2O+, which offers
great flexibility to bridge various choices of offline and online learning
methods, while also accounting for dynamics gaps between the real and
simulation environments. Through extensive simulation and real-world robotics
experiments, we demonstrate superior performance and flexibility over advanced
cross-domain online and offline RL algorithms.
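One common way to account for a dynamics gap is to down-weight simulated transitions by an estimate of how likely the real system is to produce them. The Python sketch below estimates the ratio P_real(s'|s,a) / P_sim(s'|s,a) from two binary "is this real?" classifiers, as in DARC-style domain adaptation, and uses it to weight simulated Bellman errors. The classifier outputs are taken as given inputs, the clipping bound is an assumed value, and this illustrates the general idea rather than H2O+'s actual objective.

```python
import numpy as np

def dynamics_ratio(p_real_sas, p_real_sa, eps=1e-6, max_weight=10.0):
    """Estimate P_real(s'|s,a) / P_sim(s'|s,a) from classifier probabilities
    p_real_sas = D(real | s, a, s') and p_real_sa = D(real | s, a)."""
    p_sas = np.clip(p_real_sas, eps, 1 - eps)
    p_sa = np.clip(p_real_sa, eps, 1 - eps)
    # Difference of logits = log of the ratio of the two posterior odds.
    log_ratio = (np.log(p_sas) - np.log1p(-p_sas)) - (np.log(p_sa) - np.log1p(-p_sa))
    return np.clip(np.exp(log_ratio), 0.0, max_weight)  # clip to keep the loss well-conditioned

def mixed_td_loss(q_real, target_real, q_sim, target_sim, w_sim):
    """Squared Bellman error on real (offline) samples plus a ratio-weighted error on
    simulated samples, so transitions the real system rarely produces count less."""
    real_term = np.mean((q_real - target_real) ** 2)
    sim_term = np.mean(w_sim * (q_sim - target_sim) ** 2)
    return float(real_term + sim_term)
```

Dividing the odds for (s, a, s') by the odds for (s, a) cancels the state-action distribution mismatch, leaving only the difference in transition dynamics, which is the quantity a dynamics-gap correction needs.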
SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model
Simulation plays a critical role in the research and development of
autonomous driving and intelligent transportation systems. However, the current
simulation platforms exhibit limitations in the realism and diversity of agent
behaviors, which impede the transfer of simulation outcomes to the real world.
In this paper, we propose a generative driver agent simulation framework based
on large language models (LLMs), capable of perceiving complex traffic
scenarios and providing realistic driving maneuvers. Notably, we conducted
interviews with 24 drivers and used their detailed descriptions of driving
behavior as chain-of-thought prompts to develop a 'coach agent' module, which
can evaluate and assist driver agents in accumulating driving experience and
developing human-like driving styles. Through practical simulation experiments
and user experiments, we validate the feasibility of this framework in
generating reliable driver agents and analyze the roles of each module. The
results show that the full framework reduced the collision
rate by 81.04% and increased human-likeness by 50%. Our research proposes
the first urban context driver agent simulation framework based on LLMs and
provides valuable insights into the future of agent simulation for complex
tasks.
Comment: 12 pages, 8 figures
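The coach agent's use of interview material as chain-of-thought prompts can be illustrated with a simple prompt builder. In this Python sketch, `InterviewNote`, `build_coach_prompt`, and the prompt wording are hypothetical; no LLM is called, and the paper's actual prompts and module interfaces are not reproduced.

```python
from dataclasses import dataclass

@dataclass
class InterviewNote:
    driver_id: int
    situation: str   # a traffic scene the interviewed driver described
    reasoning: str   # how the driver said they decide in that scene

def build_coach_prompt(notes, scenario, proposed_action, max_examples=3):
    """Assemble a chain-of-thought style prompt: a few human reasoning examples,
    then the current scene and the driver agent's proposed maneuver."""
    lines = ["You are a driving coach. Review the driver agent's decision step by step."]
    for note in notes[:max_examples]:
        lines.append(f"Example situation: {note.situation}")
        lines.append(f"Experienced driver's reasoning: {note.reasoning}")
    lines.append(f"Current scenario: {scenario}")
    lines.append(f"Proposed maneuver: {proposed_action}")
    lines.append("Reason about safety and human-likeness step by step, "
                 "then answer APPROVE or REVISE with a suggestion.")
    return "\n".join(lines)
```

Here the examples are simply the first `max_examples` notes; a real system would more plausibly select notes by relevance to the current scene, so treat that selection rule as a placeholder.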
3D Implicit Transporter for Temporally Consistent Keypoint Discovery
Keypoint-based representation has proven advantageous in various visual and
robotic tasks. However, the existing 2D and 3D methods for detecting keypoints
mainly rely on geometric consistency to achieve spatial alignment, neglecting
temporal consistency. To address this issue, the Transporter method was
introduced for 2D data, which reconstructs the target frame from the source
frame to incorporate both spatial and temporal information. However, the direct
application of the Transporter to 3D point clouds is infeasible due to their
structural differences from 2D images. Thus, we propose the first 3D version of
the Transporter, which leverages a hybrid 3D representation, cross attention, and
implicit reconstruction. We apply this new learning system on 3D articulated
objects and nonrigid animals (humans and rodents) and show that learned
keypoints are spatio-temporally consistent. Additionally, we propose a
closed-loop control strategy that utilizes the learned keypoints for 3D object
manipulation and demonstrate its superior performance. Code is available at
https://github.com/zhongcl-thu/3D-Implicit-Transporter.
Comment: ICCV2023 oral paper
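The original 2D Transporter reconstructs target features by suppressing source features around both sets of keypoints and pasting in target features at the target keypoints: Phi_hat = (1 - H_s)(1 - H_t) Phi_s + H_t Phi_t, where H_s and H_t are keypoint heatmaps. The Python sketch below applies that rule to features defined on a shared set of 3D query points with Gaussian keypoint heatmaps. Sharing the query points across frames is an assumption standing in for the paper's hybrid 3D representation, `sigma` is an assumed value, and cross attention and implicit reconstruction are not modeled.

```python
import numpy as np

def keypoint_heatmap(points, keypoints, sigma=0.05):
    """Gaussian heatmap over (N, 3) query points from (K, 3) keypoints:
    H(x) = max_k exp(-||x - kp_k||^2 / (2 sigma^2)), returned with shape (N, 1)."""
    d2 = ((points[:, None, :] - keypoints[None, :, :]) ** 2).sum(-1)  # (N, K)
    return np.exp(-d2 / (2 * sigma**2)).max(axis=1, keepdims=True)

def transport(feat_src, feat_tgt, h_src, h_tgt):
    """Transporter rule: erase features near source and target keypoints,
    then paste target features at the target keypoints."""
    return (1 - h_src) * (1 - h_tgt) * feat_src + h_tgt * feat_tgt
```

Reconstructing the target frame from the transported features forces the keypoints to sit where appearance changes between frames, which is what ties them to consistent locations over time.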
On-board Inertial-assisted Visual Odometer on an Embedded System
In this paper, we propose a novel inertial-assisted visual odometry system intended for low-cost micro aerial vehicles (MAVs). The system's sensor assembly consists of two downward-facing cameras and an inertial measurement unit (IMU) with three-axis accelerometers and gyroscopes. Real-time operation on a low-cost embedded system is enabled by two key features: first, simple pixel-level algorithms are integrated into a low-end FPGA and accelerated via pipelining and combinational logic techniques; second, a fast yaw-and-translation estimation algorithm is paired with a novel outlier rejection scheme based on probabilistic predetermined operations rather than iterative hypothesis testing. We illustrate the performance of our system by hovering an MAV in a GPS-denied environment. Its feasibility and robustness are also demonstrated in complex outdoor environments.
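With roll and pitch compensated by the IMU and downward-facing cameras over roughly planar ground, matched features between frames are related by a yaw rotation plus a 2D translation, which has a closed-form least-squares solution. The Python sketch below assumes that setting, with feature coordinates already in metric units (for example, scaled by a known altitude). Its outlier handling is a generic trimmed refit, not the paper's probabilistic predetermined-operation scheme, which the abstract does not detail.

```python
import numpy as np

def rot(yaw):
    """2D rotation matrix for a yaw angle in radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s], [s, c]])

def estimate_yaw_translation(p, q):
    """Closed-form least-squares fit of q ~= R(yaw) p + t for matched (N, 2) point sets."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    a, b = p - pc, q - qc
    s_cos = np.sum(a[:, 0] * b[:, 0] + a[:, 1] * b[:, 1])  # sum of dot products
    s_sin = np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])  # sum of 2D cross products
    yaw = np.arctan2(s_sin, s_cos)
    return yaw, qc - rot(yaw) @ pc

def robust_yaw_translation(p, q, trim=0.1):
    """Generic trimmed refit: fit once, drop the largest residuals, fit again."""
    yaw, t = estimate_yaw_translation(p, q)
    residuals = np.linalg.norm(q - (p @ rot(yaw).T + t), axis=1)
    keep = np.argsort(residuals)[: int(round(len(p) * (1 - trim)))]
    return estimate_yaw_translation(p[keep], q[keep])
```

Since the solve is a handful of sums and one `arctan2`, its cost is fixed and predictable, which is the kind of property that matters on a low-end embedded target; the single trimmed refit keeps that fixed cost, unlike iterative hypothesis testing.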