SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
3D occupancy prediction, which aims to predict whether each point in the
surrounding 3D space is occupied, is an important task for robust
vision-centric autonomous driving. Existing methods usually require 3D
occupancy labels to produce meaningful results. However, it is very laborious
to annotate the occupancy status of each voxel. In this paper, we propose
SelfOcc to explore a self-supervised way to learn 3D occupancy using only video
sequences. We first transform the images into the 3D space (e.g., bird's eye
view) to obtain a 3D representation of the scene. We directly impose constraints
on the 3D representations by treating them as signed distance fields. We can
then render 2D images of previous and future frames as self-supervision signals
to learn the 3D representations. We propose an MVS-embedded strategy to
directly optimize the SDF-induced weights with multiple depth proposals. Our
SelfOcc outperforms the previous best method SceneRF by 58.7% using a single
frame as input on SemanticKITTI and is the first self-supervised work that
produces reasonable 3D occupancy for surround cameras on nuScenes. SelfOcc
produces high-quality depth and achieves state-of-the-art results on novel
depth synthesis, monocular depth estimation, and surround-view depth estimation
on SemanticKITTI, KITTI-2015, and nuScenes, respectively. Code is available at:
https://github.com/huang-yh/SelfOcc
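The idea of treating a 3D representation as a signed distance field and deriving rendering weights from it can be sketched as follows. This is a minimal NeuS-style SDF-to-opacity conversion along a single ray (an assumption for illustration; the paper's exact weighting and its MVS-embedded multi-proposal optimization are not shown):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sdf_render_weights(sdf, s=50.0):
    """Convert SDF samples along a ray into volume-rendering weights.

    Opacity comes from the decrease of a sigmoid applied to the signed
    distance, composited front to back (NeuS-style; illustrative only).
    """
    cdf = sigmoid(s * sdf)
    # Per-interval opacity: relative drop of the sigmoid CDF
    alpha = np.clip((cdf[:-1] - cdf[1:]) / np.clip(cdf[:-1], 1e-6, None), 0.0, 1.0)
    # Accumulated transmittance up to each interval
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return alpha * trans

# Toy ray: the surface (SDF zero-crossing) sits at depth 2.0
t = np.linspace(0.0, 4.0, 200)
sdf = 2.0 - t                      # positive outside, negative inside
w = sdf_render_weights(sdf)
depth = np.sum(w * t[:-1]) / np.sum(w)   # rendered depth, close to 2.0
```

The rendered depth concentrates where the SDF crosses zero, which is what lets 2D photometric and depth losses supervise the 3D field.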
Ultra-high-linearity integrated lithium niobate electro-optic modulators
Integrated lithium niobate (LN) photonics is a promising platform for future
chip-scale microwave photonics systems owing to its unique electro-optic
properties, low optical loss and excellent scalability. A key enabler for such
systems is a highly linear electro-optic modulator that can faithfully convert
analog electrical signals into optical signals. In this work, we demonstrate a
monolithic integrated LN modulator with an ultrahigh spurious-free dynamic
range (SFDR) of 120.04 dB·Hz^(4/5) at 1 GHz, using a ring-assisted Mach-Zehnder
interferometer configuration. The excellent synergy between the intrinsically
linear electro-optic response of LN and an optimized linearization strategy
allows us to fully suppress the cubic terms of third-order intermodulation
distortions (IMD3) without active feedback controls, leading to ~ 20 dB
improvement over previous results in the thin-film LN platform. Our
ultra-high-linearity LN modulators could become a core building block for
future large-scale functional microwave photonic integrated circuits, by
further integration with other high-performance components like low-loss delay
lines, tunable filters and phase shifters available on the LN platform.
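The IMD3 products being suppressed can be made concrete with a numerical two-tone test of a plain quadrature-biased Mach-Zehnder modulator (a generic sinusoidal transfer function used for illustration, not the ring-assisted linearized design demonstrated here). The cubic term of the transfer function produces spurs at 2f1 - f2 and 2f2 - f1, right next to the fundamentals:

```python
import numpy as np

# Two-tone test: drive an ideal quadrature-biased MZM with tones f1, f2
# and look for third-order intermodulation products at 2*f1 - f2.
fs = 10000.0                      # sample rate (Hz), toy values
n = 10000                         # 1 s record -> 1 Hz bin spacing
t = np.arange(n) / fs
f1, f2 = 100.0, 110.0             # closely spaced test tones
m = 0.2                           # modulation depth V/Vpi
v = m * (np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t))
p = 0.5 * (1.0 + np.sin(np.pi * v))   # quadrature-biased MZM output power

spec = np.abs(np.fft.rfft(p)) / n
freqs = np.fft.rfftfreq(n, 1.0 / fs)

def power_at(f):
    """Spectral magnitude at the bin nearest frequency f."""
    return spec[np.argmin(np.abs(freqs - f))]

fund = power_at(f1)               # fundamental tone
imd3 = power_at(2*f1 - f2)        # IMD3 product at 90 Hz
```

For this unlinearized sinusoidal transfer function the IMD3 spur is clearly visible above the noise floor; suppressing that cubic term without active feedback is what the linearization strategy in the abstract targets.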
Exploring Unified Perspective For Fast Shapley Value Estimation
Shapley values have emerged as a widely accepted and trustworthy tool,
grounded in theoretical axioms, for addressing challenges posed by black-box
models like deep neural networks. However, computing Shapley values encounters
exponential complexity in the number of features. Various approaches, including
ApproSemivalue, KernelSHAP, and FastSHAP, have been explored to expedite the
computation. We analyze the consistency of existing works and conclude that
stochastic estimators can be unified as the linear transformation of importance
sampling of feature subsets. Based on this, we investigate the possibility of
designing simple amortized estimators and propose a straightforward and
efficient one, SimSHAP, by eliminating redundant techniques. Extensive
experiments conducted on tabular and image datasets validate the effectiveness
of our SimSHAP, which significantly accelerates the computation of accurate
Shapley values.
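As a concrete instance of the stochastic estimators that such a unified view covers, here is a minimal Monte Carlo Shapley estimator based on sampling permutations of features (an illustrative sketch of importance sampling over feature subsets, not the SimSHAP estimator itself):

```python
import numpy as np

def shapley_mc(value, n_features, n_perm=500, seed=0):
    """Monte Carlo Shapley values via random permutations.

    `value(S)` returns the payoff of a frozenset of feature indices.
    Each feature's Shapley value is estimated as its average marginal
    contribution over sampled orderings (classic permutation sampling).
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_perm):
        perm = rng.permutation(n_features)
        prev, coalition = value(frozenset()), set()
        for i in perm:
            coalition.add(i)
            cur = value(frozenset(coalition))
            phi[i] += cur - prev   # marginal contribution of feature i
            prev = cur
    return phi / n_perm

# Toy additive game: v(S) = sum of member weights -> Shapley values equal
# the weights exactly, for every sampled permutation.
w = np.array([1.0, 2.0, 3.0])
phi = shapley_mc(lambda S: sum(w[i] for i in S), 3)
# phi -> [1. 2. 3.]
```

The exponential cost mentioned in the abstract comes from the 2^n possible coalitions; sampling schemes like this trade exactness for tractability, and amortized estimators go further by learning to predict the values directly.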
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Understanding how the 3D scene evolves is vital for making decisions in
autonomous driving. Most existing methods achieve this by predicting the
movements of object boxes, which cannot capture more fine-grained scene
information. In this paper, we explore a new framework of learning a world
model, OccWorld, in the 3D Occupancy space to simultaneously predict the
movement of the ego car and the evolution of the surrounding scenes. We propose
to learn a world model based on 3D occupancy rather than 3D bounding boxes and
segmentation maps for three reasons: 1) expressiveness: 3D occupancy can
describe the fine-grained 3D structure of the scene; 2) efficiency: 3D
occupancy is more economical to obtain (e.g., from sparse LiDAR points); and
3) versatility: 3D occupancy can adapt to both vision and LiDAR. To facilitate the
modeling of the world evolution, we learn a reconstruction-based scene
tokenizer on the 3D occupancy to obtain discrete scene tokens to describe the
surrounding scenes. We then adopt a GPT-like spatial-temporal generative
transformer to generate subsequent scene and ego tokens to decode the future
occupancy and ego trajectory. Extensive experiments on the widely used nuScenes
benchmark demonstrate the ability of OccWorld to effectively model the
evolution of the driving scenes. OccWorld also produces competitive planning
results without using instance and map supervision. Code is available at:
https://github.com/wzzheng/OccWorld
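The discrete scene-token step can be sketched with plain vector quantization (an assumption for illustration; OccWorld's tokenizer is a learned reconstruction-based autoencoder, and the codebook below is hand-picked): continuous scene features are mapped to integer tokens via a nearest-codebook lookup, and those tokens are what a GPT-like transformer can then predict autoregressively.

```python
import numpy as np

def vq_tokenize(features, codebook):
    """Assign each feature vector to its nearest codebook entry.

    Returns integer scene tokens and the decoded (quantized) features.
    """
    # Squared distances between every feature and every codebook vector
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d.argmin(axis=1)      # discrete scene tokens
    recon = codebook[tokens]       # reconstruction from the codebook
    return tokens, recon

# Hypothetical 2-D features and a 3-entry codebook
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
feats = np.array([[0.1, -0.1], [0.9, 1.2], [1.9, 0.1]])
tokens, recon = vq_tokenize(feats, codebook)
# tokens -> [0 1 2]
```

Turning the occupancy volume into a short sequence of discrete tokens is what makes next-scene prediction tractable for a sequence model.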
- …