Photonic integrated circuit design in a foundry+fabless ecosystem
A foundry-based photonic ecosystem is expected to become necessary with the increasing demand for and adoption of photonics in commercial products. To make foundry-enabled photonics a real success, the photonic circuit design flow should adopt known concepts from analog and mixed-signal electronics. Based on the similarities and differences between the existing photonic design flow and the standardized electronics design flow, we project the needs and evolution of the photonic design flow, such as schematic-driven design, accurate behavioral models, and yield prediction in the presence of fabrication variability.
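Yield prediction under fabrication variability is typically a Monte Carlo exercise over behavioral models. A minimal sketch of the idea follows; the ring-filter response, the width tolerance, and the pass/fail spec below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def ring_drop_power(width_nm):
    # Hypothetical behavioral model: drop-port power of a ring filter whose
    # resonance detunes linearly with waveguide-width error around 450 nm.
    detune = 0.8 * (width_nm - 450.0)          # resonance shift per nm of width error
    return 1.0 / (1.0 + (detune / 0.5) ** 2)   # Lorentzian line shape

# Fabrication variability: waveguide width ~ N(450 nm, 2 nm) across dies.
widths = rng.normal(loc=450.0, scale=2.0, size=100_000)
power = ring_drop_power(widths)

# A die "yields" if the drop-port power stays above a chosen spec.
yield_estimate = np.mean(power > 0.5)
print(f"estimated yield: {yield_estimate:.1%}")
```

In a schematic-driven flow, the behavioral model and the statistical distributions would come from the foundry PDK rather than a hand-written response like the one above.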
Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips
We tackle the task of reconstructing hand-object interactions from short
video clips. Given an input video, our approach casts 3D inference as a
per-video optimization and recovers a neural 3D representation of the object
shape, as well as the time-varying motion and hand articulation. While the
input video naturally provides some multi-view cues to guide 3D inference,
these are insufficient on their own due to occlusions and limited viewpoint
variations. To obtain accurate 3D, we augment the multi-view signals with
generic data-driven priors to guide reconstruction. Specifically, we learn a
diffusion network to model the conditional distribution of (geometric)
renderings of objects conditioned on hand configuration and category label, and
leverage it as a prior to guide the novel-view renderings of the reconstructed
scene. We empirically evaluate our approach on egocentric videos across 6
object categories, and observe significant improvements over prior single-view
and multi-view methods. Finally, we demonstrate our system's ability to
reconstruct arbitrary clips from YouTube, showing both 1st and 3rd person
interactions.
Comment: Accepted to ICCV23 (Oral). Project Page: https://judyye.github.io/diffhoi-www
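The diffusion network described above acts as a prior over how plausible novel-view renderings look during the per-video optimization. A minimal sketch of one score-distillation-style guidance step follows; `diffusion_net` and its conditioning are hypothetical stand-ins for the paper's learned model, and this is one common way to use such a prior, not the authors' exact procedure:

```python
import torch

def diffusion_guidance_loss(render, diffusion_net, cond, alphas_cumprod):
    """Score-distillation-style loss pulling a differentiable rendering
    toward a conditional diffusion prior. `render` is a novel-view
    (geometric) rendering of shape (B, C, H, W); `diffusion_net` predicts
    noise given a noisy image, a timestep, and conditioning (e.g. hand
    configuration + category). Tensors are assumed to share one device."""
    B = render.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=render.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    noise = torch.randn_like(render)
    # Forward diffusion: noise the rendering to timestep t.
    noisy = a_bar.sqrt() * render + (1 - a_bar).sqrt() * noise
    pred = diffusion_net(noisy, t, cond)
    # The gradient (pred - noise) flows into the renderer, not the prior.
    grad = (pred - noise).detach()
    return (grad * render).sum() / B
```

Minimizing this loss alongside the multi-view reconstruction losses nudges occluded and unseen views toward configurations the prior considers likely.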
Numerical modeling of a linear photonic system for accurate and efficient time-domain simulations
In this paper, a novel modeling and simulation method for general linear, time-invariant, passive photonic devices and circuits is proposed. This technique, starting from the scattering parameters of the photonic system under study, builds a baseband equivalent state-space model that removes the optical carrier frequency and operates at baseband, thereby significantly reducing the modeling and simulation complexity without losing accuracy. Indeed, it is possible to analytically reconstruct the port signals of the photonic system under study starting from the time-domain simulation of the corresponding baseband equivalent model. However, such equivalent models are complex-valued systems and, in this scenario, the conventional passivity constraints are no longer applicable. Hence, the passivity constraints for scattering parameters and state-space models of baseband equivalent systems are presented, which are essential for time-domain simulations. Three suitable examples demonstrate the feasibility, accuracy, and efficiency of the proposed method. © 2018 Chinese Laser Press.
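In outline, the baseband equivalence rests on factoring the optical carrier out of the port signals; a sketch of the standard relations (notation assumed here, not quoted from the paper):

```latex
% Port signal with optical carrier at \omega_c and its baseband envelope:
a(t) = \Re\!\left\{ a_{bb}(t)\, e^{j\omega_c t} \right\}

% Baseband-equivalent scattering parameters: the measured spectrum
% shifted down by the carrier frequency:
S_{bb}(j\omega) = S\!\left( j(\omega + \omega_c) \right)

% Complex-valued state-space model fitted to S_{bb} and simulated at
% baseband rates:
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t)
```

Because $S_{bb}$ is no longer conjugate-symmetric about $\omega = 0$, the fitted matrices $A, B, C, D$ are complex-valued, which is why the conventional real-valued passivity constraints must be restated for the baseband case.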
LPFormer: LiDAR Pose Estimation Transformer with Multi-Task Network
In this technical report, we present the 1st place solution for the 2023
Waymo Open Dataset Pose Estimation challenge. Due to the difficulty of
acquiring large-scale 3D human keypoint annotation, previous methods have
commonly relied on 2D image features and 2D sequential annotations for 3D human
pose estimation. In contrast, our proposed method, named LPFormer, uses only
LiDAR as its input along with its corresponding 3D annotations. LPFormer
consists of two stages: the first stage detects the human bounding box and
extracts multi-level feature representations, while the second stage employs a
transformer-based network to regress the human keypoints using these features.
Experimental results on the Waymo Open Dataset demonstrate top performance, with improvements even over previous multi-modal solutions.
Comment: Technical report of the top solution for the Waymo Open Dataset Challenges 2023 - Pose Estimation. CVPR 2023 Workshop on Autonomous Driving.
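A minimal sketch of the second stage described above (learned keypoint queries plus a transformer over per-box features); the shapes, the query design, and the keypoint count are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Stage 2: regress K 3D keypoints from per-box LiDAR features."""
    def __init__(self, feat_dim=256, num_keypoints=14, depth=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.queries = nn.Parameter(torch.randn(num_keypoints, feat_dim))
        self.regress = nn.Linear(feat_dim, 3)  # (x, y, z) per keypoint

    def forward(self, box_feats):
        # box_feats: (B, N, feat_dim) multi-level features gathered inside
        # each human box produced by a stage-1 LiDAR detector (assumed given).
        B = box_feats.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        tokens = self.encoder(torch.cat([q, box_feats], dim=1))
        return self.regress(tokens[:, : self.queries.shape[0]])  # (B, K, 3)

head = KeypointHead()
print(head(torch.randn(2, 128, 256)).shape)  # torch.Size([2, 14, 3])
```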
LidarMultiNet: Towards a Unified Multi-task Network for LiDAR Perception
LiDAR-based 3D object detection, semantic segmentation, and panoptic
segmentation are usually implemented in specialized networks with distinctive
architectures that are difficult to adapt to each other. This paper presents
LidarMultiNet, a LiDAR-based multi-task network that unifies these three major
LiDAR perception tasks. Among its many benefits, a multi-task network can
reduce the overall cost by sharing weights and computation among multiple
tasks. However, it typically underperforms compared to independently combined
single-task models. The proposed LidarMultiNet aims to bridge the performance
gap between the multi-task network and multiple single-task networks. At the
core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture
with a Global Context Pooling (GCP) module extracting global contextual
features from a LiDAR frame. Task-specific heads are added on top of the
network to perform the three LiDAR perception tasks. More tasks can be
implemented simply by adding new task-specific heads while introducing little
additional cost. A second stage is also proposed to refine the first-stage
segmentation and generate accurate panoptic segmentation results. LidarMultiNet
is extensively tested on both Waymo Open Dataset and nuScenes dataset,
demonstrating for the first time that major LiDAR perception tasks can be
unified in a single strong network that is trained end-to-end and achieves
state-of-the-art performance. Notably, LidarMultiNet reaches the official 1st
place in the Waymo Open Dataset 3D semantic segmentation challenge 2022 with
the highest mIoU and the best accuracy for most of the 22 classes on the test
set, using only LiDAR points as input. It also sets the new state-of-the-art
for a single model on the Waymo 3D object detection benchmark and three
nuScenes benchmarks.
Comment: Full-length paper extending our previous technical report of the 1st place solution of the 2022 Waymo Open Dataset 3D Semantic Segmentation challenge, including evaluations on 5 major benchmarks. arXiv admin note: text overlap with arXiv:2206.1142
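The Global Context Pooling idea, as described, gives each location access to frame-level context beyond its local receptive field. A rough sketch of one way to realize that pattern on dense BEV features; the pool-project-broadcast design and channel sizes are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class GlobalContextPooling(nn.Module):
    """Pool a global descriptor from dense BEV features and broadcast it
    back, so every cell sees context from the whole LiDAR frame."""
    def __init__(self, channels=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
        )

    def forward(self, bev):                      # bev: (B, C, H, W)
        ctx = bev.mean(dim=(2, 3))               # global average pool -> (B, C)
        ctx = self.proj(ctx)[:, :, None, None]   # project, reshape to (B, C, 1, 1)
        return bev + ctx                         # broadcast context back

gcp = GlobalContextPooling()
print(gcp(torch.randn(2, 256, 128, 128)).shape)  # torch.Size([2, 256, 128, 128])
```

In the multi-task setting, such shared context benefits every task head added on top of the encoder-decoder at little extra cost.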
LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception
There is a recent trend in the LiDAR perception field towards unifying
multiple tasks in a single strong network with improved performance, as opposed
to using separate networks for each task. In this paper, we introduce a new
LiDAR multi-task learning paradigm based on the transformer. The proposed
LiDARFormer utilizes cross-space global contextual feature information and
exploits cross-task synergy to boost the performance of LiDAR perception tasks
across multiple large-scale datasets and benchmarks. Our novel
transformer-based framework includes a cross-space transformer module that
learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D
sparse voxel feature maps. Additionally, we propose a transformer decoder for
the segmentation task to dynamically adjust the learned features by leveraging
the categorical feature representations. Furthermore, we combine the
segmentation and detection features in a shared transformer decoder with
cross-task attention layers to enhance and integrate the object-level and
class-level features. LiDARFormer is evaluated on the large-scale nuScenes and
the Waymo Open datasets for both 3D detection and semantic segmentation tasks,
and it outperforms all previously published methods on both tasks. Notably,
LiDARFormer achieves the state-of-the-art performance of 76.4% L2 mAPH and
74.3% NDS on the challenging Waymo and nuScenes detection benchmarks for a single-model, LiDAR-only method.
Comment: ICRA 202
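A minimal sketch of cross-attention between flattened 2D BEV tokens and 3D sparse-voxel tokens, the general pattern behind the cross-space module described above; the dimensions and the residual design are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossSpaceAttention(nn.Module):
    """Let flattened 2D BEV tokens attend to 3D sparse-voxel tokens so the
    two feature spaces exchange information."""
    def __init__(self, dim=128, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev_tokens, voxel_tokens):
        # bev_tokens: (B, H*W, dim); voxel_tokens: (B, N_voxels, dim)
        fused, _ = self.attn(query=bev_tokens, key=voxel_tokens,
                             value=voxel_tokens)
        return self.norm(bev_tokens + fused)     # residual connection + norm

layer = CrossSpaceAttention()
out = layer(torch.randn(2, 1024, 128), torch.randn(2, 5000, 128))
print(out.shape)  # torch.Size([2, 1024, 128])
```

Running the same layer with queries and keys swapped would propagate information in the other direction, which is one plausible reading of "cross-space" exchange between the BEV and voxel feature maps.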