26 research outputs found
Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images
Recent camera-based 3D object detection methods have introduced sequential
frames to improve the detection performance hoping that multiple frames would
mitigate the large depth estimation error. Despite improved detection
performance, prior works rely on naive fusion methods (e.g., concatenation) or
are limited to static scenes (e.g., temporal stereo), neglecting the importance
of the motion cue of objects. These approaches do not fully exploit the
potential of sequential images and show limited performance improvements. To
address this limitation, we propose a novel 3D object detection model, P2D
(Predict to Detect), that integrates a prediction scheme into a detection
framework to explicitly extract and leverage motion features. P2D predicts
object information in the current frame using solely past frames to learn
temporal motion features. We then introduce a novel temporal feature
aggregation method that attentively exploits Bird's-Eye-View (BEV) features
based on predicted object information, resulting in accurate 3D object
detection. Experimental results demonstrate that P2D improves mAP and NDS by
3.0% and 3.7% compared to the sequential image-based baseline, illustrating
that incorporating a prediction scheme can significantly improve detection
accuracy.
Comment: ICCV 202
InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning
Inferring traffic objects such as lane information is of foremost importance
for deployment of autonomous driving. Previous approaches focus on offline
construction of HD map inferred with GPS localization, which is insufficient
for globally scalable autonomous driving. To alleviate these issues, we propose
an online HD map learning framework that detects HD map elements from onboard
sensor observations. We represent the map elements as a graph; we propose
InstaGraM, instance-level graph modeling of HD map that brings accurate and
fast end-to-end vectorized HD map learning. Along with the graph modeling
strategy, we propose an end-to-end neural network composed of three stages:
unified BEV feature extraction, map graph component detection, and association
via graph neural networks. Comprehensive experiments on a public open dataset
show that our proposed network outperforms previous models by up to 13.7 mAP
with up to 33.8X faster computation time.
Comment: Workshop on Vision-Centric Autonomous Driving (VCAD) at Conference on
Computer Vision and Pattern Recognition (CVPR) 202
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
The inherent noisy and sparse characteristics of radar data pose challenges
in finding effective representations for 3D object detection. In this paper, we
propose RadarDistill, a novel knowledge distillation (KD) method, which can
improve the representation of radar data by leveraging LiDAR data. RadarDistill
successfully transfers desirable characteristics of LiDAR features into radar
features using three key components: Cross-Modality Alignment (CMA),
Activation-based Feature Distillation (AFD), and Proposal-based Feature
Distillation (PFD). CMA enhances the density of radar features by employing
multiple layers of dilation operations, effectively addressing the challenge of
inefficient knowledge transfer from LiDAR to radar. AFD selectively transfers
knowledge based on regions of the LiDAR features, with a specific focus on
areas where activation intensity exceeds a predefined threshold. PFD similarly
guides the radar network to selectively mimic features from the LiDAR network
within the object proposals. Our comparative analyses conducted on the nuScenes
dataset demonstrate that RadarDistill achieves state-of-the-art (SOTA)
performance for the radar-only object detection task, recording 20.5% in mAP and
43.7% in NDS. Also, RadarDistill significantly improves the performance of the
camera-radar fusion model.
Comment: Accepted to IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR) 2024, 10 pages, 3 figures
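The idea behind activation-based distillation, as the abstract describes it, is to transfer knowledge only where the LiDAR features are strongly activated. The following is a minimal sketch of that masking idea, not the RadarDistill implementation: the function name, the plain mean-squared-error loss, the use of nested lists as 2D feature maps, and the example values are all assumptions made for illustration.

```python
def activation_masked_distill_loss(lidar_feat, radar_feat, threshold):
    """Mean squared error between radar and LiDAR feature maps, counted
    only at cells where the LiDAR activation magnitude exceeds `threshold`.

    Hypothetical helper: the loss form and the masking rule are
    illustrative assumptions, not taken from the paper.
    """
    total, count = 0.0, 0
    for lidar_row, radar_row in zip(lidar_feat, radar_feat):
        for lidar_val, radar_val in zip(lidar_row, radar_row):
            if abs(lidar_val) > threshold:  # keep only "active" LiDAR cells
                total += (radar_val - lidar_val) ** 2
                count += 1
    # With no active cells there is nothing to distill, so the loss is zero.
    return total / count if count else 0.0


# Toy 2x2 feature maps (made-up numbers).
lidar = [[0.9, 0.1], [0.0, 0.8]]
radar = [[0.5, 0.7], [0.3, 0.8]]
loss = activation_masked_distill_loss(lidar, radar, threshold=0.5)
```

In this toy case only the two cells with LiDAR activations 0.9 and 0.8 contribute, so the large radar response at the weakly activated cell (0.7 versus 0.1) is ignored by the loss.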
3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection
Fusing data from cameras and LiDAR sensors is an essential technique to
achieve robust 3D object detection. One key challenge in camera-LiDAR fusion
involves mitigating the large domain gap between the two sensors in terms of
coordinates and data distribution when fusing their features. In this paper, we
propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion, which
is designed to mitigate the gap between the feature representations of camera
and LiDAR data. The proposed method fuses the features of the camera-view and
3D voxel-view domain and models their interactions through deformable
attention. We redesign the transformer fusion encoder to aggregate the
information from the two domains. Two major changes include 1) dual query-based
deformable attention to fuse the dual-domain features interactively and 2) 3D
local self-attention to encode the voxel-domain queries prior to dual-query
decoding. The results of an experimental evaluation show that the proposed
camera-LiDAR fusion architecture achieved competitive performance on the KITTI
and nuScenes datasets, with state-of-the-art performance in some 3D object
detection benchmark categories.
Comment: 12 pages, 3 figures
CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception
Autonomous driving requires an accurate and fast 3D perception system that
includes 3D object detection, tracking, and segmentation. Although recent
low-cost camera-based approaches have shown promising results, they are
susceptible to poor illumination or bad weather conditions and have a large
localization error. Hence, fusing camera with low-cost radar, which provides
precise long-range measurement and operates reliably in all environments, is
promising but has not yet been thoroughly investigated. In this paper, we
propose Camera Radar Net (CRN), a novel camera-radar fusion framework that
generates a semantically rich and spatially accurate bird's-eye-view (BEV)
feature map for various tasks. To overcome the lack of spatial information in
an image, we transform perspective view image features to BEV with the help of
sparse but accurate radar points. We further aggregate image and radar feature
maps in BEV using multi-modal deformable attention designed to tackle the
spatial misalignment between inputs. CRN in the real-time setting operates at 20
FPS while achieving comparable performance to LiDAR detectors on nuScenes, and
even outperforms them at far distances in the 100 m setting. Moreover, CRN in the
offline setting yields 62.4% NDS and 57.5% mAP on the nuScenes test set and ranks
first among all camera and camera-radar 3D object detectors.
Comment: IEEE/CVF International Conference on Computer Vision (ICCV'23)
RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection
While LiDAR sensors have been successfully applied to 3D object detection, the
affordability of radar and camera sensors has led to a growing interest in
fusing radars and cameras for 3D object detection. However, previous
radar-camera fusion models have not been able to fully utilize radar
information in that initial 3D proposals were generated based on the camera
features only and the instance-level fusion is subsequently conducted. In this
paper, we propose radar-camera multi-level fusion (RCM-Fusion), which fuses
radar and camera modalities at both the feature-level and instance-level to
fully utilize radar information. At the feature-level, we propose a Radar
Guided BEV Encoder which utilizes radar Bird's-Eye-View (BEV) features to
transform image features into precise BEV representations and then adaptively
combines the radar and camera BEV features. At the instance-level, we propose a
Radar Grid Point Refinement module that reduces localization error by
considering the characteristics of the radar point clouds. The experiments
conducted on the public nuScenes dataset demonstrate that our proposed
RCM-Fusion offers an 11.8% performance gain in nuScenes detection score (NDS) over
the camera-only baseline model and achieves state-of-the-art performance among
radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code
will be made publicly available.
Comment: 10 pages, 5 figures
South Korea tutorial via web
Since I came to the U.S.A. to study, I have noticed that Americans do not know much about Korea. This motivated me to create this web site to educate fellow students and other Americans and to make them more interested in Korea. The web site was built with "Dream Weaver", a web page generating software package. Most of the information on the web site was researched from the Internet. After the web site was uploaded, I used the survey method to assess its quality, achievement, and effectiveness. First, volunteers took a pre-test to find out how much they already knew about Korea. Next, the volunteers were given a week to browse and read the information on the web site. After using the web site, the volunteers took a post-test to capture their response to the web site and how much they had learned. Lastly, the results were analyzed and showed that the volunteers' knowledge had improved significantly. The address of the web site in the project is "www.wpi.edui/~dkum"
Modeling and Optimal Control of Parallel HEVs and Plug-in HEVs for Multiple Objectives.
For the simultaneous optimization of fuel economy and emissions, we first develop a parallel HEV (and PHEV) model that can efficiently evaluate both fuel economy and tail-pipe emissions, and then solve the optimal control problem that minimizes fuel consumption and emissions for a cold-start driving cycle using Dynamic Programming (DP). Based on DP results, a comprehensive extraction method is developed to extract implementable optimal control strategies over the entire state space, instead of a single optimal trajectory. This method is applied to both HEVs and PHEVs to extract both optimal energy management and catalytic converter temperature management strategies. For the optimal energy management of PHEVs under known trip distances, a new variable Energy-to-Distance Ratio (EDR) is introduced to quantify the level of battery state-of-charge (SOC) with respect to the remaining distance. The extracted results show that the engine on/off, gear-shift, and power-split strategies must be properly adjusted to optimize fuel economy and tail-pipe emission. Based on the extracted results, a DP-based cold-start supervisory powertrain controller (SPC) is designed and compared with instantaneous optimization methods. Simulation results show that instantaneous optimization methods are good for the optimization of fuel economy despite frequent engine on/off and gear-shift events, but the DP-based SPC performs better when multiple objectives are considered.
For the engine-start control problem, a more detailed powertrain model, including clutch and crank-angle domain engine models, is developed. Assuming that the clutch torque can be accurately estimated and perfectly cancelled, the optimal engine-start control problem is formulated to minimize engine-start time while accurately supplying the driver torque demand. This nonlinear optimal control problem is solved both analytically and numerically. Under special cases, the optimization problem can be analytically solved to obtain a closed form solution. DP, on the other hand, is used to obtain numerical solutions for all cases, and the results confirm that the numerical solution matches with the analytical solution. More importantly, the DP control policy is found to be time-invariant, and thus can be directly implemented in the form of a full state feedback controller.
Ph.D., Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/77784/1/dkum_1.pd
ROBUST CONTROL OF ACTIVE SUSPENSIONS
Active suspension has been widely studied in recent decades, but the implementation of the single-input, single-output (SISO) force-control architecture used in many prior studies has had limited success due to lightly damped zeros. The inherent trade-off between robust stability and road disturbance attenuation for the SISO control architecture is the main culprit. In this paper, we study whether the single-input, two-output (SITO) control architecture provides sufficient degrees of freedom in the control synthesis. First, a quarter car model with an electromagnetic motor is derived and the improved LQG/LTR design technique is employed to simultaneously recover both stability robustness and disturbance attenuation properties at the expense of measurement noise sensitivity. It was found that if the control system is restricted to a SISO architecture, sprung mass acceleration is the most promising choice among practical measurements. Both classical and modern control approaches are used to analyze the effectiveness of the proposed method and its closed-loop performance. Simulation results show that stability robustness and disturbance attenuation can be dramatically improved by the SITO architecture over its SISO counterpart.
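For readers unfamiliar with the quarter car model mentioned above, the following is a minimal sketch of the textbook two-mass version with an ideal force actuator between the sprung and unsprung masses. It is an assumption-laden illustration: the paper's model includes an electromagnetic motor, which is not modeled here, and the parameter names and values are hypothetical.

```python
def quarter_car_derivatives(state, force, road, params):
    """Time derivatives of a textbook two-mass quarter car model.

    state  = [z_s, v_s, z_u, v_u]: sprung and unsprung mass
             displacements (m) and velocities (m/s)
    force  = actuator force between the two masses (N), ideal actuator
    road   = road displacement under the tire (m)
    params = dict with masses m_s, m_u (kg), suspension stiffness k_s
             (N/m), suspension damping c_s (N*s/m), tire stiffness k_t (N/m)
    """
    z_s, v_s, z_u, v_u = state
    # Passive suspension force from spring and damper deflection.
    suspension = params["k_s"] * (z_s - z_u) + params["c_s"] * (v_s - v_u)
    # Tire modeled as a pure spring against the road profile.
    tire = params["k_t"] * (z_u - road)
    a_s = (-suspension + force) / params["m_s"]        # sprung mass accel.
    a_u = (suspension - tire - force) / params["m_u"]  # unsprung mass accel.
    return [v_s, a_s, v_u, a_u]


# Hypothetical parameter values, chosen only to exercise the function.
params = {"m_s": 300.0, "m_u": 40.0, "k_s": 16000.0,
          "c_s": 1000.0, "k_t": 190000.0}
derivs = quarter_car_derivatives([0.0, 0.0, 0.0, 0.0],
                                 force=300.0, road=0.0, params=params)
```

The second entry of the returned list is the sprung mass acceleration, the measurement the abstract singles out as the most promising choice under a SISO architecture.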