26 research outputs found

    Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images

    Recent camera-based 3D object detection methods have introduced sequential frames to improve detection performance, hoping that multiple frames would mitigate the large depth estimation error. Despite improved detection performance, prior works rely on naive fusion methods (e.g., concatenation) or are limited to static scenes (e.g., temporal stereo), neglecting the importance of the motion cue of objects. These approaches do not fully exploit the potential of sequential images and show limited performance improvements. To address this limitation, we propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework to explicitly extract and leverage motion features. P2D predicts object information in the current frame using solely past frames to learn temporal motion features. We then introduce a novel temporal feature aggregation method that attentively exploits Bird's-Eye-View (BEV) features based on predicted object information, resulting in accurate 3D object detection. Experimental results demonstrate that P2D improves mAP and NDS by 3.0% and 3.7% compared to the sequential image-based baseline, illustrating that incorporating a prediction scheme can significantly improve detection accuracy. Comment: ICCV 202

    InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning

    Inferring traffic objects such as lane information is of foremost importance for the deployment of autonomous driving. Previous approaches focus on offline construction of HD maps inferred with GPS localization, which is insufficient for globally scalable autonomous driving. To alleviate these issues, we propose an online HD map learning framework that detects HD map elements from onboard sensor observations. We represent the map elements as a graph; we propose InstaGraM, instance-level graph modeling of HD maps that brings accurate and fast end-to-end vectorized HD map learning. Along with the graph modeling strategy, we propose an end-to-end neural network composed of three stages: a unified BEV feature extraction, map graph component detection, and association via graph neural networks. Comprehensive experiments on a public open dataset show that our proposed network outperforms previous models by up to 13.7 mAP with up to 33.8X faster computation time. Comment: Workshop on Vision-Centric Autonomous Driving (VCAD) at Conference on Computer Vision and Pattern Recognition (CVPR) 202

    RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

    The inherently noisy and sparse characteristics of radar data pose challenges in finding effective representations for 3D object detection. In this paper, we propose RadarDistill, a novel knowledge distillation (KD) method, which can improve the representation of radar data by leveraging LiDAR data. RadarDistill successfully transfers desirable characteristics of LiDAR features into radar features using three key components: Cross-Modality Alignment (CMA), Activation-based Feature Distillation (AFD), and Proposal-based Feature Distillation (PFD). CMA enhances the density of radar features by employing multiple layers of dilation operations, effectively addressing the challenge of inefficient knowledge transfer from LiDAR to radar. AFD selectively transfers knowledge based on regions of the LiDAR features, with a specific focus on areas where activation intensity exceeds a predefined threshold. PFD similarly guides the radar network to selectively mimic features from the LiDAR network within the object proposals. Our comparative analyses conducted on the nuScenes dataset demonstrate that RadarDistill achieves state-of-the-art (SOTA) performance for the radar-only object detection task, recording 20.5% in mAP and 43.7% in NDS. Also, RadarDistill significantly improves the performance of the camera-radar fusion model. Comment: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, 10 pages, 3 figure

    3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection

    Fusing data from cameras and LiDAR sensors is an essential technique to achieve robust 3D object detection. One key challenge in camera-LiDAR fusion involves mitigating the large domain gap between the two sensors in terms of coordinates and data distribution when fusing their features. In this paper, we propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion, which is designed to mitigate the gap between the feature representations of camera and LiDAR data. The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention. We redesign the transformer fusion encoder to aggregate the information from the two domains. Two major changes include 1) dual query-based deformable attention to fuse the dual-domain features interactively and 2) 3D local self-attention to encode the voxel-domain queries prior to dual-query decoding. The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets, with state-of-the-art performance in some 3D object detection benchmark categories. Comment: 12 pages, 3 figure

    CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception

    Autonomous driving requires an accurate and fast 3D perception system that includes 3D object detection, tracking, and segmentation. Although recent low-cost camera-based approaches have shown promising results, they are susceptible to poor illumination or bad weather conditions and have a large localization error. Hence, fusing camera with low-cost radar, which provides precise long-range measurement and operates reliably in all environments, is promising but has not yet been thoroughly investigated. In this paper, we propose Camera Radar Net (CRN), a novel camera-radar fusion framework that generates a semantically rich and spatially accurate bird's-eye-view (BEV) feature map for various tasks. To overcome the lack of spatial information in an image, we transform perspective view image features to BEV with the help of sparse but accurate radar points. We further aggregate image and radar feature maps in BEV using multi-modal deformable attention designed to tackle the spatial misalignment between inputs. CRN in the real-time setting operates at 20 FPS while achieving comparable performance to LiDAR detectors on nuScenes, and even outperforms them at far distances in the 100 m setting. Moreover, CRN in the offline setting yields 62.4% NDS and 57.5% mAP on the nuScenes test set and ranks first among all camera and camera-radar 3D object detectors. Comment: IEEE/CVF International Conference on Computer Vision (ICCV'23

    RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection

    While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models have not been able to fully utilize radar information, in that initial 3D proposals were generated based on the camera features only and the instance-level fusion was conducted subsequently. In this paper, we propose radar-camera multi-level fusion (RCM-Fusion), which fuses radar and camera modalities at both the feature level and the instance level to fully utilize radar information. At the feature level, we propose a Radar Guided BEV Encoder which utilizes radar Bird's-Eye-View (BEV) features to transform image features into precise BEV representations and then adaptively combines the radar and camera BEV features. At the instance level, we propose a Radar Grid Point Refinement module that reduces localization error by considering the characteristics of the radar point clouds. The experiments conducted on the public nuScenes dataset demonstrate that our proposed RCM-Fusion offers an 11.8% performance gain in nuScenes detection score (NDS) over the camera-only baseline model and achieves state-of-the-art performance among radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code will be made publicly available. Comment: 10 pages, 5 figure

    Modeling and Optimal Control of Parallel HEVs and Plug-in HEVs for Multiple Objectives.

    For the simultaneous optimization of fuel economy and emissions, we first develop a parallel HEV (and PHEV) model that can efficiently evaluate both fuel economy and tail-pipe emissions, and then solve the optimal control problem that minimizes fuel consumption and emissions for a cold-start driving cycle using Dynamic Programming (DP). Based on DP results, a comprehensive extraction method is developed to extract implementable optimal control strategies over the entire state space, instead of a single optimal trajectory. This method is applied to both HEVs and PHEVs to extract both optimal energy management and catalytic converter temperature management strategies. For the optimal energy management of PHEVs under known trip distances, a new variable Energy-to-Distance Ratio (EDR) is introduced to quantify the level of battery state-of-charge (SOC) with respect to the remaining distance. The extracted results show that the engine on/off, gear-shift, and power-split strategies must be properly adjusted to optimize fuel economy and tail-pipe emission. Based on the extracted results, a DP-based cold-start supervisory powertrain controller (SPC) is designed and compared with instantaneous optimization methods. Simulation results show that instantaneous optimization methods are good for the optimization of fuel economy despite frequent engine on/off and gear-shift events, but the DP-based SPC performs better when multiple objectives are considered. For the engine-start control problem, a more detailed powertrain model, including clutch and crank-angle domain engine models, is developed. Assuming that the clutch torque can be accurately estimated and perfectly cancelled, the optimal engine-start control problem is formulated to minimize engine-start time while accurately supplying the driver torque demand. This nonlinear optimal control problem is solved both analytically and numerically. Under special cases, the optimization problem can be analytically solved to obtain a closed form solution. DP, on the other hand, is used to obtain numerical solutions for all cases, and the results confirm that the numerical solution matches with the analytical solution. More importantly, the DP control policy is found to be time-invariant, and thus can be directly implemented in the form of a full state feedback controller. Ph.D., Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/77784/1/dkum_1.pd

    ROBUST CONTROL OF ACTIVE SUSPENSIONS

    Full text link
    Active suspension has been widely studied in recent decades, but the implementation of the single-input, single-output (SISO) force-control architecture that many of the prior studies use has had limited success due to the lightly damped zeros. The inherent trade-off between robust stability and road disturbance attenuation for the SISO control architecture is the main culprit. In this paper, we study whether the single-input, two-output (SITO) control architecture provides sufficient degrees of freedom in the control synthesis. First, a quarter car model with an electromagnetic motor is derived and the improved LQG/LTR design technique is employed to simultaneously recover both stability robustness and disturbance attenuation properties at the expense of measurement noise sensitivity. It was found that if the control system is restricted to SISO architecture, sprung mass acceleration is the most promising choice among practical measurements. Both classical and modern control approaches are used to analyze the effectiveness of the proposed method and its closed-loop performance. Simulation results show that stability robustness and disturbance attenuation can be dramatically improved by the SITO architecture over its SISO counterpart.