Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
Perception technologies in Autonomous Driving are experiencing their golden
age due to the advances in Deep Learning. Yet, most of these systems rely on
the semantically rich information of RGB images. Deep Learning solutions
applied to data from other sensors typically mounted on autonomous cars (e.g.
lidars or radars) remain far less explored. In this paper we propose a novel
solution to understand the dynamics of moving vehicles in a scene from only
lidar information. The main challenge of this problem stems from the fact that
we need to disambiguate the proprio-motion of the 'observer' vehicle from that
of the external 'observed' vehicles. For this purpose, we devise a CNN
architecture which at testing time is fed with pairs of consecutive lidar
scans. However, in order to properly learn the parameters of this network,
during training we introduce a series of so-called pretext tasks which also
leverage image data. These tasks include semantic information about
vehicleness and a novel lidar-flow feature which combines standard image-based
optical flow with lidar scans. We obtain very promising results and show that
including distilled image information only during training improves the
inference results of the network at test time, even when image data is no
longer used.
Comment: Presented in IEEE ICRA 2018. IEEE Copyrights: Personal use of this
material is permitted. Permission from IEEE must be obtained for all other
uses. (V2 just corrected comments on arxiv submission)
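The paper's network is fed with pairs of consecutive lidar scans at test time. As a minimal sketch of that input construction (names and shapes are illustrative assumptions, not the authors' implementation), two range images from consecutive time steps can be stacked channel-wise:

```python
# Hypothetical sketch: pair two consecutive lidar range images as a
# multi-channel network input. Shapes and names are illustrative only.

def stack_scan_pair(scan_t, scan_t1):
    """Stack two H x W range images into a 2-channel input [2, H, W]."""
    assert len(scan_t) == len(scan_t1) and len(scan_t[0]) == len(scan_t1[0])
    return [scan_t, scan_t1]  # channel-first: [previous scan, current scan]

# Toy 2x3 "range images" for two consecutive time steps.
prev_scan = [[0.5, 1.2, 3.0], [2.1, 0.9, 4.4]]
curr_scan = [[0.6, 1.1, 3.2], [2.0, 1.0, 4.3]]

pair = stack_scan_pair(prev_scan, curr_scan)
print(len(pair), len(pair[0]), len(pair[0][0]))  # 2 2 3
```

Disambiguating ego-motion from the motion of observed vehicles then becomes a learning problem over such stacked pairs.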
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing.
Deep Learning With Effective Hierarchical Attention Mechanisms in Perception of Autonomous Vehicles
Autonomous vehicles need to gather and understand information from their surroundings to drive safely. Just as we look around and make sense of what's happening on the road, these vehicles need to see and understand dynamic objects such as other cars, pedestrians, and cyclists, and static objects such as crosswalks, road barriers, and stop lines.
In this dissertation, we aim to develop better ways for computers to understand their surroundings in the 3D object detection task and the map segmentation task. The 3D object detection task automatically locates objects in 3D (such as cars or cyclists), and the map segmentation task automatically divides maps into different sections. To do this, we use attention modules to help the network focus on important items. We create one network to find 3D objects such as cars on a highway, and one network to divide a map into different regions. Each network utilizes an attention module and its hierarchical extension to achieve results comparable with the best methods on challenging benchmarks.
We name the 3D object detection network the Point Cloud Detection Network (PCDet); it utilizes LiDAR sensors to obtain point cloud inputs with accurate depth information. To address the lack of multi-scale features and the ineffective use of high-semantic features, the proposed PCDet utilizes Hierarchical Double-branch Spatial Attention (HDSA) to capture high-level and low-level features simultaneously. PCDet applies the Double-branch Spatial Attention (DSA) at the early and late stages of the network, which helps exploit high-level features from the beginning of the network and obtain multi-scale features. However, HDSA does not consider global relational information. This limitation is addressed by Hierarchical Residual Graph Convolutional Attention (HRGCA). PCDet applies the HRGCA module, which contains both graph and coordinate information, to effectively acquire global information and efficiently estimate contextual relationships within it in the 3D point cloud.
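The core of a spatial-attention branch like DSA is a per-location gate that rescales the feature map. The sketch below shows that mechanism in minimal form, assuming channel-mean pooling followed by a sigmoid; the pooling choice, names, and shapes are assumptions for illustration, not the dissertation's architecture:

```python
import math

# Minimal spatial-attention gate: one weight in (0, 1) per spatial
# location, computed from a channel-pooled score, then an elementwise
# rescale of every channel at that location.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(features):
    """features: C x H x W nested lists; returns gated features, same shape."""
    C, H, W = len(features), len(features[0]), len(features[0][0])
    # Channel-mean pooling gives one attention score per (i, j) location.
    weights = [[sigmoid(sum(features[c][i][j] for c in range(C)) / C)
                for j in range(W)] for i in range(H)]
    return [[[features[c][i][j] * weights[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]

x = [[[1.0, -2.0], [0.0, 3.0]],   # channel 0
     [[2.0, -1.0], [0.0, 1.0]]]   # channel 1
y = spatial_attention(x)
```

A "double-branch" variant would compute two such gates from different pooled statistics and combine them; applying the gate at both early and late stages is what makes the mechanism hierarchical.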
We name the map segmentation network Multi-View Segmentation in Bird's-Eye-View (BEVSeg); it utilizes multiple cameras to obtain multi-view image inputs with rich color and texture information. The proposed BEVSeg aims to utilize high-level features effectively and to address the common overfitting problems in map segmentation tasks. Specifically, BEVSeg utilizes an Aligned BEV domain data Augmentation (ABA) module to flip, rotate, and scale the BEV feature map and applies the same transforms to its ground truths, addressing overfitting. It further incorporates the hierarchical attention mechanisms, namely HDSA and HRGCA, to capture high-level and low-level features effectively and to estimate global relationships between different regions at the early and late stages of the network, respectively.
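The "aligned" idea behind the ABA module is that any geometric transform applied to the BEV feature map must also be applied to its ground-truth segmentation, so the pair stays consistent. A horizontal flip illustrates this; the function names are hypothetical, and the real module also rotates and scales:

```python
# Sketch of aligned BEV augmentation: transform features and labels together.

def hflip(grid):
    """Horizontally flip an H x W grid."""
    return [row[::-1] for row in grid]

def aligned_augment(bev_features, bev_labels):
    """Apply the same geometric transform to the feature map and its labels."""
    return hflip(bev_features), hflip(bev_labels)

feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
labels = [[0, 1, 1], [1, 0, 0]]
aug_feats, aug_labels = aligned_augment(feats, labels)
# aug_feats == [[0.3, 0.2, 0.1], [0.6, 0.5, 0.4]]
```

Because feature and label move together, the network sees new geometric variants without any mismatch between input and supervision.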
In general, the proposed HDSA captures high-level features and helps utilize them effectively in both the LiDAR-based 3D object detection and the multi-camera map segmentation tasks, i.e., PCDet and BEVSeg. In addition, we propose a new and effective HRGCA to further capture global relationships between different regions, improving both 3D object detection accuracy and map segmentation performance.
Towards Label-free Scene Understanding by Vision Foundation Models
Vision foundation models such as Contrastive Vision-Language Pre-training
(CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot
performance on image classification and segmentation tasks. However, the
incorporation of CLIP and SAM for label-free scene understanding has yet to be
explored. In this paper, we investigate the potential of vision foundation
models in enabling networks to comprehend 2D and 3D worlds without labelled
data. The primary challenge lies in effectively supervising networks under
extremely noisy pseudo labels, which are generated by CLIP and further
exacerbated during the propagation from the 2D to the 3D domain. To tackle
these challenges, we propose a novel Cross-modality Noisy Supervision (CNS)
method that leverages the strengths of CLIP and SAM to supervise 2D and 3D
networks simultaneously. In particular, we introduce a prediction consistency
regularization to co-train 2D and 3D networks, then further impose the
networks' latent space consistency using the SAM's robust feature
representation. Experiments conducted on diverse indoor and outdoor datasets
demonstrate the superior performance of our method in understanding 2D and 3D
open environments. Our 2D and 3D networks achieve label-free semantic
segmentation with 28.4% and 33.5% mIoU on ScanNet, improvements of 4.7% and
7.9%, respectively. On the nuScenes dataset, our method reaches 26.8% mIoU,
an improvement of 6%. Code will be released at
https://github.com/runnanchen/Label-Free-Scene-Understanding.
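A prediction consistency regularization for co-training penalises disagreement between the 2D and 3D networks' class probabilities at corresponding points. The mean-squared form below is one common choice for such a term, shown as a hedged sketch; the paper's exact loss may differ:

```python
# Hypothetical consistency term between two networks' predictions.

def consistency_loss(probs_2d, probs_3d):
    """Mean squared difference between paired class-probability vectors."""
    total, count = 0.0, 0
    for p2, p3 in zip(probs_2d, probs_3d):
        for a, b in zip(p2, p3):
            total += (a - b) ** 2
            count += 1
    return total / count

# Two points, two classes each: the networks disagree on the first point.
p2d = [[0.7, 0.3], [0.2, 0.8]]
p3d = [[0.6, 0.4], [0.2, 0.8]]
print(round(consistency_loss(p2d, p3d), 4))  # 0.005
```

Minimising this term pushes the two modalities toward agreement, which is what makes noisy CLIP-derived pseudo labels usable as weak supervision for both networks.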
Efficient Supervision for Robot Learning via Imitation, Simulation, and Adaptation
Recent successes in machine learning have led to a shift in the design of
autonomous systems, improving performance on existing tasks and rendering new
applications possible. Data-focused approaches gain relevance across diverse,
intricate applications when developing data collection and curation pipelines
becomes more effective than manual behaviour design. The following work aims at
increasing the efficiency of this pipeline in two principal ways: by utilising
more powerful sources of informative data and by extracting additional
information from existing data. In particular, we target three orthogonal
fronts: imitation learning, domain adaptation, and transfer from simulation.
Comment: Dissertation Summary