Multimodal 3D Object Detection from Simulated Pretraining
The need for simulated data in autonomous driving applications has become
increasingly important, both for validation of pretrained models and for
training new models. For these models to generalize to real-world
applications, it is critical that the underlying dataset contains a variety of
driving scenarios and that simulated sensor readings closely mimic real-world
sensors. We present the Carla Automated Dataset Extraction Tool (CADET), a
novel tool for generating training data from the CARLA simulator to be used in
autonomous driving research. The tool is able to export high-quality,
synchronized LIDAR and camera data with object annotations, and can be
configured to accurately reflect a real-life sensor array. Furthermore, we
use this tool to generate a dataset of 10,000 samples and use it to train
the 3D object detection network AVOD-FPN, fine-tuning on the KITTI dataset
to evaluate the potential for effective pretraining. We also present two
novel LIDAR feature map
configurations in Bird's Eye View for use with AVOD-FPN that can be easily
modified. These configurations are tested on the KITTI and CADET datasets to
evaluate both their performance and the usability of the simulated dataset for
pretraining. Although simulated data is insufficient to fully replace
real-world data, and models pretrained on it generally do not exceed the
performance of systems trained entirely on real data, our results indicate
that simulated data can considerably reduce the amount of real training data
required to achieve satisfactory levels of accuracy.
Comment: 12 pages, part of the proceedings of the NAIS 2019 symposium
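The abstract does not include code, but the synchronized capture that CADET automates can be sketched with the public CARLA Python API. The sketch below is a minimal illustration rather than CADET itself: it runs the simulator in fixed-step synchronous mode so every camera frame pairs with exactly one LIDAR sweep. The sensor placements, attribute values, and output paths are assumptions for illustration.

```python
import queue

import carla

# Connect to a running CARLA server (default port).
client = carla.Client("localhost", 2000)
world = client.get_world()

# Fixed-step synchronous mode: sensors report once per tick, so camera
# frames and LIDAR sweeps can be paired deterministically.
settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 0.1
world.apply_settings(settings)

library = world.get_blueprint_library()
vehicle = world.spawn_actor(library.filter("vehicle.*")[0],
                            world.get_map().get_spawn_points()[0])

camera_bp = library.find("sensor.camera.rgb")
lidar_bp = library.find("sensor.lidar.ray_cast")
lidar_bp.set_attribute("rotation_frequency", "10")  # one full sweep per tick

# Illustrative mounting positions, not CADET's calibrated sensor array.
camera = world.spawn_actor(camera_bp,
                           carla.Transform(carla.Location(x=1.5, z=1.7)),
                           attach_to=vehicle)
lidar = world.spawn_actor(lidar_bp,
                          carla.Transform(carla.Location(z=1.9)),
                          attach_to=vehicle)

camera_q, lidar_q = queue.Queue(), queue.Queue()
camera.listen(camera_q.put)
lidar.listen(lidar_q.put)

for frame in range(100):
    world.tick()                 # advance the simulation one step
    image = camera_q.get()       # both measurements belong to this tick
    sweep = lidar_q.get()
    image.save_to_disk(f"out/image_{frame:06d}.png")
    sweep.save_to_disk(f"out/lidar_{frame:06d}.ply")

for actor in (camera, lidar, vehicle):
    actor.destroy()
```

Synchronous mode is what makes the exported LIDAR and camera data line up frame for frame; in the default asynchronous mode the two sensors report at independent rates.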
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph
with respect to a large indoor 3D map. The contributions of this work are
three-fold. First, we develop a new large-scale visual localization method
targeted for indoor environments. The method proceeds along three steps: (i)
efficient retrieval of candidate poses that ensures scalability to large-scale
environments, (ii) pose estimation using dense matching rather than local
features to deal with textureless indoor scenes, and (iii) pose verification by
virtual view synthesis to cope with significant changes in viewpoint, scene
layout, and occluders. Second, we collect a new dataset with reference 6DoF
poses for large-scale indoor localization. Query photographs are captured by
mobile phones at a different time than the reference 3D map, thus presenting a
realistic indoor localization scenario. Third, we demonstrate that our method
significantly outperforms current state-of-the-art indoor localization
approaches on this new challenging dataset.
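As a rough sketch of step (ii), a 6DoF pose can be recovered from dense 2D-3D correspondences with standard PnP plus RANSAC, for example via OpenCV. This is not the InLoc implementation; the function name, array shapes, and intrinsics K are placeholders.

```python
import cv2
import numpy as np

def estimate_pose(pts_2d, pts_3d, K):
    """pts_2d: (N, 2) query-image pixels densely matched to pts_3d: (N, 3)
    points in the indoor 3D map; K: 3x3 camera intrinsics.
    Returns (R, t, inliers), or None if no pose is found."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None,
        iterationsCount=1000, reprojectionError=3.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```

Step (iii) then renders a virtual view of the 3D map from the estimated pose and compares it against the query photograph to verify or reject the candidate.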
LO-Net: Deep Real-time Lidar Odometry
We present a novel deep convolutional network pipeline, LO-Net, for real-time
lidar odometry estimation. Unlike most existing lidar odometry (LO) methods,
which rely on individually designed feature selection, feature matching, and
pose estimation stages, LO-Net can be trained in an end-to-end manner. With a
new mask-weighted geometric constraint loss, LO-Net can effectively learn
feature representation for LO estimation, and can implicitly exploit the
sequential dependencies and dynamics in the data. We also design a scan-to-map
module, which uses the geometric and semantic information learned in LO-Net, to
improve the estimation accuracy. Experiments on benchmark datasets demonstrate
that LO-Net outperforms existing learning-based approaches and achieves
accuracy similar to the state-of-the-art geometry-based approach, LOAM.
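The paper defines the loss precisely; the PyTorch sketch below conveys only the general idea of a mask-weighted geometric constraint: per-point residuals between consecutive scans are down-weighted by a learned mask, and a regularizer prevents the trivial all-zero mask. The tensor shapes and the regularization weight are assumptions, not the paper's formulation.

```python
import torch

def mask_weighted_loss(residuals, mask, reg_weight=0.01):
    """residuals: (B, N) per-point geometric errors between consecutive
    scans; mask: (B, N) values in (0, 1), low where points are dynamic
    or otherwise unreliable (assumed shapes, not the paper's notation)."""
    data_term = (mask * residuals).mean()
    # Without this term the network could drive the mask to zero and
    # trivially minimize the loss; -log(mask) grows as mask -> 0.
    reg_term = -torch.log(mask.clamp_min(1e-6)).mean()
    return data_term + reg_weight * reg_term
```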
Super-resolution-based snake model—an unsupervised method for large-scale building extraction using airborne LiDAR Data and optical image
Automatic extraction of buildings in urban and residential scenes has been a subject of growing interest in photogrammetry and remote sensing, particularly since the mid-1990s. The active contour model, colloquially known as the snake model, has been studied as a means of extracting buildings from aerial and satellite imagery. The task remains very challenging, however, due to the variability of building size and shape and the complexity of the surrounding environment. This variability is a major obstacle to reliable large-scale building extraction, since prior information and assumptions about buildings, such as shape, size, and color, cannot be generalized over large areas. This paper presents an efficient snake model that overcomes this challenge, called the Super-Resolution-based Snake Model (SRSM). The SRSM operates on high-resolution elevation images, called z-images, generated by a super-resolution process applied to Light Detection and Ranging (LiDAR) data. The balloon force model is also improved to shrink or inflate adaptively, instead of inflating continuously. The method is applicable at large scales, up to city scale and beyond, while offering a high level of automation and requiring no prior knowledge or training data from the urban scenes (hence unsupervised). It achieves high overall accuracy when tested on various datasets. For instance, the proposed SRSM yields an average area-based Quality of 86.57% and object-based Quality of 81.60% on the ISPRS Vaihingen benchmark dataset; compared to other methods evaluated on this benchmark, such accuracy would be desirable even for a supervised method. Similarly strong outcomes are obtained when running the SRSM on the whole City of Quebec (total area of 656 km²), yielding an area-based Quality of 62.37% and an object-based Quality of 63.21%.
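For intuition about the balloon force, a standard fixed-sign balloon active contour can be run on a LiDAR-derived z-image with scikit-image's morphological geodesic active contour. SRSM's adaptive shrink-or-inflate balloon is not available there, so the sketch below shows only the conventional behavior; z_image and seed_mask are assumed inputs.

```python
import numpy as np
from skimage.segmentation import (inverse_gaussian_gradient,
                                  morphological_geodesic_active_contour)

def extract_building_mask(z_image, seed_mask, iterations=200):
    """z_image: 2D float array of interpolated LiDAR heights (a z-image);
    seed_mask: binary initial level set inside candidate buildings."""
    # Edge-stopping image: values near zero at strong height
    # discontinuities (building outlines), near one on flat areas.
    gimage = inverse_gaussian_gradient(z_image.astype(np.float64))
    # balloon=+1 inflates the contour, -1 shrinks it; the sign is fixed
    # here, whereas SRSM adapts it during evolution.
    return morphological_geodesic_active_contour(
        gimage, iterations, init_level_set=seed_mask,
        smoothing=2, balloon=1)
```

With balloon=+1 the contour can only inflate; the paper's improvement is to switch the sign adaptively so the contour can also shrink where it overshoots a building outline.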