Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds
3D moving object detection is one of the most critical tasks in dynamic scene
analysis. In this paper, we propose a novel Drosophila-inspired 3D moving
object detection method using Lidar sensors. Following the theory of the
elementary motion detector, we develop a motion detector based on the shallow
visual neural pathway of Drosophila. This detector is sensitive to object
movement and effectively suppresses background noise. By designing neural
circuits with different connection modes, the approach searches for motion
areas in a coarse-to-fine fashion and extracts the point clouds of each motion
area to form moving object proposals. An improved 3D object detection network
is then applied to the point clouds of each proposal, efficiently generating
3D bounding boxes and object categories. We evaluate the proposed approach on
the widely-used KITTI benchmark and achieve state-of-the-art performance on
the motion detection task.
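As background on the principle invoked here: the elementary motion detector is classically modeled as a Hassenstein-Reichardt correlator, which delays one input, correlates it with a spatially adjacent one, and subtracts the mirrored pair to suppress static background. The sketch below is a minimal 1D illustration of that delay-and-correlate idea; it is not the paper's Lidar pipeline, and the stimulus and filter coefficient are purely illustrative assumptions.

```python
# Minimal Hassenstein-Reichardt elementary motion detector (EMD) sketch.
# Illustrative only: a 1D intensity signal over time, not the paper's
# point-cloud pipeline; alpha and the stimulus are assumptions.
import numpy as np

def emd_response(signal, alpha=0.7):
    """signal: (T, N) array, T time steps over N spatial samples.
    Returns (T, N-1) opponent EMD output; positive = rightward motion."""
    T, N = signal.shape
    delayed = np.zeros_like(signal)
    out = np.zeros((T, N - 1))
    for t in range(1, T):
        # First-order low-pass filter approximates the delay arm.
        delayed[t] = alpha * delayed[t - 1] + (1 - alpha) * signal[t]
        # Correlate delayed-left with direct-right input and subtract the
        # mirrored pair; the opponency suppresses static background.
        out[t] = delayed[t, :-1] * signal[t, 1:] - signal[t, :-1] * delayed[t, 1:]
    return out

# A bright blob drifting rightward across 32 samples for 64 steps:
t, x = np.meshgrid(np.arange(64), np.arange(32), indexing="ij")
stimulus = np.exp(-0.5 * ((x - 0.4 * t - 4) / 1.5) ** 2)
print(emd_response(stimulus)[-1].mean())  # > 0: net rightward motion
```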
Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation
Mobile robots and autonomous vehicles rely on multi-modal sensor setups to
perceive and understand their surroundings. Aside from cameras, LiDAR sensors
represent a central component of state-of-the-art perception systems. In
addition to accurate spatial perception, a comprehensive semantic understanding
of the environment is essential for efficient and safe operation. In this paper
we present a novel deep neural network architecture called LiLaNet for
point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network
utilizes virtual image projections of the 3D point clouds for efficient
inference. Further, we propose an automated process for large-scale cross-modal
training data generation called Autolabeling, in order to boost semantic
labeling performance while keeping the manual annotation effort low. The
effectiveness of the proposed network architecture as well as the automated
data generation process is demonstrated on a manually annotated ground truth
dataset. LiLaNet is shown to significantly outperform current state-of-the-art
CNN architectures for LiDAR data. Applying our automatically generated
large-scale training data yields a boost of up to 14 percentage points compared
to networks trained on manually annotated data only.
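The cross-modal idea described above, transferring labels from the camera domain to LiDAR points, can be sketched as a simple geometric projection. The snippet below is a hedged illustration of that label-transfer step only; the calibration matrices, the label image, and the helper name `transfer_labels` are assumptions, and the paper's Autolabeling process involves considerably more machinery.

```python
# Sketch of cross-modal label transfer: project LiDAR points into a
# semantically segmented camera image and inherit the per-pixel class.
# Calibration and the label image are placeholders, not the paper's setup.
import numpy as np

def transfer_labels(points_lidar, T_cam_from_lidar, K, label_image):
    """points_lidar: (N, 3) xyz in the LiDAR frame.
    T_cam_from_lidar: (4, 4) extrinsic transform to the camera frame.
    K: (3, 3) camera intrinsics.  label_image: (H, W) int class ids.
    Returns (N,) labels, -1 where a point falls outside the image."""
    H, W = label_image.shape
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T)[:3].T        # (N, 3)
    labels = np.full(len(points_lidar), -1, dtype=int)
    in_front = pts_cam[:, 2] > 0.1                      # only points ahead of camera
    uvw = (K @ pts_cam[in_front].T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = label_image[v[valid], u[valid]]
    return labels
```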
Knowledge-Enabled Robotic Agents for Shelf Replenishment in Cluttered Retail Environments
Autonomous robots in unstructured and dynamically changing retail
environments have to master complex perception, knowledge processing, and
manipulation tasks. To enable them to act competently, we propose a framework
based on three core components: (i) a knowledge-enabled perception system,
capable of combining diverse information sources to cope with occlusions and
stacked objects with a variety of textures and shapes, (ii) knowledge
processing methods that produce strategies for tidying up supermarket racks,
and (iii) the necessary manipulation skills in confined spaces to arrange
objects in semi-accessible rack shelves. We demonstrate our framework in a
simulated environment as well as on a real shopping rack using a PR2 robot.
Typical supermarket products are detected and rearranged in the retail rack,
tidying up items found to be misplaced.
Comment: published in the proceedings of AAMAS 2016 as an extended abstract
Purely Geometric Scene Association and Retrieval - A Case for Macro Scale 3D Geometry
We address the problems of measuring geometric similarity between 3D scenes,
represented through point clouds or range data frames, and associating them.
Our approach leverages macro-scale 3D structural geometry - the relative
configuration of arbitrary surfaces and relationships among structures that are
potentially far apart. We express such discriminative information in a
viewpoint-invariant feature space. This information is subsequently encoded in
a frame-level signature that can be utilized to measure geometric similarity.
Such a characterization is robust to noise, incomplete and partially
overlapping data, as well as viewpoint changes. We show how it can be employed to
select a diverse set of data frames which have structurally similar content,
and how to validate whether views with similar geometric content are from the
same scene. The problem is formulated as one of general purpose retrieval from
an unannotated, spatio-temporally unordered database. Empirical analysis
indicates that the presented approach consistently outperforms baselines on
depth / range data. Its depth-only performance is competitive with
state-of-the-art approaches using RGB or RGB-D inputs, including ones based on
deep learning. Experiments show that retrieval performance holds up well with
much sparser databases, which is indicative of the approach's robustness. The
approach generalizes well: it does not require dataset-specific training, and
it scaled up in our experiments. Finally, we also demonstrate how a
geometrically diverse selection of views can result in richer 3D reconstructions.
Comment: Accepted in ICRA '1
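For intuition about viewpoint-invariant, frame-level signatures in general (not the paper's specific descriptor), one classic construction is a histogram of pairwise point distances: rigid transforms preserve distances, so the signature does not change with viewpoint. The sketch below, with `frame_signature` and all parameters as illustrative assumptions, shows such a stand-in signature and a chi-squared comparison.

```python
# Illustrative viewpoint-invariant frame signature: a normalized
# histogram of pairwise point distances.  A stand-in for the paper's
# descriptor, not a reimplementation of it.
import numpy as np

def frame_signature(points, n_samples=512, bins=64, max_dist=30.0, seed=0):
    rng = np.random.default_rng(seed)
    sample = points[rng.choice(len(points), size=min(n_samples, len(points)),
                               replace=False)]
    d = np.linalg.norm(sample[:, None, :] - sample[None, :, :], axis=-1)
    d = d[np.triu_indices(len(sample), k=1)]        # unique pairs only
    hist, _ = np.histogram(d, bins=bins, range=(0, max_dist))
    return hist / max(hist.sum(), 1)

def signature_distance(h1, h2, eps=1e-9):
    # Chi-squared distance, a common choice for comparing histograms.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```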
VolMap: A Real-time Model for Semantic Segmentation of a LiDAR surrounding view
This paper introduces VolMap, a real-time approach for the semantic
segmentation of a 3D LiDAR surrounding view system in autonomous vehicles. We
designed an optimized deep convolutional neural network that can accurately
segment the point cloud produced by a 360° LiDAR setup, where the input
consists of a volumetric bird's-eye view with LiDAR height layers used as input
channels. We further investigated the usage of a multi-LiDAR setup and its
effect on the performance of the semantic segmentation task. Our evaluations are
carried out on a large scale 3D object detection benchmark containing a LiDAR
cocoon setup, along with the KITTI dataset, where the per-point segmentation
labels are derived from 3D bounding boxes. We show that VolMap achieves an
excellent balance between high accuracy and real-time execution on CPU.
Comment: Accepted at the Thirty-sixth International Conference on Machine
Learning (ICML 2019) Workshop on AI for Autonomous Driving
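The input encoding described above, a bird's-eye view with height layers as channels, can be sketched directly. The following is a minimal illustration; the grid extents, cell size, and layer count are placeholder assumptions, not VolMap's actual configuration.

```python
# Rasterize a point cloud into a bird's-eye-view occupancy grid whose
# channels are height slices.  All extents/resolutions are assumptions.
import numpy as np

def bev_height_layers(points, x_range=(-40, 40), y_range=(-40, 40),
                      z_range=(-2, 4), cell=0.2, n_layers=8):
    """points: (N, 3) xyz.  Returns (n_layers, H, W) occupancy tensor."""
    W = int((x_range[1] - x_range[0]) / cell)
    H = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((n_layers, H, W), dtype=np.float32)
    xi = ((points[:, 0] - x_range[0]) / cell).astype(int)
    yi = ((points[:, 1] - y_range[0]) / cell).astype(int)
    zi = ((points[:, 2] - z_range[0]) / (z_range[1] - z_range[0])
          * n_layers).astype(int)
    keep = ((xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
            & (zi >= 0) & (zi < n_layers))
    grid[zi[keep], yi[keep], xi[keep]] = 1.0
    return grid  # feed as an n_layers-channel image to a 2D CNN
```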
Spatial Transformer Point Convolution
Point clouds are unstructured and unordered in the embedded 3D space. In
order to produce consistent responses under different permutation layouts, most
existing methods aggregate local spatial points through a maximum or summation
operation. However, such aggregation amounts to isotropic filtering of all the
points involved, which tends to lose information about geometric structure. In
this paper, we propose a spatial transformer point convolution (STPC) method
to achieve anisotropic convolution filtering on point clouds. To capture and
represent implicit geometric structures, we introduce a spatial direction
dictionary to learn the latent geometric components. To better encode
unordered neighbor points, we design a sparse deformer that transforms them
into a canonical, ordered dictionary space via direction dictionary learning.
In the transformed space, standard image-like convolution can be leveraged to
perform anisotropic filtering, which better expresses the finer variations of
local regions. The dictionary learning and encoding processes are encapsulated
into a network module and learnt jointly in an end-to-end manner. Extensive
experiments on several public datasets (including S3DIS, Semantic3D, and
SemanticKITTI) demonstrate the effectiveness of our proposed method on the
point cloud semantic segmentation task.
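To make the transform-then-convolve idea concrete, the sketch below softly assigns each unordered neighbor to a slot in a small learned dictionary of spatial directions and then applies a per-slot (anisotropic) filter. The module name, softmax assignment, and all sizes are assumptions for illustration; this is not the exact STPC formulation.

```python
# Soft assignment of unordered neighbors to an ordered direction
# dictionary, followed by slot-wise anisotropic filtering (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectionDictConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_dirs=8):
        super().__init__()
        # Learned dictionary of spatial directions (n_dirs, 3).
        self.dirs = nn.Parameter(torch.randn(n_dirs, 3))
        # Anisotropic filter: one weight matrix per dictionary slot.
        self.weight = nn.Parameter(torch.randn(n_dirs, in_ch, out_ch) * 0.1)

    def forward(self, rel_pos, feats):
        """rel_pos: (B, N, K, 3) neighbor offsets; feats: (B, N, K, C)."""
        d = F.normalize(self.dirs, dim=-1)                 # (M, 3)
        # Soft assignment of each neighbor to each dictionary direction.
        assign = F.softmax(rel_pos @ d.T, dim=-1)          # (B, N, K, M)
        # Gather features into the ordered dictionary slots.
        slot_feats = torch.einsum("bnkm,bnkc->bnmc", assign, feats)
        # Slot-wise (anisotropic) filtering, then sum over slots.
        return torch.einsum("bnmc,mco->bno", slot_feats, self.weight)

# Usage: 2 clouds, 128 points, 16 neighbors each, 32 -> 64 channels.
layer = DirectionDictConv(32, 64)
y = layer(torch.randn(2, 128, 16, 3), torch.randn(2, 128, 16, 32))
print(y.shape)  # torch.Size([2, 128, 64])
```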
Fully-Convolutional Point Networks for Large-Scale Point Clouds
This work proposes a general-purpose, fully-convolutional network
architecture for efficiently processing large-scale 3D data. One striking
characteristic of our approach is its ability to take unorganized 3D
representations such as point clouds as input and transform them internally
into ordered structures that are then processed via 3D convolutions. In
contrast to conventional approaches, which maintain either unorganized or
organized representations from input to output, our approach has the advantage
of operating on memory-efficient input data representations while at the same
time exploiting the natural structure of convolutional operations to avoid
redundant computation and storage of spatial information in the network. The
network eliminates the need to pre- or post-process the raw sensor data. This,
together with the fully-convolutional nature of the network, makes it an
end-to-end method able to process point clouds of huge spaces or even entire
rooms with up to 200k points at once. Another advantage is that our network can
produce either an ordered output or map predictions directly onto the input
cloud, thus making it suitable as a general-purpose point cloud descriptor
applicable to many 3D tasks. We demonstrate our network's ability to
effectively learn both low-level features as well as complex compositional
relationships by evaluating it on benchmark datasets for semantic voxel
segmentation, semantic part segmentation and 3D scene captioning.
Comment: ECCV 201
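The ingest/egress pattern described above, scattering unordered points into an ordered grid, convolving, and reading predictions back at the input points, can be illustrated in a few lines. Everything below (grid size, the single convolution, the averaging scheme) is an assumption; the actual network is far larger.

```python
# Scatter point features into an ordered voxel grid, apply a 3D
# convolution, and map predictions back onto the input points.
import torch
import torch.nn as nn

def points_to_grid(xyz, feats, grid=32):
    """xyz: (N, 3) in [0, 1); feats: (N, C).  Returns (C, G, G, G) means."""
    idx = (xyz.clamp(0, 1 - 1e-6) * grid).long()
    flat = (idx[:, 0] * grid + idx[:, 1]) * grid + idx[:, 2]     # (N,)
    C = feats.shape[1]
    acc = torch.zeros(grid ** 3, C).index_add_(0, flat, feats)
    cnt = torch.zeros(grid ** 3).index_add_(0, flat, torch.ones(len(xyz)))
    vox = acc / cnt.clamp(min=1).unsqueeze(1)                    # voxel means
    return vox.T.reshape(C, grid, grid, grid), flat

xyz, feats = torch.rand(2000, 3), torch.rand(2000, 4)
vox, flat = points_to_grid(xyz, feats)
conv = nn.Conv3d(4, 8, kernel_size=3, padding=1)
out = conv(vox.unsqueeze(0))[0]                    # (8, 32, 32, 32)
# Map voxel predictions back onto the input cloud:
per_point = out.reshape(8, -1).T[flat]             # (2000, 8)
print(per_point.shape)
```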
PointConv: Deep Convolutional Networks on 3D Point Clouds
Unlike images which are represented in regular dense grids, 3D point clouds
are irregular and unordered, hence applying convolution on them can be
difficult. In this paper, we extend the dynamic filter to a new convolution
operation, named PointConv. PointConv can be applied on point clouds to build
deep convolutional networks. We treat convolution kernels as nonlinear
functions of the local coordinates of 3D points, composed of weight and
density functions. For a given point, the weight functions are learned with
multi-layer perceptron networks and the density functions through kernel
density estimation. The most important contribution of this work is a novel
reformulation proposed for efficiently computing the weight functions, which
allowed us to dramatically scale up the network and significantly improve its
performance. The learned convolution kernel can be used to compute
translation-invariant and permutation-invariant convolution on any point set in
the 3D space. In addition, PointConv can be used as a deconvolution operator to
propagate features from a subsampled point cloud back to its original
resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep
convolutional neural networks built on PointConv achieve state-of-the-art
results on challenging semantic segmentation benchmarks for 3D point clouds.
Furthermore, our experiments converting CIFAR-10 into a point cloud show that
networks built on PointConv can match the performance of convolutional
networks of similar structure on 2D images.
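A naive form of the operation described above is easy to sketch: an MLP maps each neighbor's local coordinates to convolution weights, features are scaled by inverse density, and a weighted sum produces the output. The sketch below deliberately omits the paper's efficient reformulation; the layer name and all sizes are illustrative assumptions.

```python
# Naive PointConv-style layer: weights are an MLP of local coordinates,
# features are reweighted by inverse density before the weighted sum.
import torch
import torch.nn as nn

class NaivePointConv(nn.Module):
    def __init__(self, in_ch, out_ch, hidden=16):
        super().__init__()
        # Weight function: local xyz offset -> (in_ch * out_ch) weights.
        self.weight_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, in_ch * out_ch))
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, rel_pos, feats, inv_density):
        """rel_pos: (B, N, K, 3); feats: (B, N, K, C); inv_density: (B, N, K)."""
        B, N, K, C = feats.shape
        w = self.weight_mlp(rel_pos).view(B, N, K, C, self.out_ch)
        f = feats * inv_density.unsqueeze(-1)          # density reweighting
        # Sum over neighbors and input channels: a learned convolution.
        return torch.einsum("bnkc,bnkco->bno", f, w)

layer = NaivePointConv(8, 16)
out = layer(torch.randn(2, 64, 9, 3), torch.randn(2, 64, 9, 8),
            torch.ones(2, 64, 9))
print(out.shape)  # torch.Size([2, 64, 16])
```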
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of growing interest to computer vision and
machine learning researchers. Many emerging applications need
accurate and efficient segmentation mechanisms: autonomous driving, indoor
navigation, and even virtual or augmented reality systems to name a few. This
demand coincides with the rise of deep learning approaches in almost every
field or application target related to computer vision, including semantic
segmentation and scene understanding. This paper provides a review of deep
learning methods for semantic segmentation applied to various application
areas. Firstly, we describe the terminology of this field as well as the
necessary background concepts. Next, the main datasets and challenges are
presented to help
researchers decide which are the ones that best suit their needs and their
targets. Then, existing methods are reviewed, highlighting their contributions
and their significance in the field. Finally, quantitative results are given
for the described methods and the datasets in which they were evaluated,
following up with a discussion of the results. Lastly, we point out a set of
promising directions for future work and draw our own conclusions about the
state of the art of semantic segmentation using deep learning techniques.
Comment: Submitted to TPAMI on Apr. 22, 201
Permutation Matters: Anisotropic Convolutional Layer for Learning on Point Clouds
There is a growing demand for efficient representation learning on point
clouds in many 3D computer vision applications. Behind the success story of
convolutional neural networks (CNNs) is the fact that the data (e.g., images)
are Euclidean structured. However, point clouds are irregular and unordered.
Various point neural networks have been developed with isotropic filters or
using weighting matrices to overcome the structure inconsistency on point
clouds. However, isotropic filters or weighting matrices limit the
representation power. In this paper, we propose a permutable anisotropic
convolutional operation (PAI-Conv) that calculates soft-permutation matrices
for each point using dot-product attention according to a set of evenly
distributed kernel points on a sphere's surface, and then applies shared
anisotropic filters. The dot product with kernel points is analogous to the
dot product with keys in the Transformer, widely used in natural language
processing (NLP). From this perspective, PAI-Conv can be regarded as a
transformer for point clouds; it is physically meaningful and combines
robustly with the efficient random point sampling method. Comprehensive
experiments on point clouds demonstrate that PAI-Conv produces competitive
results on classification and semantic segmentation tasks compared to
state-of-the-art methods.
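The attention step described above can be sketched as follows: neighbor offsets act as queries against fixed kernel points on a sphere (keys), the resulting soft-permutation matrix re-orders each neighborhood into a canonical layout, and one shared filter is applied. The Fibonacci-sphere construction, module name, and sizes are assumptions, not the paper's exact design.

```python
# Soft-permutation via dot-product attention against fixed kernel points
# on a sphere, then one shared anisotropic filter (PyTorch sketch).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def fibonacci_sphere(m):
    # Roughly evenly distributed points on the unit sphere (assumption).
    i = torch.arange(m, dtype=torch.float32)
    phi = math.pi * (3.0 - 5.0 ** 0.5) * i            # golden-angle spacing
    z = 1 - 2 * (i + 0.5) / m
    r = (1 - z * z).sqrt()
    return torch.stack([r * phi.cos(), r * phi.sin(), z], dim=-1)  # (m, 3)

class PermutableAnisoConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_kernel=16):
        super().__init__()
        self.register_buffer("kernel_pts", fibonacci_sphere(n_kernel))
        self.filter = nn.Linear(n_kernel * in_ch, out_ch)  # shared filter

    def forward(self, rel_pos, feats):
        """rel_pos: (B, N, K, 3); feats: (B, N, K, C)."""
        # Attention scores: offsets (queries) vs. kernel points (keys).
        scores = F.normalize(rel_pos, dim=-1) @ self.kernel_pts.T
        # Softmax over neighbors: each canonical slot gathers from them.
        perm = F.softmax(scores, dim=2)                    # (B, N, K, M)
        canon = torch.einsum("bnkm,bnkc->bnmc", perm, feats)
        return self.filter(canon.flatten(2))               # (B, N, out_ch)

layer = PermutableAnisoConv(8, 32)
print(layer(torch.randn(2, 100, 16, 3), torch.randn(2, 100, 16, 8)).shape)
```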