Learning to Fuse Local Geometric Features for 3D Rigid Data Matching
This paper presents a simple yet very effective data-driven approach to fuse
both low-level and high-level local geometric features for 3D rigid data
matching. It is a common practice to generate distinctive geometric descriptors
by fusing low-level features from various viewpoints or subspaces, or to enhance geometric feature matching by leveraging multiple high-level features. In prior works, such fusion is typically performed via linear operations such as concatenation
and min pooling. We show that more compact and distinctive representations can
be achieved by optimizing a neural network (NN) model under the triplet
framework that non-linearly fuses local geometric features in Euclidean spaces.
The NN model is trained by an improved triplet loss function that fully
leverages all pairwise relationships within the triplet. Moreover, the descriptor fused by our approach is also competitive with descriptors deep-learned from raw data, while being more lightweight and rotationally invariant. Experimental
results on four standard datasets with various data modalities and application
contexts confirm the advantages of our approach in terms of both feature
matching and geometric registration.
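As a concrete illustration, the sketch below shows one plausible reading of a triplet loss that "fully leverages all pairwise relationships": besides the usual anchor-positive and anchor-negative distances, it also uses the positive-negative distance. The function name and the min-based formulation are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def all_pairs_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hypothetical triplet loss using all three pairwise distances.
    anchor/positive/negative: (N, D) batches of fused descriptors."""
    d_ap = F.pairwise_distance(anchor, positive)    # anchor vs. positive
    d_an = F.pairwise_distance(anchor, negative)    # anchor vs. negative
    d_pn = F.pairwise_distance(positive, negative)  # positive vs. negative
    # Push the matching distance below the *harder* (smaller) of the
    # two non-matching distances, instead of only below d_an.
    d_neg = torch.min(d_an, d_pn)
    return F.relu(d_ap - d_neg + margin).mean()
```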
Multi-directional Geodesic Neural Networks via Equivariant Convolution
We propose a novel approach for performing convolution of signals on curved
surfaces and show its utility in a variety of geometric deep learning
applications. Key to our construction is the notion of directional functions
defined on the surface, which extend the classic real-valued signals and which
can be naturally convolved with real-valued template functions. As a
result, rather than trying to fix a canonical orientation or only keeping the
maximal response across all alignments of a 2D template at every point of the
surface, as done in previous works, we show how information across all
rotations can be kept across different layers of the neural network. Our
construction, which we call multi-directional geodesic convolution, or
directional convolution for short, makes it possible, in particular, to propagate and
relate directional information across layers and thus different regions on the
shape. We first define directional convolution in the continuous setting, prove
its key properties and then show how it can be implemented in practice, for
shapes represented as triangle meshes. We evaluate directional convolution in a
wide variety of learning scenarios ranging from classification of signals on
surfaces, to shape segmentation and shape matching, where we show a significant
improvement over several baselines.
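Schematically, and under notation assumed here rather than quoted from the paper, a directional signal f(x, v) (point x on the surface, unit tangent direction v) can be convolved with a planar template k while remaining directional:

```latex
(f \star k)(x, v) = \int_{B_\rho(x)} f\big(y,\, \Gamma_{x \to y}\, v\big)\,
                    k\big(R_v^{-1} \log_x y\big)\, \mathrm{d}y
```

Here \log_x maps the neighbor y into the tangent plane at x, R_v rotates the template into alignment with v, and \Gamma_{x \to y} parallel-transports the direction along the geodesic from x to y. Because the output again depends on (x, v), directional information can be passed to the next layer instead of being collapsed by a maximum over rotations.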
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets and the benchmarking of the most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.
Learning 3D Object Categories by Looking Around Them
Traditional approaches for learning 3D object categories use either synthetic
data or manual supervision. In this paper, we propose a method which does not
require manual annotations and is instead cued by observing objects from a
moving vantage point. Our system builds on two innovations: a Siamese viewpoint
factorization network that robustly aligns different videos together without
explicitly comparing 3D shapes; and a 3D shape completion network that can
extract the full shape of an object from partial observations. We also
demonstrate the benefits of configuring networks to perform probabilistic
predictions as well as of geometry-aware data augmentation schemes. We obtain
state-of-the-art results on publicly-available benchmarks.
Comment: Proceedings of the International Conference on Computer Vision, 201
Continuous Geodesic Convolutions for Learning on 3D Shapes
The majority of descriptor-based methods for geometric processing of
non-rigid shapes rely on hand-crafted descriptors. Recently, learning-based
techniques have been shown effective, achieving state-of-the-art results in a
variety of tasks. Yet, even though these methods can in principle work directly
on raw data, most methods still rely on hand-crafted descriptors at the input
layer. In this work, we wish to challenge this practice and use a neural
network to learn descriptors directly from the raw mesh. To this end, we
introduce two modules into our neural architecture. The first is a local
reference frame (LRF) used to explicitly make the features invariant to rigid
transformations. The second is continuous convolution kernels that provide
robustness to sampling. We show the efficacy of our proposed network in
learning on raw meshes using two cornerstone tasks: shape matching and human body part segmentation. Our method achieves superior results over baseline methods that use hand-crafted descriptors.
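The abstract does not spell out the LRF construction, so the following is a generic covariance-based recipe, a common way to build such frames, included only to make the rotation-invariance argument concrete; it is not necessarily the paper's exact module.

```python
import numpy as np

def lrf_coordinates(neighbors, center):
    """Express a neighborhood in a local reference frame so that the
    resulting coordinates are invariant to rigid rotations (generic
    covariance-based sketch)."""
    diffs = neighbors - center
    cov = diffs.T @ diffs / len(neighbors)          # 3x3 covariance
    _, eigvecs = np.linalg.eigh(cov)                # ascending eigenvalues
    x_axis, z_axis = eigvecs[:, 2], eigvecs[:, 0]   # most / least variance
    # Sign disambiguation: point each axis toward the neighbor majority,
    # otherwise the frame flips arbitrarily between samples.
    if np.sum(diffs @ x_axis) < 0:
        x_axis = -x_axis
    if np.sum(diffs @ z_axis) < 0:
        z_axis = -z_axis
    y_axis = np.cross(z_axis, x_axis)               # right-handed frame
    R = np.stack([x_axis, y_axis, z_axis], axis=1)
    return diffs @ R  # rotating the input rotates R with it, so this is stable
```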
Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Recent advances in deep convolutional neural networks (CNNs) have motivated
researchers to adapt CNNs to directly model points in 3D point clouds. Modeling
local structure has been proven to be important for the success of
convolutional architectures, and researchers exploited the modeling of local
point sets in the feature extraction hierarchy. However, limited attention has
been paid to explicitly modeling the geometric structure amongst points in a local region. To address this problem, we propose Geo-CNN, which applies a generic convolution-like operation dubbed GeoConv to each point and its local
neighborhood. Local geometric relationships among points are captured when
extracting edge features between the center and its neighboring points. We
first decompose the edge feature extraction process onto three orthogonal
bases, and then aggregate the extracted features based on the angles between
the edge vector and the bases. This encourages the network to preserve the
geometric structure in Euclidean space throughout the feature extraction
hierarchy. GeoConv is a generic and efficient operation that can be easily
integrated into 3D point cloud analysis pipelines for multiple applications. We
evaluate Geo-CNN on ModelNet40 and KITTI and achieve state-of-the-art
performance.
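A minimal sketch of the basis decomposition, with shapes and weight names assumed for illustration: each edge vector fires the three half-axes it points toward, and the per-direction responses are blended by the squared cosines of the angles to those axes (which sum to one for orthonormal bases), so the aggregation preserves the edge's geometry.

```python
import numpy as np

BASES = {"+x": np.array([1., 0., 0.]), "-x": np.array([-1., 0., 0.]),
         "+y": np.array([0., 1., 0.]), "-y": np.array([0., -1., 0.]),
         "+z": np.array([0., 0., 1.]), "-z": np.array([0., 0., -1.])}

def geoconv_edge(p_center, p_neighbor, feat, W):
    """GeoConv-style edge feature (illustrative sketch).
    W: dict mapping each half-axis to a learned (out, in) weight matrix."""
    edge = p_neighbor - p_center
    norm = np.linalg.norm(edge) + 1e-9
    out = 0.0
    for name, b in BASES.items():
        cos = max(float(edge @ b) / norm, 0.0)   # only matching half-axes fire
        out = out + cos ** 2 * (W[name] @ feat)  # angle-weighted aggregation
    return out
```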
Fusion++: Volumetric Object-Level SLAM
We propose an online object-level SLAM system which builds a persistent and
accurate 3D graph map of arbitrary reconstructed objects. As an RGB-D camera
browses a cluttered indoor scene, Mask R-CNN instance segmentations are used to
initialise compact per-object Truncated Signed Distance Function (TSDF)
reconstructions with object size-dependent resolutions and a novel 3D
foreground mask. Reconstructed objects are stored in an optimisable 6DoF pose
graph which is our only persistent map representation. Objects are
incrementally refined via depth fusion, and are used for tracking,
relocalisation and loop closure detection. Loop closures cause adjustments in
the relative pose estimates of object instances, but no intra-object warping.
Each object also carries semantic information which is refined over time and an
existence probability to account for spurious instance predictions. We
demonstrate our approach on a hand-held RGB-D sequence from a cluttered office
scene with a large number and variety of object instances, highlighting how the
system closes loops and makes good use of existing objects on repeated loops.
We quantitatively evaluate the trajectory error of our system against a
baseline approach on the RGB-D SLAM benchmark, and qualitatively compare
reconstruction quality of discovered objects on the YCB video dataset.
Performance evaluation shows our approach is highly memory efficient and runs
online at 4-8 Hz (excluding relocalisation) despite not being optimised at the software level.
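To make the object-as-map-unit idea concrete, here is a hypothetical per-object record carrying the ingredients the abstract lists; the field names and the binary Bayes update are illustrative, not the authors' data structures.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectInstance:
    pose: np.ndarray              # 4x4 world-from-object transform (pose-graph node)
    tsdf: np.ndarray              # voxel grid at an object-size-dependent resolution
    voxel_size: float
    label_probs: np.ndarray = None  # per-class semantics, refined over time
    exist_logodds: float = 0.0      # existence belief, absorbs spurious detections

    def update_existence(self, detected: bool, p_hit=0.7, p_miss=0.3):
        # Binary Bayes filter: matched detections raise the belief;
        # a predicted-visible but undetected object lowers it.
        p = p_hit if detected else p_miss
        self.exist_logodds += np.log(p / (1.0 - p))
```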
Multi-View Stereo by Temporal Nonparametric Fusion
We propose a novel idea for depth estimation from multi-view image-pose
pairs, where the model has the capability to leverage information from previous
latent-space encodings of the scene. This model uses pairs of images and poses,
which are passed through an encoder-decoder model for disparity estimation.
The novelty lies in soft-constraining the bottleneck layer by a nonparametric
Gaussian process prior. We propose a pose-kernel structure that encourages
similar poses to have resembling latent spaces. The flexibility of the Gaussian
process (GP) prior provides adapting memory for fusing information from
previous views. We train the encoder-decoder and the GP hyperparameters
jointly end-to-end. In addition to a batch method, we derive a lightweight
estimation scheme that circumvents standard pitfalls in scaling Gaussian
process inference, and demonstrate how our scheme can run in real-time on smart
devices.
Comment: ICCV 201
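The pose kernel is the distinctive ingredient. One plausible, assumed instantiation (not necessarily the paper's) makes similarity decay with both camera translation distance and rotation angle, so that nearby viewpoints receive correlated latent codes under the GP prior:

```python
import numpy as np

def pose_kernel(t1, R1, t2, R2, ell_t=1.0, ell_r=0.5):
    """RBF-style kernel over camera poses (t: 3-vector, R: 3x3 rotation)."""
    d_t = np.linalg.norm(t1 - t2)                    # translation distance
    cos_a = (np.trace(R1.T @ R2) - 1.0) / 2.0        # geodesic rotation angle
    d_r = np.arccos(np.clip(cos_a, -1.0, 1.0))
    return np.exp(-0.5 * (d_t / ell_t) ** 2) * np.exp(-0.5 * (d_r / ell_r) ** 2)
```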
Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction
Integration of aerial and ground images has proved to be an effective approach to enhance surface reconstruction in urban environments. However,
as the first step, the feature point matching between aerial and ground images
is remarkably difficult, due to the large differences in viewpoint and
illumination conditions. Previous studies based on geometry-aware image
rectification have alleviated this problem, but the performance and convenience of this strategy are limited by several flaws, e.g. a quadratic number of image pairs, segregated extraction of descriptors and occlusions. To address these problems,
we propose a novel approach: leveraging photogrammetric mesh models for
aerial-ground image matching. The proposed methods have linear time complexity with regard to the number of images, can explicitly handle low
overlap using multi-view images and can be directly injected into off-the-shelf
structure-from-motion (SfM) and multi-view stereo (MVS) solutions. First,
aerial and ground images are reconstructed separately and initially
co-registered through weak georeferencing data. Second, aerial models are
rendered to the initial ground views, in which the color, depth and normal
images are obtained. Then, the synthesized color images and the corresponding
ground images are matched by comparing the descriptors, filtered by local
geometrical information, and then propagated to the aerial views using depth
images and patch-based matching. Experimental evaluations using various
datasets confirm the superior performance of the proposed methods in
aerial-ground image matching. In addition, incorporating these methods into existing SfM and MVS solutions enables more complete and accurate models to be obtained directly.
Comment: Accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing
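To make the complexity claim concrete: with n_a aerial and n_g ground images, rectifying every aerial-ground combination scales with the number of pairs, whereas the proposed strategy renders the mesh once into each ground view,

```latex
\underbrace{O(n_a \, n_g)}_{\text{pairwise rectification}}
\quad \text{vs.} \quad
\underbrace{O(n_a + n_g)}_{\text{one rendering per ground view}}
```

which is linear in the number of images, as claimed.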
Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds
3D moving object detection is one of the most critical tasks in dynamic scene
analysis. In this paper, we propose a novel Drosophila-inspired 3D moving
object detection method using Lidar sensors. Following the theory of elementary motion detectors, we develop a motion detector based on the shallow visual neural pathway of Drosophila. This detector is sensitive to the
movement of objects and can effectively suppress background noise. By designing neural circuits with different connection modes, the approach searches for motion
areas in a coarse-to-fine fashion and extracts point clouds of each motion area
to form moving object proposals. An improved 3D object detection network is
then used to estimate the point clouds of each proposal and efficiently generate the 3D bounding boxes and object categories. We evaluate the
proposed approach on the widely-used KITTI benchmark, where it achieves state-of-the-art performance on the task of motion detection.
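For readers unfamiliar with the biological model: the classic elementary motion detector is the Hassenstein-Reichardt correlator, sketched below for two neighboring receptors; the paper adapts this family of shallow, direction-selective circuits to Lidar point clouds, so treat this as background rather than the authors' exact detector.

```python
import numpy as np

def hassenstein_reichardt(sig_a, sig_b, delay=1):
    """Textbook elementary motion detector.
    sig_a, sig_b: intensity time series at two neighboring receptors."""
    a_delayed = np.roll(sig_a, delay)  # delay one arm of each subunit
    b_delayed = np.roll(sig_b, delay)
    # Correlate each receptor with the delayed neighbor and subtract:
    # motion A->B gives a positive response, B->A a negative one, and a
    # static background cancels out, which is the noise-suppression
    # property the abstract mentions.
    return a_delayed * sig_b - sig_a * b_delayed
```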