DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds
Learning local descriptors is an important problem in computer vision. While
there are many techniques for learning local patch descriptors for 2D images,
efforts have only recently been made to learn local descriptors for 3D points.
The recent progress towards solving this problem in 3D leverages the strong
feature representation capability of image based convolutional neural networks
by utilizing RGB-D or multi-view representations. However, in this paper, we
propose to learn 3D local descriptors by directly processing unstructured 3D
point clouds without needing any intermediate representation. The method uses
a deep network to learn a permutation-invariant representation of
3D points. To learn the local descriptors, we use a multi-margin contrastive
loss which discriminates between similar and dissimilar points on a surface
while also leveraging the extent of dissimilarity among the negative samples at
the time of training. With comprehensive evaluation against strong baselines,
we show that the proposed method outperforms state-of-the-art methods for
matching points in 3D point clouds. Further, we demonstrate the effectiveness
of the proposed method on various applications, achieving state-of-the-art
results.
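The multi-margin contrastive idea described above can be sketched in a few lines. This is an illustrative guess at the loss form, not the paper's exact formulation: similar pairs are pulled together, while each dissimilar pair is pushed out beyond a per-pair margin that can encode how dissimilar the negative is.

```python
def multi_margin_contrastive(distances, labels, margins):
    """Hedged sketch of a multi-margin contrastive loss.

    distances: descriptor distances per pair
    labels:    1 for similar pairs, 0 for dissimilar pairs
    margins:   per-pair margin, allowed to grow with the extent of
               dissimilarity of the negative (the paper's schedule
               for choosing these margins may differ).
    """
    total = 0.0
    for d, y, m in zip(distances, labels, margins):
        if y == 1:                       # similar pair: minimize distance
            total += d ** 2
        else:                            # dissimilar pair: hinge at margin m
            total += max(0.0, m - d) ** 2
    return total / len(distances)
```

A negative already farther than its margin contributes nothing, so only "hard" negatives (relative to their assigned margin) shape the descriptor space.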
Equivariant Multi-View Networks
Several popular approaches to 3D vision tasks process multiple views of the
input independently with deep neural networks pre-trained on natural images,
achieving view permutation invariance through a single round of pooling over
all views. We argue that this operation discards important information and
leads to subpar global descriptors. In this paper, we propose a group
convolutional approach to multiple view aggregation where convolutions are
performed over a discrete subgroup of the rotation group, thus enabling joint
reasoning over all views in an equivariant (instead of invariant) fashion, up
to the very last layer. We further develop this idea to operate on smaller
discrete homogeneous spaces of the rotation group, where a polar view
representation is used to maintain equivariance with only a fraction of the
number of input views. We set the new state of the art in several large scale
3D shape retrieval tasks, and show additional applications to panoramic scene
classification. Comment: Camera-ready. Accepted to ICCV'19 as an oral presentation.
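The contrast between invariant pooling and equivariant group convolution can be illustrated with a toy convolution over the cyclic group of view rotations (scalar "views" stand in for per-view feature maps; the paper works with richer rotation subgroups):

```python
def cyclic_group_conv(views, kernel):
    """Minimal sketch of a group convolution over the cyclic group of
    view rotations: each output element aggregates neighboring views,
    so rotating the input rotates the output by the same amount
    (equivariance), instead of collapsing all views in one pooling step.
    """
    n = len(views)
    return [sum(kernel[h] * views[(g + h) % n] for h in range(len(kernel)))
            for g in range(n)]
```

Because the output transforms predictably with the input, later layers can keep reasoning jointly over all views, deferring any invariant pooling to the very last layer.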
Self-supervised Learning of Dense Shape Correspondence
We introduce the first completely unsupervised correspondence learning
approach for deformable 3D shapes. Key to our model is the understanding that
natural deformations (such as changes in pose) approximately preserve the
metric structure of the surface, yielding a natural criterion to drive the
learning process toward distortion-minimizing predictions. On this basis, we
overcome the need for annotated data and replace it by a purely geometric
criterion. The resulting learning model is class-agnostic, and is able to
leverage any type of deformable geometric data for the training phase. In
contrast to existing supervised approaches, which specialize in the classes seen
at training time, we demonstrate stronger generalization as well as
applicability to a variety of challenging settings. We showcase our method on a
wide selection of correspondence benchmarks, where we outperform other methods
in terms of accuracy, generalization, and efficiency.
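The purely geometric criterion described above can be sketched as a metric-distortion score: a candidate correspondence is penalized by how much it distorts pairwise geodesic distances. This is an illustrative reading of the abstract, not the paper's exact training loss:

```python
def metric_distortion(D_x, D_y, corr):
    """Sketch of a distortion-minimizing criterion for correspondence:
    D_x, D_y are precomputed pairwise geodesic-distance matrices on
    shapes X and Y, and corr maps vertex i of X to vertex corr[i] of Y.
    Zero distortion means the map is a perfect isometry, which natural
    deformations such as pose changes approximately preserve.
    """
    n = len(corr)
    return sum((D_x[i][j] - D_y[corr[i]][corr[j]]) ** 2
               for i in range(n) for j in range(n))
```

Minimizing such a score needs no annotated matches, which is what lets the learning model remain class-agnostic.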
ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning
The process of aligning a pair of shapes is a fundamental operation in
computer graphics. Traditional approaches rely heavily on matching
corresponding points or features to guide the alignment, a paradigm that
falters when significant shape portions are missing. These techniques generally
do not incorporate prior knowledge about expected shape characteristics, which
can help compensate for misleading cues arising from inaccuracies in
the input shapes. We present an approach based on a deep neural network,
leveraging shape datasets to learn a shape-aware prior for source-to-target
alignment that is robust to shape incompleteness. In the absence of ground
truth alignments for supervision, we train a network on the task of shape
alignment using incomplete shapes generated from full shapes for
self-supervision. Our network, called ALIGNet, is trained to warp complete
source shapes to incomplete targets, as if the target shapes were complete,
thus essentially rendering the alignment partial-shape agnostic. We aim for the
network to develop specialized expertise over the common characteristics of the
shapes in each dataset, thereby achieving a higher-level understanding of the
expected shape space to which a local approach would be oblivious. We constrain
ALIGNet through an anisotropic total variation identity regularization to
promote piecewise smooth deformation fields, facilitating both partial-shape
agnosticism and post-deformation applications. We demonstrate that ALIGNet
learns to align geometrically distinct shapes, and is able to infer plausible
mappings even when the target shape is significantly incomplete. We show that
our network learns the common expected characteristics of shape collections,
without over-fitting or memorization, enabling it to produce plausible
deformations on unseen data during test time. Comment: To be presented at SIGGRAPH Asia 201
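The anisotropic total variation regularization mentioned above can be illustrated on a small grid of scalar displacements (ALIGNet regularizes its predicted warp field; the 2D scalar grid here is a simplified stand-in):

```python
def anisotropic_tv(field):
    """Sketch of an anisotropic total variation penalty: absolute finite
    differences of a 2D displacement grid are summed separately per
    axis. Penalizing these differences favors piecewise-smooth
    deformation fields, which is what makes the learned warps usable
    for post-deformation applications.
    """
    rows, cols = len(field), len(field[0])
    horiz = sum(abs(field[i][j + 1] - field[i][j])
                for i in range(rows) for j in range(cols - 1))
    vert = sum(abs(field[i + 1][j] - field[i][j])
               for i in range(rows - 1) for j in range(cols))
    return horiz + vert
```

A constant field costs nothing, while every jump in the displacement grid is charged by its magnitude along each axis.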
Global spectral graph wavelet signature for surface analysis of carpal bones
In this paper, we present a spectral graph wavelet approach for shape
analysis of the carpal bones of the human wrist. We apply a metric called the
global spectral graph wavelet (GSGW) signature to represent the cortical
surface of the carpal bone, based on the eigensystem of the Laplace-Beltrami
operator. Furthermore, we propose a heuristic and efficient way of aggregating
local descriptors of a carpal bone surface into a global descriptor. The
resulting global descriptor is not only isometry invariant, but also more
efficient and requires less memory. We perform experiments on the shapes of the
carpal bones of ten women and ten men from a publicly available database.
Experimental results show the superiority of the proposed GSGW over the
recently proposed GPS embedding approach for comparing shapes of the carpal
bones across populations. Comment: arXiv admin note: substantial text overlap with arXiv:1705.0625
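A spectral signature of the kind described above can be sketched from a Laplace-Beltrami eigensystem. The heat kernel below stands in for the paper's wavelet generating kernel, so this is an illustrative analogue rather than the GSGW itself:

```python
import math

def spectral_signature(eigvals, eigvecs, vertex, scales):
    """Illustrative multi-scale spectral signature at one vertex:
    at each scale s, a spectral filter (here exp(-s * lambda), a heat
    kernel standing in for a wavelet kernel) weights the squared
    eigenfunction values. eigvecs[k][v] is the k-th Laplace-Beltrami
    eigenfunction evaluated at vertex v. Because the eigensystem is
    intrinsic to the surface, such signatures are isometry invariant.
    """
    return [sum(math.exp(-s * lam) * phi[vertex] ** 2
                for lam, phi in zip(eigvals, eigvecs))
            for s in scales]
```

Aggregating such per-vertex signatures over all vertices (e.g. by averaging or histogramming) yields a compact global descriptor, in the spirit of the heuristic aggregation the abstract describes.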
Perspectival Knowledge in PSOA RuleML: Representation, Model Theory, and Translation
In Positional-Slotted Object-Applicative (PSOA) RuleML, a predicate
application (atom) can have an Object IDentifier (OID) and descriptors that may
be positional arguments (tuples) or attribute-value pairs (slots). PSOA RuleML
explicitly specifies for each descriptor whether it is to be interpreted under
the perspective of the predicate in whose scope it occurs. This
predicate-dependency dimension refines the space between oidless, positional
atoms (relationships) and oidful, slotted atoms (framepoints): While
relationships use only a predicate-scope-sensitive (predicate-dependent) tuple
and framepoints use only predicate-scope-insensitive (predicate-independent)
slots, PSOA uses a systematics of orthogonal constructs also permitting atoms
with (predicate-)independent tuples and atoms with (predicate-)dependent slots.
This supports data and knowledge representation where a slot attribute can have
different values depending on the predicate. PSOA thus extends object-oriented
multi-membership and multiple inheritance. Based on objectification, PSOA laws
are given: Besides unscoping and centralization, the semantic restriction and
transformation of describution permits rescoping of one atom's independent
descriptors to another atom with the same OID but a different predicate. For
inheritance, default descriptors are realized by rules. On top of a metamodel
and a Grailog visualization, PSOA's atom systematics for facts, queries, and
rules is explained. The presentation and (XML-)serialization syntaxes of PSOA
RuleML are introduced. Its model-theoretic semantics is formalized by extending
the interpretation functions for dependent descriptors. The open-source
PSOATransRun system realizes PSOA RuleML by a translator to runtime predicates,
including for dependent tuples (prdtupterm) and slots (prdsloterm). Our tests
show efficiency advantages of dependent and tupled modeling. Comment: 39 pages, 5 figures, 2 tables; updates for PSOATransRun 1.3.1 to
1.4.2; refined terminology and metamodel
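The rescoping of independent descriptors described above can be modeled with a toy data structure. The dict layout and names below are illustrative only, not PSOA RuleML syntax or the PSOATransRun representation:

```python
def rescope_independent(atom_src, atom_dst):
    """Toy model of describution-style rescoping: descriptors flagged as
    predicate-independent attach to the OID rather than to the
    predicate, so they may be copied from one atom to another atom with
    the same OID but a different predicate. Predicate-dependent
    descriptors stay with their original predicate, which is what lets
    a slot attribute take different values depending on the predicate.
    """
    assert atom_src["oid"] == atom_dst["oid"], "rescoping needs a shared OID"
    result = dict(atom_dst)
    result["slots"] = atom_dst["slots"] + [
        s for s in atom_src["slots"] if not s["dependent"]]
    return result
```

In this toy model, two atoms describing the same OID under different predicates share their independent slots after rescoping, while each keeps its own dependent ones.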
Attribute CNNs for Word Spotting in Handwritten Documents
Word spotting has become a field of strong research interest in document
image analysis over recent years. Recently, AttributeSVMs were proposed, which
predict a binary attribute representation. At the time, this influential
method defined the state of the art in segmentation-based word spotting. In
this work, we present an approach for learning attribute representations with
Convolutional Neural Networks (CNNs). By taking a probabilistic perspective on
training CNNs, we derive two different loss functions for binary and
real-valued word string embeddings. In addition, we propose two different CNN
architectures, specifically designed for word spotting. These architectures
can be trained in an end-to-end fashion. In a number of experiments, we
investigate the influence of different word string embeddings and optimization
strategies. We show that our Attribute CNNs achieve state-of-the-art results
for segmentation-based word spotting on a large variety of data sets. Comment: under review at IJDA
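Taking a probabilistic perspective on binary attribute vectors, as the abstract describes, naturally leads to a per-attribute binary cross-entropy. The sketch below shows that standard form (treating each attribute of a word string embedding, e.g. a PHOC-style vector, as an independent Bernoulli variable); the paper's exact derivation and its real-valued variant may differ:

```python
import math

def attribute_bce(probs, targets):
    """Sketch of a loss for binary word string embeddings: under an
    independent-Bernoulli model of each attribute, maximum likelihood
    yields a binary cross-entropy averaged over the attribute vector.
    probs are predicted attribute probabilities in [0, 1], targets are
    the binary ground-truth attributes.
    """
    eps = 1e-12                          # guard against log(0)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(probs)
```

An uncertain prediction of 0.5 for every attribute costs log 2 per attribute, while confident correct predictions drive the loss toward zero.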
Multi-scale Volumes for Deep Object Detection and Localization
This study aims to analyze the benefits of improved multi-scale reasoning for
object detection and localization with deep convolutional neural networks. To
that end, an efficient and general object detection framework which operates on
scale volumes of a deep feature pyramid is proposed. In contrast to the
proposed approach, most current state-of-the-art object detectors operate on a
single scale during training, while testing involves independent evaluation
across scales. One benefit of the proposed approach is that it better captures
multi-scale contextual information, resulting in significant gains in both
detection performance and localization quality of objects on the PASCAL VOC
dataset and a multi-view highway vehicles dataset. The joint detection and
localization scale-specific models are shown to especially benefit detection of
challenging object categories which exhibit large scale variation as well as
detection of small objects. Comment: To appear in Pattern Recognition 201
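The scale-volume idea can be illustrated by gathering the feature at one image location across every level of a feature pyramid, so a classifier sees all scales jointly rather than one at a time. The half-resolution-per-level layout below is an assumption for illustration:

```python
def scale_volume(pyramid, row, col):
    """Illustrative construction of a scale volume: collect the feature
    at one spatial location across all pyramid levels, assuming each
    level halves the resolution of the previous one. A joint model over
    this volume can exploit multi-scale context that independent
    per-scale evaluation discards.
    """
    volume = []
    for level, fmap in enumerate(pyramid):
        r, c = row >> level, col >> level   # location at this level's scale
        volume.append(fmap[r][c])
    return volume
```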
Principal Polynomial Analysis
This paper presents a new framework for manifold learning based on a sequence
of principal polynomials that capture the possibly nonlinear nature of the
data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by
modeling the directions of maximal variance by means of curves, instead of
straight lines. In contrast to previous approaches, PPA reduces to performing
simple univariate regressions, which makes it computationally feasible and
robust. Moreover, PPA shows a number of interesting analytical properties.
First, PPA is a volume-preserving map, which in turn guarantees the existence
of the inverse. Second, such an inverse can be obtained in closed form.
Invertibility is an important advantage over other learning methods, because it
makes it possible to interpret the identified features in the input domain,
where the data has physical meaning. Moreover, it allows evaluating the
performance of dimensionality reduction in sensible (input-domain) units. Volume preservation
also allows an easy computation of information theoretic quantities, such as
the reduction in multi-information after the transform. Third, the analytical
nature of PPA leads to a clear geometrical interpretation of the manifold: it
allows the computation of Frenet-Serret frames (local features) and of
generalized curvatures at any point of the space. And fourth, the analytical
Jacobian allows the computation of the metric induced by the data, thus
generalizing the Mahalanobis distance. These properties are demonstrated
theoretically and illustrated experimentally. The performance of PPA is
evaluated in dimensionality and redundancy reduction, on both synthetic and
real datasets from the UCI repository.
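The core PPA step, modeling a direction of maximal variance with a curve fitted by simple univariate regressions, can be sketched as follows. This is an illustrative reading of the abstract (project onto the leading PCA direction, then polynomially regress the orthogonal residual on that projection), not the paper's exact sequential algorithm:

```python
import numpy as np

def ppa_first_curve(X, degree=2):
    """Sketch of one PPA-style step: the leading PCA direction gives a
    1-D coordinate t for each centered sample, and a univariate
    polynomial per output dimension regresses the orthogonal residual
    on t. The fitted polynomials bend PCA's principal line into a
    curve; for perfectly linear data they vanish and PPA reduces to
    PCA.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    u = Vt[0]                       # leading principal direction
    t = Xc @ u                      # 1-D coordinates along u
    resid = Xc - np.outer(t, u)     # orthogonal residuals
    coeffs = np.stack([np.polyfit(t, resid[:, k], degree)
                       for k in range(X.shape[1])])
    return u, t, coeffs
```

Because each fit is a plain univariate regression, the step is cheap and numerically robust, which is the computational advantage the abstract emphasizes.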
Semantic Image Networks for Human Action Recognition
In this paper, we propose the use of a semantic image, an improved
representation for video analysis, principally in combination with Inception
networks. The semantic image is obtained by applying localized sparse
segmentation using global clustering (LSSGC) prior to the approximate rank
pooling which summarizes the motion characteristics in single or multiple
images. It incorporates the background information by overlaying a static
background from the window onto the subsequent segmented frames. The idea is to
improve the action-motion dynamics by focusing on the region which is important
for action recognition and encoding the temporal variances using the frame
ranking method. We also propose the sequential combination of
Inception-ResNetv2 and a long short-term memory (LSTM) network to leverage the
temporal variances for improved recognition performance. Extensive analysis has
been carried out on UCF101 and HMDB51 datasets which are widely used in action
recognition studies. We show that (i) the semantic image generates better
activations and converges faster than its original variant, (ii) using
segmentation prior to approximate rank pooling yields better recognition
performance, (iii) the use of LSTM leverages the temporal variance information
from approximate rank pooling to model the action behavior better than the base
network, (iv) the proposed representations can be adaptive as they can be used
with existing methods such as temporal segment networks to improve the
recognition performance, and (v) our proposed four-stream network architecture
comprising semantic images and semantic optical flows achieves
state-of-the-art performance, with 95.9% and 73.5% recognition accuracy on
UCF101 and HMDB51, respectively. Comment: 30 pages
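The approximate rank pooling step that summarizes motion into a single image can be sketched with the simplified closed-form frame weights known from the dynamic-image literature (alpha_t = 2t - T - 1); whether the paper uses exactly these weights is an assumption, and plain scalar "pixels" stand in for segmented frames:

```python
def approximate_rank_pool(frames):
    """Sketch of approximate rank pooling: a video of T frames is
    summarized as one weighted sum, with frame t (1-indexed) weighted
    by 2*t - T - 1. This is a closed-form approximation to learning a
    linear ranker over frame order, so the pooled image encodes the
    temporal evolution of appearance. The paper applies such pooling
    after its LSSGC segmentation step.
    """
    T = len(frames)
    n = len(frames[0])
    return [sum((2 * t - T - 1) * frames[t - 1][i] for t in range(1, T + 1))
            for i in range(n)]
```

Early frames get negative weights and late frames positive ones, so static background cancels while consistent motion accumulates.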