On the Generalized Essential Matrix Correction: An efficient solution to the problem and its applications
This paper addresses the problem of finding the generalized essential
matrix closest to a given matrix, with respect to the Frobenius norm. To
the best of our knowledge, this nonlinear constrained optimization problem has
not been addressed in the literature yet. Although it can be solved directly,
it involves a large number of constraints, and any optimization method to solve
it would require much computational effort. We start by deriving a couple of
unconstrained formulations of the problem. After that, we convert the original
problem into a new one, involving only orthogonal constraints, and propose an
efficient algorithm of steepest descent-type to find its solution. To test the
algorithms, we evaluate the methods with synthetic data and conclude that the
proposed steepest descent-type approach is much faster than the direct
application of general optimization techniques to the original formulation with
33 constraints and to the unconstrained ones. To further motivate the relevance
of our method, we apply it in two pose problems (relative and absolute) using
synthetic and real data.
Comment: 14 pages, 7 figures, journal
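As a concrete illustration of Frobenius-norm matrix correction, the classical projection onto the (ordinary, non-generalized) essential-matrix manifold can be done with a single SVD. This is only a related baseline, not the paper's 33-constraint generalized problem or its steepest-descent algorithm:

```python
import numpy as np

def closest_essential_matrix(M):
    """Frobenius-closest essential matrix to M: an essential matrix has
    singular values (s, s, 0), so average the two largest and zero the third."""
    U, s, Vt = np.linalg.svd(M)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# Correct a noisy 3x3 matrix and check its singular values
M = np.random.default_rng(0).normal(size=(3, 3))
E = closest_essential_matrix(M)
s = np.linalg.svd(E, compute_uv=False)
```

The generalized case replaces this closed-form projection with an optimization over orthogonal constraints, which is what the paper's steepest-descent method targets.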
Siamese Convolutional Neural Network for Sub-millimeter-accurate Camera Pose Estimation and Visual Servoing
Visual Servoing (VS), where images taken from a camera typically attached to
the robot end-effector are used to guide the robot motions, is an important
technique to tackle robotic tasks that require a high level of accuracy. We
propose a new neural network, based on a Siamese architecture, for highly
accurate camera pose estimation. This, in turn, can be used as a final
refinement step following a coarse VS or, if applied in an iterative manner, as
a standalone VS on its own. The key feature of our neural network is that it
outputs the relative pose between any pair of images, and does so with
sub-millimeter accuracy. We show that our network can reduce pose estimation
errors to 0.6 mm in translation and 0.4 degrees in rotation, from initial
errors of 10 mm / 5 degrees if applied once, or of several cm / tens of degrees
if applied iteratively. The network can generalize to similar objects, is
robust against changing lighting conditions, and to partial occlusions (when
used iteratively). The high accuracy achieved enables tackling low-tolerance
assembly tasks downstream: using our network, an industrial robot can achieve
97.5% success rate on a VGA-connector insertion task without any force sensing
mechanism.
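The iterative refinement loop described above can be sketched as follows. The Siamese network is replaced by a hypothetical `mock_relative_pose` stand-in, and the pose error is treated as a plain 3-vector rather than a full SE(3) pose, purely to show the contraction behavior:

```python
import numpy as np

def mock_relative_pose(current, target):
    """Hypothetical stand-in for the Siamese network: predicts ~80% of the
    true relative pose, with small additive noise (fixed for determinism)."""
    noise = np.random.default_rng(1).normal(scale=1e-3, size=3)
    return 0.8 * (target - current) + noise

def iterative_servoing(pose, target, steps=20):
    """Iterative VS: repeatedly apply the predicted relative pose."""
    for _ in range(steps):
        pose = pose + mock_relative_pose(pose, target)
    return pose

start = np.array([10.0, -8.0, 5.0])   # large initial error
final = iterative_servoing(start, np.zeros(3))
residual = np.linalg.norm(final)
# The error contracts by a factor of 0.2 per step, down to the
# noise floor of the pose predictor
```

This mirrors why iterating a coarse but contractive pose estimator can reach accuracies far below its single-shot error.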
Can a biologically-plausible hierarchy effectively replace face detection, alignment, and recognition pipelines?
The standard approach to unconstrained face recognition in natural
photographs is via a detection, alignment, recognition pipeline. While that
approach has achieved impressive results, there are several reasons to be
dissatisfied with it, among them its lack of biological plausibility. A
recent theory of invariant recognition by feedforward hierarchical networks,
like HMAX, other convolutional networks, or possibly the ventral stream,
implies an alternative approach to unconstrained face recognition. This
approach accomplishes detection and alignment implicitly by storing
transformations of training images (called templates) rather than explicitly
detecting and aligning faces at test time. Here we propose a particular
locality-sensitive hashing based voting scheme which we call "consensus of
collisions" and show that it can be used to approximate the full 3-layer
hierarchy implied by the theory. The resulting end-to-end system for
unconstrained face recognition operates on photographs of faces taken under
natural conditions, e.g., Labeled Faces in the Wild (LFW), without aligning or
cropping them, as is normally done. It achieves a drastic improvement in the
state of the art on this end-to-end task, reaching the same level of
performance as the best systems operating on aligned, closely cropped images
(no outside training data). It also performs well on two newer datasets,
similar to LFW, but more difficult: LFW-jittered (new here) and SUFR-W.
Comment: 11 pages, 4 figures. Mar 26, 2014: Improved exposition. Added CBMM memo cover page. No substantive change.
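A minimal sketch of locality-sensitive-hashing-based voting in the spirit of "consensus of collisions" (all data, sizes, and the random-hyperplane hash below are illustrative assumptions, not the paper's 3-layer hierarchy):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes = 32, 8
planes = rng.normal(size=(n_planes, dim))   # random-hyperplane LSH

def lsh_hash(x):
    """Sign pattern of random projections, as a hashable tuple."""
    return tuple((planes @ x > 0).astype(int))

# Hypothetical gallery: 5 identities, 10 noisy stored templates each
identities = {i: rng.normal(size=dim) for i in range(5)}
tables = {i: [lsh_hash(v + rng.normal(scale=0.05, size=dim)) for _ in range(10)]
          for i, v in identities.items()}

def consensus_of_collisions(query):
    """Vote for the identity whose stored templates collide most often
    with the query's hash."""
    h = lsh_hash(query)
    votes = {i: sum(h == t for t in hashes) for i, hashes in tables.items()}
    return max(votes, key=votes.get)

query = identities[2] + rng.normal(scale=0.05, size=dim)
predicted = consensus_of_collisions(query)
```

The stored hashes play the role of transformed training templates: the query is never aligned, yet the identity whose templates collide most wins the vote.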
The Effect of Learning Strategy versus Inherent Architecture Properties on the Ability of Convolutional Neural Networks to Develop Transformation Invariance
As object recognition becomes an increasingly common ML task, and recent
research demonstrates the vulnerability of CNNs to attacks and small image
perturbations, fully understanding the foundations of object recognition
becomes necessary. We focus on understanding the mechanisms behind how neural
networks generalize to spatial transformations of complex objects. While humans
excel at discriminating between objects shown at new positions, orientations,
and scales, past results demonstrate that this ability may be limited to
familiar objects: humans show low tolerance to spatial variance for
purposefully constructed novel objects. Because training artificial neural
networks from scratch is similar to showing novel objects to humans, we seek to
understand the factors influencing the tolerance of CNNs to spatial
transformations. We conduct a thorough empirical examination of seven
Convolutional Neural Network (CNN) architectures. By training on a controlled
face image dataset, we measure model accuracy across different degrees of 5
transformations: position, size, rotation, Gaussian blur, and resolution
transformation due to resampling. We also examine how learning strategy affects
generalizability by measuring the effect of different amounts of pre-training
on model robustness. Overall, we find that the most significant contributor to
transformation invariance is pre-training on a large, diverse image dataset.
Moreover, while AlexNet tends to be the least robust network, VGG and ResNet
architectures demonstrate higher robustness for different transformations.
Along with kernel visualizations and qualitative analyses, we examine
differences between learning strategy and inherent architectural properties in
contributing to invariance of transformations, providing valuable information
towards understanding how to achieve greater robustness to transformations in
CNNs.
Comment: 11 pages, 17 figures
Point cloud registration: matching a maximal common subset on pointclouds with noise (with 2D implementation)
We analyze the problem of determining whether two given 2D point clouds,
possibly of different cardinalities and containing any number of outliers,
have subsets of the same size that can be matched via a rigid motion. This
problem is important, for example, in the application of fingerprint matching
with incomplete data. We propose an algorithm that, under assumptions on the
noise tolerance, allows us to find corresponding subclouds of the maximum
possible size. Our procedure does so by optimizing a potential energy
function inspired by the potential energy of interacting point charges in
electrostatics.
Comment: 13 pages, 5 figures
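The potential-energy idea can be illustrated with a toy 2D alignment: a Coulomb-like attraction between clouds, minimized here by a simple grid search over rotations. The authors' actual optimization procedure and outlier handling are not reproduced:

```python
import numpy as np

def energy(A, B, eps=1e-2):
    """Coulomb-like attraction between 2D clouds A (n,2) and B (m,2):
    each pair contributes -1/(dist + eps), so coincident points dominate."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return -np.sum(1.0 / (d + eps))

def best_rotation(A, B, n_angles=360):
    """Grid search over rotation (translation removed by centering)."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    thetas = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    energies = []
    for t in thetas:
        R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        energies.append(energy(A @ R.T, B))
    return thetas[int(np.argmin(energies))]

B = np.random.default_rng(0).normal(size=(15, 2))
theta_true = 0.7
R = np.array([[np.cos(theta_true), -np.sin(theta_true)],
              [np.sin(theta_true),  np.cos(theta_true)]])
A = B @ R.T                       # A is B rotated by theta_true
theta_hat = best_rotation(A, B)   # should recover -theta_true (mod 2*pi)
```

The sharp energy minimum at alignment is what makes the matched subclouds stand out even when outliers contribute a diffuse background term.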
Explicit Spatial Encoding for Deep Local Descriptors
We propose a kernelized deep local-patch descriptor based on efficient match
kernels of neural network activations. Response of each receptive field is
encoded together with its spatial location using explicit feature maps. Two
location parametrizations, Cartesian and polar, are used to provide robustness
to different types of canonical patch misalignment. Additionally, we analyze
how the conventional architecture, i.e. a fully connected layer attached after
the convolutional part, encodes responses in a spatially variant way. In
contrast, our descriptor uses explicit spatial encoding, and its potential
applications are not limited to local patches. We evaluate the descriptor on
standard benchmarks. Both versions, encoding 32x32 or 64x64 patches,
consistently outperform all other methods on all benchmarks. The number of
parameters of the model is independent of the input patch resolution.
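One simple way to encode a response jointly with its location is an outer product with an explicit feature map of the coordinates (the harmonic map, dimensions, and Cartesian parametrization below are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np

def spatial_map(x, y, K=3):
    """Explicit feature map of a location in [-1, 1]^2: K sine/cosine
    harmonics per coordinate (hypothetical choice of K)."""
    f = np.arange(1, K + 1)
    return np.concatenate([np.sin(np.pi * f * x), np.cos(np.pi * f * x),
                           np.sin(np.pi * f * y), np.cos(np.pi * f * y)])

def encode_patch(responses, coords):
    """Couple each receptive-field response with its location via an outer
    product with the spatial map, sum-pool, and L2-normalize."""
    desc = sum(np.outer(r, spatial_map(x, y)).ravel()
               for r, (x, y) in zip(responses, coords))
    return desc / np.linalg.norm(desc)

rng = np.random.default_rng(0)
responses = rng.normal(size=(16, 8))       # 16 receptive fields, 8-dim each
coords = rng.uniform(-1, 1, size=(16, 2))  # normalized (x, y) locations
d = encode_patch(responses, coords)        # 8 * (4*3) = 96-dim descriptor
```

Because the spatial map, not the pooling, carries the location information, the descriptor dimension is fixed regardless of how many receptive fields the patch resolution produces.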
OATM: Occlusion Aware Template Matching by Consensus Set Maximization
We present a novel approach to template matching that is efficient, can
handle partial occlusions, and comes with provable performance guarantees. A
key component of the method is a reduction that transforms the problem of
searching a nearest neighbor among high-dimensional vectors, to searching
neighbors among two sets of order vectors, which can be found
efficiently using range search techniques. This allows for a quadratic
improvement in search complexity, and makes the method scalable in handling
large search spaces. The second contribution is a hashing scheme based on
consensus set maximization, which allows us to handle occlusions. The resulting
scheme can be seen as a randomized hypothesize-and-test algorithm, which is
equipped with guarantees regarding the number of iterations required for
obtaining an optimal solution with high probability. The predicted matching
rates are validated empirically and the algorithm shows a significant
improvement over the state-of-the-art in both speed and robustness to
occlusions.
Comment: to appear at CVPR 201
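The iteration guarantee for such randomized hypothesize-and-test schemes follows the standard RANSAC-style bound; the sketch below shows that generic formula, not OATM's specific analysis:

```python
import math

def iterations_needed(inlier_ratio, sample_size, confidence=0.99):
    """Smallest N such that N random samples of `sample_size` items contain
    at least one all-inlier sample with probability >= confidence."""
    p_good = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

# e.g. half the template unoccluded, hypotheses drawn from 2-element samples:
n = iterations_needed(0.5, 2)   # 17 iterations for 99% confidence
```

The bound explains why the required number of iterations stays modest even under substantial occlusion, as long as each hypothesis needs only a small consensus sample.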
Kernelized Deep Convolutional Neural Network for Describing Complex Images
With the impressive capability to capture visual content, deep convolutional
neural networks (CNNs) have demonstrated promising performance in various
vision-based applications, such as classification, recognition, and object
detection. However, due to the intrinsic structural design of CNNs, for images
with complex content they achieve limited invariance to translation, rotation,
and resizing changes, which is strongly emphasized in the scenario of
content-based image retrieval. In this paper, to address this problem, we
propose a new kernelized deep convolutional neural network. We first discuss
our motivation with an experimental study demonstrating the sensitivity of the
global CNN feature to basic geometric transformations. Then, we propose to
represent visual content with approximate invariance to the above geometric
transformations from a kernelized perspective. We extract CNN features on
detected object-like patches and aggregate these patch-level CNN features to
form a vectorial representation with the Fisher vector model. The
effectiveness of our proposed algorithm is demonstrated on an image search
application with three benchmark datasets.
Comment: 9 pages
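The patch-level aggregation step can be sketched with a simplified Fisher-vector-style pooling. Uniform mixture weights and a shared isotropic variance are simplifying assumptions here; a real Fisher vector uses a fitted GMM and second-order terms as well:

```python
import numpy as np

def fisher_like_aggregate(patch_feats, means, sigma=1.0):
    """Soft-assign each patch feature to K centers and accumulate normalized
    first-order residuals, followed by power- and L2-normalization."""
    K, D = means.shape
    acc = np.zeros((K, D))
    for f in patch_feats:
        d2 = ((f - means) ** 2).sum(axis=1)
        q = np.exp(-d2 / (2.0 * sigma ** 2))
        q /= q.sum()
        acc += q[:, None] * (f - means) / sigma
    v = acc.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))   # power normalization
    return v / np.linalg.norm(v)          # L2 normalization

rng = np.random.default_rng(0)
patches = rng.normal(size=(20, 16))  # e.g. CNN features of 20 object-like patches
centers = rng.normal(size=(4, 16))   # K = 4 hypothetical mixture centers
v = fisher_like_aggregate(patches, centers)
```

Because the pooling sums over patches, the resulting representation is insensitive to where each object-like patch appeared in the image, which is the source of the approximate geometric invariance.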
Learning selectivity and invariance through spatiotemporal Hebbian plasticity in a hierarchical neural network
When an object moves smoothly across a field of view, the identity of the
object is unchanged, but the activation pattern of the photoreceptors on the
retina changes drastically. One of the major computational roles of our visual
system is to manage selectivity for different objects and tolerance to such
identity-preserving transformations as translations or rotations. This study
demonstrates that a hierarchical neural network, whose synaptic connectivities
are learned competitively with Hebbian plasticity operating within a local
spatiotemporal pooling range, is capable of gradually achieving feature
selectivity and transformation tolerance, so that the top level neurons carry
higher mutual information about object categories than a single-level neural
network. Furthermore, when a genetic algorithm is applied, in conjunction with
the associative learning algorithm, to search for a network architecture that
maximizes transformation-invariant object recognition performance, we find
that deep networks outperform shallower ones.
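A minimal competitive Hebbian rule of the kind described, in sketch form (winner-take-all with weight normalization; the paper's spatiotemporal pooling range is not modeled):

```python
import numpy as np

def hebbian_layer(X, n_units, lr=0.05, epochs=30, seed=0):
    """Winner-take-all Hebbian learning: the most active unit moves toward
    the input, and its weight vector is renormalized after each update."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(epochs):
        for x in X:
            winner = int(np.argmax(W @ x))           # competition
            W[winner] += lr * x                      # Hebbian reinforcement
            W[winner] /= np.linalg.norm(W[winner])   # normalization
    return W

# Two input clusters; after learning, distinct units become selective to each
rng = np.random.default_rng(1)
A = rng.normal(loc=[3, 0, 0, 0], scale=0.2, size=(50, 4))
B = rng.normal(loc=[0, 3, 0, 0], scale=0.2, size=(50, 4))
W = hebbian_layer(np.vstack([A, B]), n_units=2)
```

Competition plus normalization is what drives units apart into selective feature detectors; stacking such layers with spatiotemporal pooling is what yields tolerance to transformations in the full model.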
Efficient Global Point Cloud Alignment using Bayesian Nonparametric Mixtures
Point cloud alignment is a common problem in computer vision and robotics,
with applications ranging from 3D object recognition to reconstruction. We
propose a novel approach to the alignment problem that utilizes Bayesian
nonparametrics to describe the point cloud and surface normal densities, and
branch and bound (BB) optimization to recover the relative transformation. BB
uses a novel, refinable, near-uniform tessellation of rotation space using 4D
tetrahedra, leading to more efficient optimization compared to the common
axis-angle tessellation. We provide objective function bounds for pruning given
the proposed tessellation, and prove that BB converges to the optimum of the
cost function along with providing its computational complexity. Finally, we
empirically demonstrate the efficiency of the proposed approach as well as its
robustness to real-world conditions such as missing data and partial overlap.