On the Generalized Essential Matrix Correction: An efficient solution to the problem and its applications
This paper addresses the problem of finding the generalized essential
matrix closest to a given matrix, with respect to the Frobenius norm. To
the best of our knowledge, this nonlinear constrained optimization problem has
not been addressed in the literature yet. Although it can be solved directly,
it involves a large number of constraints, and any optimization method to solve
it would require much computational effort. We start by deriving a couple of
unconstrained formulations of the problem. After that, we convert the original
problem into a new one, involving only orthogonal constraints, and propose an
efficient algorithm of steepest descent-type to find its solution. To test the
algorithms, we evaluate the methods with synthetic data and conclude that the
proposed steepest descent-type approach is much faster than the direct
application of general optimization techniques to the original formulation with
33 constraints and to the unconstrained ones. To further motivate the relevance
of our method, we apply it in two pose problems (relative and absolute) using
synthetic and real data.
Comment: 14 pages, 7 figures, journal
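As a concrete illustration of Frobenius-norm matrix correction, the classical projection onto the (ordinary, non-generalized) essential-matrix manifold can be done with a single SVD. This is only a related baseline, not the paper's 33-constraint generalized problem or its steepest-descent algorithm:

```python
import numpy as np

def closest_essential_matrix(M):
    """Frobenius-closest essential matrix to M: an essential matrix has
    singular values (s, s, 0), so average the two largest and zero the third."""
    U, s, Vt = np.linalg.svd(M)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

# Correct a noisy 3x3 matrix and check its singular values
M = np.random.default_rng(0).normal(size=(3, 3))
E = closest_essential_matrix(M)
s = np.linalg.svd(E, compute_uv=False)
```

The generalized case replaces this closed-form projection with an optimization over orthogonal constraints, which is what the paper's steepest-descent method targets.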
Siamese Convolutional Neural Network for Sub-millimeter-accurate Camera Pose Estimation and Visual Servoing
Visual Servoing (VS), where images taken from a camera typically attached to
the robot end-effector are used to guide the robot motions, is an important
technique to tackle robotic tasks that require a high level of accuracy. We
propose a new neural network, based on a Siamese architecture, for highly
accurate camera pose estimation. This, in turn, can be used as a final
refinement step following a coarse VS or, if applied in an iterative manner, as
a standalone VS on its own. The key feature of our neural network is that it
outputs the relative pose between any pair of images, and does so with
sub-millimeter accuracy. We show that our network can reduce pose estimation
errors to 0.6 mm in translation and 0.4 degrees in rotation, from initial
errors of 10 mm / 5 degrees if applied once, or of several cm / tens of degrees
if applied iteratively. The network can generalize to similar objects, is
robust against changing lighting conditions, and to partial occlusions (when
used iteratively). The high accuracy achieved enables tackling low-tolerance
assembly tasks downstream: using our network, an industrial robot can achieve
97.5% success rate on a VGA-connector insertion task without any force sensing
mechanism.
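The iterative refinement loop described above can be sketched as follows. The Siamese network is replaced by a hypothetical `mock_relative_pose` stand-in, and the pose error is treated as a plain 3-vector rather than a full SE(3) pose, purely to show the contraction behavior:

```python
import numpy as np

def mock_relative_pose(current, target):
    """Hypothetical stand-in for the Siamese network: predicts ~80% of the
    true relative pose, with small additive noise (fixed for determinism)."""
    noise = np.random.default_rng(1).normal(scale=1e-3, size=3)
    return 0.8 * (target - current) + noise

def iterative_servoing(pose, target, steps=20):
    """Iterative VS: repeatedly apply the predicted relative pose."""
    for _ in range(steps):
        pose = pose + mock_relative_pose(pose, target)
    return pose

start = np.array([10.0, -8.0, 5.0])   # large initial error
final = iterative_servoing(start, np.zeros(3))
residual = np.linalg.norm(final)
# The error contracts by a factor of 0.2 per step, down to the
# noise floor of the pose predictor
```

This mirrors why iterating a coarse but contractive pose estimator can reach accuracies far below its single-shot error.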
Can a biologically-plausible hierarchy effectively replace face detection, alignment, and recognition pipelines?
The standard approach to unconstrained face recognition in natural
photographs is via a detection, alignment, recognition pipeline. While that
approach has achieved impressive results, there are several reasons to be
dissatisfied with it, among them its lack of biological plausibility. A
recent theory of invariant recognition by feedforward hierarchical networks,
like HMAX, other convolutional networks, or possibly the ventral stream,
implies an alternative approach to unconstrained face recognition. This
approach accomplishes detection and alignment implicitly by storing
transformations of training images (called templates) rather than explicitly
detecting and aligning faces at test time. Here we propose a particular
locality-sensitive hashing based voting scheme which we call "consensus of
collisions" and show that it can be used to approximate the full 3-layer
hierarchy implied by the theory. The resulting end-to-end system for
unconstrained face recognition operates on photographs of faces taken under
natural conditions, e.g., Labeled Faces in the Wild (LFW), without aligning or
cropping them, as is normally done. It achieves a drastic improvement in the
state of the art on this end-to-end task, reaching the same level of
performance as the best systems operating on aligned, closely cropped images
(no outside training data). It also performs well on two newer datasets,
similar to LFW, but more difficult: LFW-jittered (new here) and SUFR-W.
Comment: 11 pages, 4 figures. Mar 26, 2014: Improved exposition. Added CBMM memo cover page. No substantive change.
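A minimal sketch of locality-sensitive-hashing-based voting in the spirit of "consensus of collisions" (all data, sizes, and the random-hyperplane hash below are illustrative assumptions, not the paper's 3-layer hierarchy):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes = 32, 8
planes = rng.normal(size=(n_planes, dim))   # random-hyperplane LSH

def lsh_hash(x):
    """Sign pattern of random projections, as a hashable tuple."""
    return tuple((planes @ x > 0).astype(int))

# Hypothetical gallery: 5 identities, 10 noisy stored templates each
identities = {i: rng.normal(size=dim) for i in range(5)}
tables = {i: [lsh_hash(v + rng.normal(scale=0.05, size=dim)) for _ in range(10)]
          for i, v in identities.items()}

def consensus_of_collisions(query):
    """Vote for the identity whose stored templates collide most often
    with the query's hash."""
    h = lsh_hash(query)
    votes = {i: sum(h == t for t in hashes) for i, hashes in tables.items()}
    return max(votes, key=votes.get)

query = identities[2] + rng.normal(scale=0.05, size=dim)
predicted = consensus_of_collisions(query)
```

The stored hashes play the role of transformed training templates: the query is never aligned, yet the identity whose templates collide most wins the vote.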
The Effect of Learning Strategy versus Inherent Architecture Properties on the Ability of Convolutional Neural Networks to Develop Transformation Invariance
As object recognition becomes an increasingly common ML task, and recent
research demonstrates the vulnerability of CNNs to attacks and small image
perturbations, fully understanding the foundations of object recognition
becomes necessary. We focus on understanding the mechanisms behind how neural
networks generalize to spatial transformations of complex objects. While humans
excel at discriminating between objects shown at new positions, orientations,
and scales, past results demonstrate that this ability may be limited to
familiar objects: humans show low tolerance to spatial variance for
purposefully constructed novel objects. Because training artificial neural
networks from scratch is similar to showing novel objects to humans, we seek to
understand the factors influencing the tolerance of CNNs to spatial
transformations. We conduct a thorough empirical examination of seven
Convolutional Neural Network (CNN) architectures. By training on a controlled
face image dataset, we measure model accuracy across different degrees of 5
transformations: position, size, rotation, Gaussian blur, and resolution
transformation due to resampling. We also examine how learning strategy affects
generalizability by measuring the effect of different amounts of pre-training
on model robustness. Overall, we find that the most significant contributor to
transformation invariance is pre-training on a large, diverse image dataset.
Moreover, while AlexNet tends to be the least robust network, VGG and ResNet
architectures demonstrate higher robustness for different transformations.
Along with kernel visualizations and qualitative analyses, we examine
differences between learning strategy and inherent architectural properties in
contributing to invariance of transformations, providing valuable information
towards understanding how to achieve greater robustness to transformations in
CNNs.
Comment: 11 pages, 17 figures
Point cloud registration: matching a maximal common subset on pointclouds with noise (with 2D implementation)
We analyze the problem of determining whether two given 2D point clouds,
possibly of different cardinalities and containing any number of outliers,
have subsets of the same size that can be matched via a rigid motion. This
problem is important, for example, in the application of fingerprint matching
with incomplete data. We propose an algorithm that, under assumptions on the
noise tolerance, allows us to find corresponding subclouds of the maximum
possible size. Our procedure does so by optimizing a potential energy
function inspired by the potential energy of interacting point charges in
electrostatics.
Comment: 13 pages, 5 figures
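The potential-energy idea can be illustrated with a toy 2D alignment: a Coulomb-like attraction between clouds, minimized here by a simple grid search over rotations. The authors' actual optimization procedure and outlier handling are not reproduced:

```python
import numpy as np

def energy(A, B, eps=1e-2):
    """Coulomb-like attraction between 2D clouds A (n,2) and B (m,2):
    each pair contributes -1/(dist + eps), so coincident points dominate."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return -np.sum(1.0 / (d + eps))

def best_rotation(A, B, n_angles=360):
    """Grid search over rotation (translation removed by centering)."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    thetas = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    energies = []
    for t in thetas:
        R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        energies.append(energy(A @ R.T, B))
    return thetas[int(np.argmin(energies))]

B = np.random.default_rng(0).normal(size=(15, 2))
theta_true = 0.7
R = np.array([[np.cos(theta_true), -np.sin(theta_true)],
              [np.sin(theta_true),  np.cos(theta_true)]])
A = B @ R.T                       # A is B rotated by theta_true
theta_hat = best_rotation(A, B)   # should recover -theta_true (mod 2*pi)
```

The sharp energy minimum at alignment is what makes the matched subclouds stand out even when outliers contribute a diffuse background term.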
Explicit Spatial Encoding for Deep Local Descriptors
We propose a kernelized deep local-patch descriptor based on efficient match
kernels of neural network activations. Response of each receptive field is
encoded together with its spatial location using explicit feature maps. Two
location parametrizations, Cartesian and polar, are used to provide robustness
to different types of canonical patch misalignment. Additionally, we analyze
how the conventional architecture, i.e. a fully connected layer attached after
the convolutional part, encodes responses in a spatially variant way. In
contrast, our descriptor uses explicit spatial encoding, and its potential
applications are not limited to local patches. We evaluate the descriptor on
standard benchmarks. Both versions, encoding 32x32 or 64x64 patches,
consistently outperform all other methods on all benchmarks. The number of
parameters of the model is independent of the input patch resolution.
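One simple way to encode a response jointly with its location is an outer product with an explicit feature map of the coordinates (the harmonic map, dimensions, and Cartesian parametrization below are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np

def spatial_map(x, y, K=3):
    """Explicit feature map of a location in [-1, 1]^2: K sine/cosine
    harmonics per coordinate (hypothetical choice of K)."""
    f = np.arange(1, K + 1)
    return np.concatenate([np.sin(np.pi * f * x), np.cos(np.pi * f * x),
                           np.sin(np.pi * f * y), np.cos(np.pi * f * y)])

def encode_patch(responses, coords):
    """Couple each receptive-field response with its location via an outer
    product with the spatial map, sum-pool, and L2-normalize."""
    desc = sum(np.outer(r, spatial_map(x, y)).ravel()
               for r, (x, y) in zip(responses, coords))
    return desc / np.linalg.norm(desc)

rng = np.random.default_rng(0)
responses = rng.normal(size=(16, 8))       # 16 receptive fields, 8-dim each
coords = rng.uniform(-1, 1, size=(16, 2))  # normalized (x, y) locations
d = encode_patch(responses, coords)        # 8 * (4*3) = 96-dim descriptor
```

Because the spatial map, not the pooling, carries the location information, the descriptor dimension is fixed regardless of how many receptive fields the patch resolution produces.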
OATM: Occlusion Aware Template Matching by Consensus Set Maximization
We present a novel approach to template matching that is efficient, can
handle partial occlusions, and comes with provable performance guarantees. A
key component of the method is a reduction that transforms the problem of
searching a nearest neighbor among high-dimensional vectors, to searching
neighbors among two sets of order vectors, which can be found
efficiently using range search techniques. This allows for a quadratic
improvement in search complexity, and makes the method scalable in handling
large search spaces. The second contribution is a hashing scheme based on
consensus set maximization, which allows us to handle occlusions. The resulting
scheme can be seen as a randomized hypothesize-and-test algorithm, which is
equipped with guarantees regarding the number of iterations required for
obtaining an optimal solution with high probability. The predicted matching
rates are validated empirically and the algorithm shows a significant
improvement over the state-of-the-art in both speed and robustness to
occlusions.
Comment: to appear at CVPR 201
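The iteration guarantee for such randomized hypothesize-and-test schemes follows the standard RANSAC-style bound; the sketch below shows that generic formula, not OATM's specific analysis:

```python
import math

def iterations_needed(inlier_ratio, sample_size, confidence=0.99):
    """Smallest N such that N random samples of `sample_size` items contain
    at least one all-inlier sample with probability >= confidence."""
    p_good = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

# e.g. half the template unoccluded, hypotheses drawn from 2-element samples:
n = iterations_needed(0.5, 2)   # 17 iterations for 99% confidence
```

The bound explains why the required number of iterations stays modest even under substantial occlusion, as long as each hypothesis needs only a small consensus sample.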
Kernelized Deep Convolutional Neural Network for Describing Complex Images
With the impressive capability to capture visual content, deep convolutional
neural networks (CNNs) have demonstrated promising performance in various
vision-based applications, such as classification, recognition, and object
detection. However, due to the intrinsic structural design of CNNs, for images
with complex content they achieve limited invariance to translation, rotation,
and resizing changes, which is strongly emphasized in the scenario of
content-based image retrieval. In this paper, to address this problem, we
propose a new kernelized deep convolutional neural network. We first discuss
our motivation with an experimental study demonstrating the sensitivity of the
global CNN feature to basic geometric transformations. Then, we propose to
represent visual content with approximate invariance to the above geometric
transformations from a kernelized perspective. We extract CNN features on
detected object-like patches and aggregate these patch-level CNN features to
form a vectorial representation with the Fisher vector model. The
effectiveness of our proposed algorithm is demonstrated on an image search
application with three benchmark datasets.
Comment: 9 pages
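The patch-level aggregation step can be sketched with a simplified Fisher-vector-style pooling. Uniform mixture weights and a shared isotropic variance are simplifying assumptions here; a real Fisher vector uses a fitted GMM and second-order terms as well:

```python
import numpy as np

def fisher_like_aggregate(patch_feats, means, sigma=1.0):
    """Soft-assign each patch feature to K centers and accumulate normalized
    first-order residuals, followed by power- and L2-normalization."""
    K, D = means.shape
    acc = np.zeros((K, D))
    for f in patch_feats:
        d2 = ((f - means) ** 2).sum(axis=1)
        q = np.exp(-d2 / (2.0 * sigma ** 2))
        q /= q.sum()
        acc += q[:, None] * (f - means) / sigma
    v = acc.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))   # power normalization
    return v / np.linalg.norm(v)          # L2 normalization

rng = np.random.default_rng(0)
patches = rng.normal(size=(20, 16))  # e.g. CNN features of 20 object-like patches
centers = rng.normal(size=(4, 16))   # K = 4 hypothetical mixture centers
v = fisher_like_aggregate(patches, centers)
```

Because the pooling sums over patches, the resulting representation is insensitive to where each object-like patch appeared in the image, which is the source of the approximate geometric invariance.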
Learning selectivity and invariance through spatiotemporal Hebbian plasticity in a hierarchical neural network
When an object moves smoothly across a field of view, the identity of the
object is unchanged, but the activation pattern of the photoreceptors on the
retina changes drastically. One of the major computational roles of our visual
system is to manage selectivity for different objects and tolerance to such
identity-preserving transformations as translations or rotations. This study
demonstrates that a hierarchical neural network, whose synaptic connectivities
are learned competitively with Hebbian plasticity operating within a local
spatiotemporal pooling range, is capable of gradually achieving feature
selectivity and transformation tolerance, so that the top level neurons carry
higher mutual information about object categories than a single-level neural
network. Furthermore, when a genetic algorithm is applied, in conjunction with
the associative learning algorithm, to search for a network architecture that
maximizes transformation-invariant object recognition performance, we find
that deep networks outperform shallower ones.
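A minimal competitive Hebbian rule of the kind described, in sketch form (winner-take-all with weight normalization; the paper's spatiotemporal pooling range is not modeled):

```python
import numpy as np

def hebbian_layer(X, n_units, lr=0.05, epochs=30, seed=0):
    """Winner-take-all Hebbian learning: the most active unit moves toward
    the input, and its weight vector is renormalized after each update."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(epochs):
        for x in X:
            winner = int(np.argmax(W @ x))           # competition
            W[winner] += lr * x                      # Hebbian reinforcement
            W[winner] /= np.linalg.norm(W[winner])   # normalization
    return W

# Two input clusters; after learning, distinct units become selective to each
rng = np.random.default_rng(1)
A = rng.normal(loc=[3, 0, 0, 0], scale=0.2, size=(50, 4))
B = rng.normal(loc=[0, 3, 0, 0], scale=0.2, size=(50, 4))
W = hebbian_layer(np.vstack([A, B]), n_units=2)
```

Competition plus normalization is what drives units apart into selective feature detectors; stacking such layers with spatiotemporal pooling is what yields tolerance to transformations in the full model.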
Efficient Global Point Cloud Alignment using Bayesian Nonparametric Mixtures
Point cloud alignment is a common problem in computer vision and robotics,
with applications ranging from 3D object recognition to reconstruction. We
propose a novel approach to the alignment problem that utilizes Bayesian
nonparametrics to describe the point cloud and surface normal densities, and
branch and bound (BB) optimization to recover the relative transformation. BB
uses a novel, refinable, near-uniform tessellation of rotation space using 4D
tetrahedra, leading to more efficient optimization compared to the common
axis-angle tessellation. We provide objective function bounds for pruning given
the proposed tessellation, and prove that BB converges to the optimum of the
cost function along with providing its computational complexity. Finally, we
empirically demonstrate the efficiency of the proposed approach as well as its
robustness to real-world conditions such as missing data and partial overlap.