20,814 research outputs found
Compensating for Large In-Plane Rotations in Natural Images
Rotation invariance has been studied in the computer vision community
primarily in the context of small in-plane rotations. This is usually achieved
by building invariant image features. However, the problem of achieving
invariance for large rotation angles remains largely unexplored. In this work,
we tackle this problem by directly compensating for large rotations, as opposed
to building invariant features. This is inspired by the neuro-scientific
concept of mental rotation, which humans use to compare pairs of rotated
objects. Our contributions here are three-fold. First, we train a Convolutional
Neural Network (CNN) to detect image rotations. We find that generic CNN
architectures are not suitable for this purpose. To this end, we introduce a
convolutional template layer, which learns representations for canonical
'unrotated' images. Second, we use Bayesian Optimization to quickly sift
through a large number of candidate images to find the canonical 'unrotated'
image. Third, we use this method to achieve robustness to large angles in an
image retrieval scenario. Our method is task-agnostic, and can be used as a
pre-processing step in any computer vision system.Comment: Accepted at Indian Conference on Computer Vision, Graphics and Image
Processing (ICVGIP) 201
Learning to Prevent Monocular SLAM Failure using Reinforcement Learning
Monocular SLAM refers to using a single camera to estimate robot ego motion
while building a map of the environment. While Monocular SLAM is a well studied
problem, automating Monocular SLAM by integrating it with trajectory planning
frameworks is particularly challenging. This paper presents a novel formulation
based on Reinforcement Learning (RL) that generates fail safe trajectories
wherein the SLAM generated outputs do not deviate largely from their true
values. Quintessentially, the RL framework successfully learns the otherwise
complex relation between perceptual inputs and motor actions and uses this
knowledge to generate trajectories that do not cause failure of SLAM. We show
systematically in simulations how the quality of the SLAM dramatically improves
when trajectories are computed using RL. Our method scales effectively across
Monocular SLAM frameworks in both simulation and in real world experiments with
a mobile robot.Comment: Accepted at the 11th Indian Conference on Computer Vision, Graphics
and Image Processing (ICVGIP) 2018 More info can be found at the project page
at https://robotics.iiit.ac.in/people/vignesh.prasad/SLAMSafePlanner.html and
the supplementary video can be found at
https://www.youtube.com/watch?v=420QmM_Z8v
STEFANN: Scene Text Editor using Font Adaptive Neural Network
Textual information in a captured scene plays an important role in scene
interpretation and decision making. Though there exist methods that can
successfully detect and interpret complex text regions present in a scene, to
the best of our knowledge, there is no significant prior work that aims to
modify the textual information in an image. The ability to edit text directly
on images has several advantages including error correction, text restoration
and image reusability. In this paper, we propose a method to modify text in an
image at character-level. We approach the problem in two stages. At first, the
unobserved character (target) is generated from an observed character (source)
being modified. We propose two different neural network architectures - (a)
FANnet to achieve structural consistency with source font and (b) Colornet to
preserve source color. Next, we replace the source character with the generated
character maintaining both geometric and visual consistency with neighboring
characters. Our method works as a unified platform for modifying text in
images. We present the effectiveness of our method on COCO-Text and ICDAR
datasets both qualitatively and quantitatively.Comment: Accepted in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 202
- …