Shape Representations Using Nested Descriptors
The problem of shape representation is a core problem in computer vision. It can be argued that shape representation is the most central representational problem for computer vision, since unlike texture or color, shape alone can be used for perceptual tasks such as image matching, object detection and object categorization.
This dissertation introduces a new shape representation called the nested descriptor. A nested descriptor represents shape both globally and locally by pooling salient scaled and oriented complex gradients in a large nested support set. We show that this nesting property introduces a nested correlation structure that enables a new local distance function called the nesting distance, which provides a provably robust similarity function for image matching. Furthermore, the nesting property suggests an elegant flower-like normalization strategy called a log-spiral difference. We show that this normalization enables a compact binary representation and is equivalent to a form of bottom-up saliency. This suggests that the nested descriptor's representational power is due to representing salient edges, which makes a fundamental connection between the saliency and local feature descriptor literatures. In this dissertation, we introduce three examples of shape representation using nested descriptors: nested shape descriptors for imagery, nested motion descriptors for video, and nested pooling for activities. We show evaluation results for these representations that demonstrate state-of-the-art performance on image matching, wide-baseline stereo, and activity recognition tasks.
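To make the pooling-and-binarization idea concrete, here is a minimal Python sketch of a toy construction, not the paper's exact descriptor: it pools oriented gradient magnitudes over nested circular supports and binarizes scale-to-scale differences as a crude stand-in for the log-spiral difference.

```python
import numpy as np

def nested_descriptor_sketch(image, center, radii=(4, 8, 16, 32), n_orient=8):
    """Toy stand-in for a nested descriptor: pool oriented gradient
    magnitudes over nested circular supports, then binarize differences
    between adjacent scales (a crude proxy for the log-spiral difference)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)                                  # [-pi, pi]
    bins = ((ori + np.pi) / (2 * np.pi) * n_orient).astype(int) % n_orient

    yy, xx = np.indices(image.shape)
    dist = np.hypot(yy - center[0], xx - center[1])

    pooled = np.zeros((len(radii), n_orient))
    for i, r in enumerate(radii):      # nested: each support contains all smaller ones
        for b in range(n_orient):
            pooled[i, b] = mag[(dist <= r) & (bins == b)].sum()

    diff = pooled[1:] - pooled[:-1]    # orientation energy gained between adjacent scales
    bits = diff > diff.mean(axis=1, keepdims=True)
    return bits.ravel()

# Binary descriptors compare cheaply with a Hamming distance:
# d = np.count_nonzero(desc_a != desc_b)
```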
Pick and Place Without Geometric Object Models
We propose a novel formulation of robotic pick and place as a deep reinforcement learning (RL) problem. Whereas most deep RL approaches to robotic manipulation frame the problem in terms of low-level states and actions, we propose a more abstract formulation. In this formulation, actions are target reach poses for the hand and states are a history of such reaches. We show this approach can solve a challenging class of pick-place and regrasping problems where the exact geometry of the objects to be handled is unknown. The only information our method requires is: 1) the sensor perception available to the robot at test time; 2) prior knowledge of the general class of objects for which the system was trained. We evaluate our method using objects belonging to two different categories, mugs and bottles, both in simulation and on real hardware. Results show a major improvement relative to a shape primitives baseline.
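As a rough illustration of this abstraction, here is a minimal Python sketch, assuming hypothetical `robot` and `sensor` interfaces that are not from the paper: the action is a target reach pose and the state carries the current observation plus the history of executed reaches, with no low-level joint state.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class ReachPose:
    """A target 6-DoF reach pose for the hand."""
    position: np.ndarray      # (3,) xyz
    orientation: np.ndarray   # (3,) roll, pitch, yaw

@dataclass
class PickPlaceState:
    """Abstract state: current sensor perception plus the reach history,
    rather than low-level joint positions and velocities."""
    observation: np.ndarray                      # e.g. a depth image
    reach_history: List[ReachPose] = field(default_factory=list)

def step(state: PickPlaceState, action: ReachPose, robot, sensor) -> PickPlaceState:
    """One abstract RL step: execute the reach, then re-observe.
    `robot.move_hand_to` and `sensor.capture` are assumed interfaces."""
    robot.move_hand_to(action.position, action.orientation)
    return PickPlaceState(sensor.capture(), state.reach_history + [action])
```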
Detecting the presence of large buildings in natural images
This paper addresses the issue of classification of low-level features into high-level semantic concepts for the purpose of semantic annotation of consumer photographs. We adopt a multi-scale approach that relies on edge detection to extract an edge orientation-based feature description of the image, and apply an SVM learning technique to infer the presence of a dominant building object in a general-purpose collection of digital photographs. The approach exploits prior knowledge of the image context through the assumption that all input images are 'outdoor', i.e. that indoor/outdoor classification (the context determination stage) has already been performed. The proposed approach is validated on a diverse dataset of 1720 images and its performance compared with that of the MPEG-7 edge histogram descriptor.
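A minimal sketch of the feature-plus-classifier pipeline, assuming a simple multi-scale edge orientation histogram as a stand-in for the paper's exact descriptor (the threshold, scales, and bin counts here are illustrative, not from the paper):

```python
import numpy as np
from scipy import ndimage
from sklearn.svm import SVC

def edge_orientation_histogram(image, n_bins=5, edge_thresh=20.0, scales=(0, 2, 4)):
    """Multi-scale edge orientation histogram: smooth the image at a few
    scales, keep strong-gradient (edge) pixels, and histogram their
    orientations folded into [0, pi)."""
    feats = []
    for s in scales:
        img = ndimage.gaussian_filter(image.astype(float), s) if s else image.astype(float)
        gy, gx = np.gradient(img)
        mag = np.hypot(gx, gy)
        ori = np.mod(np.arctan2(gy, gx), np.pi)
        hist, _ = np.histogram(ori[mag > edge_thresh], bins=n_bins, range=(0, np.pi))
        feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

# Hypothetical usage with grayscale outdoor images and building labels:
# X = np.stack([edge_orientation_histogram(img) for img in images])
# clf = SVC(kernel="rbf").fit(X, labels)
```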
An improved image segmentation algorithm for salient object detection
Semantic object detection is one of the most important and challenging problems in image analysis. Segmentation is an optimal approach to detecting salient objects, but it often fails to generate meaningful regions due to over-segmentation. This paper presents an improved semantic segmentation approach which is based on the JSEG algorithm and utilizes multiple region-merging criteria. The experimental results are encouraging and demonstrate that the proposed algorithm is effective for salient object detection.
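The abstract does not spell out the merging criteria, but a plausible sketch of a multi-criteria merge test follows; the thresholds and region fields are assumptions for illustration only:

```python
import numpy as np

def should_merge(region_a, region_b, color_thresh=15.0, min_size=200):
    """One plausible multi-criteria test for merging two adjacent regions
    left by over-segmentation: merge if their mean colors are close, or
    if either region is too small to stand alone."""
    color_dist = np.linalg.norm(
        np.asarray(region_a["mean_color"]) - np.asarray(region_b["mean_color"]))
    too_small = min(region_a["size"], region_b["size"]) < min_size
    return color_dist < color_thresh or too_small
```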
A deep representation for depth images from synthetic data
Convolutional Neural Networks (CNNs) trained on large-scale RGB databases have become the secret sauce in the majority of recent approaches for object categorization from RGB-D data. Thanks to colorization techniques, these methods exploit the filters learned from 2D images to extract meaningful representations in 2.5D. Still, the perceptual signature of these two kinds of images is very different, with the first usually strongly characterized by textures and the second mostly by silhouettes of objects. Ideally, one would like to have two CNNs, one for RGB and one for depth, each trained on a suitable data collection and able to capture the perceptual properties of each channel for the task at hand. This has not been possible so far, due to the lack of a suitable depth database. This paper addresses this issue, proposing to opt for synthetically generated images rather than collecting a large-scale 2.5D database by hand. While clearly a proxy for real data, synthetic images allow us to trade quality for quantity, making it possible to generate a virtually infinite amount of data. We show that training the very same architecture typically used on visual data on such a collection yields very different filters, resulting in depth features that are (a) better able to characterize the different facets of depth images, and (b) complementary to those derived from CNNs pre-trained on 2D datasets. Experiments on two publicly available databases show the power of our approach.
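To illustrate the colorization trick the abstract relies on, here is a minimal PyTorch sketch, assuming a simple normalize-and-replicate depth encoding and an ImageNet-pretrained ResNet-18 as the 2D backbone; the paper's own architecture and colorization scheme may differ.

```python
import numpy as np
import torch
from torchvision import models, transforms

def colorize_depth(depth):
    """Map a single-channel depth image into 3 channels so an
    RGB-pretrained CNN can consume it (simplest scheme: normalize
    to [0, 1] and replicate across channels)."""
    d = (depth - depth.min()) / max(np.ptp(depth), 1e-6)
    return np.stack([d, d, d], axis=-1).astype(np.float32)

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()          # keep the 512-d pooled features
backbone.eval()

depth = np.random.rand(480, 640).astype(np.float32)   # placeholder depth map
with torch.no_grad():
    features = backbone(preprocess(colorize_depth(depth)).unsqueeze(0))
```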
A model of ganglion axon pathways accounts for percepts elicited by retinal implants.
Degenerative retinal diseases such as retinitis pigmentosa and macular degeneration cause irreversible vision loss in more than 10 million people worldwide. Retinal prostheses, now implanted in over 250 patients worldwide, electrically stimulate surviving cells in order to evoke neuronal responses that are interpreted by the brain as visual percepts ('phosphenes'). However, instead of seeing focal spots of light, current implant users perceive highly distorted phosphenes that vary in shape both across subjects and across electrodes. We characterized these distortions by asking users of the Argus retinal prosthesis system (Second Sight Medical Products Inc.) to draw electrically elicited percepts on a touchscreen. Using ophthalmic fundus imaging and computational modeling, we show that elicited percepts can be accurately predicted by the topographic organization of optic nerve fiber bundles in each subject's retina, successfully replicating visual percepts ranging from 'blobs' to oriented 'streaks' and 'wedges' depending on the retinal location of the stimulating electrode. This provides the first evidence that activation of passing axon fibers accounts for the rich repertoire of phosphene shapes commonly reported in psychophysical experiments, which can severely distort the quality of the generated visual experience. Overall, our findings argue for more detailed modeling of the underlying biology across neural engineering applications.
Cumulative object categorization in clutter
In this paper we present an approach based on scene- or part-graphs for geometrically categorizing touching and occluded objects. We use additive RGBD feature descriptors and hashing of graph configuration parameters to describe the spatial arrangement of constituent parts. The presented experiments quantify that this method outperforms our earlier part-voting and sliding-window classification. We evaluated our approach on cluttered scenes, using a 3D dataset containing over 15,000 Kinect scans of over 100 objects grouped into general geometric categories. Additionally, color, geometric, and combined features were compared for categorization tasks.
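As a rough illustration of hashing a part-graph's configuration, here is a toy Python sketch, assuming we reduce each part to its centroid and hash the quantized multiset of pairwise distances; the paper's actual configuration parameters are richer.

```python
import numpy as np

def configuration_hash(part_centroids, bin_size=0.05, n_bins=16):
    """Toy spatial-arrangement hash: quantize all pairwise centroid
    distances and hash the sorted multiset, so part sets with similar
    spatial configurations fall into the same bucket."""
    c = np.asarray(part_centroids, dtype=float)
    pairwise = np.linalg.norm(c[:, None] - c[None], axis=2)
    iu = np.triu_indices(len(c), k=1)
    quantized = np.minimum((pairwise[iu] / bin_size).astype(int), n_bins - 1)
    return hash(tuple(sorted(quantized.tolist())))

# Two part sets with the same arrangement hash to the same bucket:
# configuration_hash([[0,0,0],[0.1,0,0]]) == configuration_hash([[1,1,1],[1.1,1,1]])
```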
Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. Pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching and have become popular in the robotics community, but they also strongly impair robustness to perceptual aliasing and to changes in viewpoint, illumination, and scene structure. In this work, we propose to leverage recent advances in deep learning to perform efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus allows us to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach achieves state-of-the-art localization performance while running in real time on a popular mobile platform, enabling new prospects for robotics research.
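A minimal NumPy sketch of the two-stage idea, assuming precomputed global descriptors for the database images and per-place 3D points with local descriptors; the names and the mutual-nearest-neighbor matcher are illustrative, and a real system would finish with PnP + RANSAC:

```python
import numpy as np

def hierarchical_localize(query_global, query_local, db_global, db_places, k=5):
    """Stage 1: retrieve k candidate places by cosine similarity of
    image-wide global descriptors. Stage 2: run expensive local 2D-3D
    descriptor matching only inside those candidates."""
    sims = db_global @ query_global / (
        np.linalg.norm(db_global, axis=1) * np.linalg.norm(query_global) + 1e-9)
    candidates = np.argsort(-sims)[:k]

    best = None
    for idx in candidates:
        pts3d, desc3d = db_places[idx]          # 3D points and their local descriptors
        d = np.linalg.norm(query_local[:, None] - desc3d[None], axis=2)
        nn12, nn21 = d.argmin(axis=1), d.argmin(axis=0)
        matches = [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]
        if best is None or len(matches) > best[1]:
            best = (idx, len(matches), matches)
    return best   # feed the matches to PnP + RANSAC for the final pose
```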