Object Detection Through Exploration With A Foveated Visual Field
We present a foveated object detector (FOD) as a biologically-inspired
alternative to the sliding window (SW) approach, which is the dominant method of
search in computer vision object detection. Similar to the human visual system,
the FOD has higher resolution at the fovea and lower resolution at the visual
periphery. Consequently, more computational resources are allocated at the
fovea and relatively fewer at the periphery. The FOD processes the entire
scene, uses retino-specific object detection classifiers to guide eye
movements, aligns its fovea with regions of interest in the input image and
integrates observations across multiple fixations. Our approach combines modern
object detectors from computer vision with a recent model of peripheral pooling
regions found at the V1 layer of the human visual system. We assessed various
eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD
performs on par with the SW detector while bringing significant computational
cost savings. Comment: An extended version of this manuscript was published in PLOS
Computational Biology (October 2017) at
https://doi.org/10.1371/journal.pcbi.100574
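The idea of higher resolution at the fovea and lower resolution in the periphery can be illustrated with a toy rule in which the pooling-region size grows with distance from the current fixation. This is only a minimal sketch: the function name `pooling_size`, its parameters, and the linear growth rule are assumptions made for illustration, not the peripheral pooling model used by the FOD.

```python
import math

def pooling_size(x, y, fix_x, fix_y, fovea_radius=2.0, base=1, slope=0.5):
    """Side length of a (hypothetical) pooling region centred at (x, y).

    Inside the fovea the region keeps the finest size `base`; outside it
    the size grows linearly with eccentricity. The linear rule and all
    default values are illustrative assumptions.
    """
    # Eccentricity: Euclidean distance from the fixation point.
    ecc = math.hypot(x - fix_x, y - fix_y)
    if ecc <= fovea_radius:
        return base
    # Coarser pooling (lower resolution) the further out we go.
    return base + math.floor(slope * (ecc - fovea_radius))

# Moving the fixation changes which locations are seen at fine resolution.
sizes_from_origin = [pooling_size(x, 0, 0, 0) for x in (0, 2, 6, 10)]
sizes_from_right = [pooling_size(x, 0, 10, 0) for x in (0, 2, 6, 10)]
```

Under this rule, fewer and larger regions cover the periphery, which is one way to see why less computation is spent there than at the fovea.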
Weakly Supervised Localization and Learning with Generic Knowledge
ISSN: 0920-5691; ISSN: 1573-140
PatchNet
We introduce PatchNets, a compact, hierarchical representation describing structural and appearance characteristics of image regions, for use in image editing. In a PatchNet, an image region with coherent appearance is summarized by a graph node associated with a single representative patch, while geometric relationships between different regions are encoded by labelled graph edges that give contextual information. The hierarchical structure of a PatchNet allows a coarse-to-fine description of the image. We show how this PatchNet representation can be used as a basis for interactive, library-driven image editing. The user draws rough sketches to quickly specify editing constraints for the target image. The system then automatically queries an image library to find semantically compatible candidate regions to meet the editing goal. Contextual image matching is performed using the PatchNet representation, allowing suitable regions to be found and applied in a few seconds, even from a library containing thousands of images.
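The structure described above (nodes holding one representative patch each, labelled edges for geometric relationships, and a hierarchy for coarse-to-fine description) might be sketched as a small data structure. The class names, fields, and edge labels below are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PatchNode:
    """A region of coherent appearance (hypothetical layout)."""
    name: str
    patch: List[List[int]]  # single representative patch, e.g. grey levels
    children: List["PatchNode"] = field(default_factory=list)  # finer level

@dataclass
class PatchNet:
    """Graph of regions; edges carry a geometric relationship label."""
    nodes: Dict[str, PatchNode] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_node(self, node: PatchNode) -> None:
        self.nodes[node.name] = node

    def relate(self, a: str, b: str, label: str) -> None:
        # Only regions already in the graph can be related.
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both regions must be added before relating them")
        self.edges[(a, b)] = label

    def neighbours(self, name: str) -> List[Tuple[str, str]]:
        # Context of a region: (other region, relationship label) pairs.
        return sorted((b, lbl) for (a, b), lbl in self.edges.items() if a == name)

def depth(node: PatchNode) -> int:
    """Number of hierarchy levels below and including this node."""
    return 1 + max((depth(c) for c in node.children), default=0)

# Toy example: a sky region above a ground region that refines into grass.
grass = PatchNode("grass", [[40, 42], [41, 43]])
ground = PatchNode("ground", [[60, 61], [62, 63]], children=[grass])
sky = PatchNode("sky", [[200, 201], [202, 203]])

net = PatchNet()
net.add_node(sky)
net.add_node(ground)
net.relate("sky", "ground", "above")
```

In this sketch, matching a sketched editing constraint against a library would amount to comparing representative patches together with the labelled context returned by `neighbours`; the actual matching procedure is not specified in the abstract.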