
    Towards Active Image Segmentation: the Foveal Bounded Irregular Pyramid

    Presented at: 2nd Workshop on Recognition and Action for Scene Understanding, York, England, August 30, 2013.
    It is well established that the units of attention in human vision are not merely spatial but closely related to perceptual objects. This implies a strong relationship between segmentation and attention processes. The interaction is bi-directional: if the segmentation process constrains attention, then the way an image is segmented may depend on the specific question asked of an observer, i.e. what she attends to. When the focus of attention is deployed from one visual unit to another, the rest of the scene is still perceived, but at a lower resolution than the focused object. The result is a multi-resolution visual perception in which the fovea, a dimple on the central retina, provides the highest-resolution vision. While much recent work has focused on computational models for object-based attention, the design and development of multi-resolution structures that can segment the input image according to the focused perceptual unit remains largely unexplored. This paper proposes a novel structure for multi-resolution image segmentation that extends the encoding provided by the Bounded Irregular Pyramid. Bottom-up attention is enclosed in the same structure, allowing the fovea to be set over the most salient image region. Preliminary results on the segmentation of natural images show that the approach performs well in terms of speed and accuracy.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
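    The multi-resolution perception the abstract describes (full resolution at the fovea, progressively coarser resolution with eccentricity) can be illustrated with a minimal sketch. Note this is a hypothetical block-averaging illustration of foveated resolution fall-off, not the paper's Foveal Bounded Irregular Pyramid; the `foveate` function, its ring radii, and its block sizes are assumptions for demonstration only:

    ```python
    import numpy as np

    def foveate(image, fovea, levels=3, radius=8):
        """Crude foveated representation of a 2-D grayscale image.

        Pixels within `radius` of the fovea (Chebyshev distance) keep full
        resolution; each farther ring is averaged over progressively larger
        blocks, mimicking the retina's resolution fall-off. Purely
        illustrative: block sizes double per level.
        """
        h, w = image.shape
        out = image.astype(float).copy()
        fy, fx = fovea
        ys, xs = np.mgrid[0:h, 0:w]
        dist = np.maximum(np.abs(ys - fy), np.abs(xs - fx))  # Chebyshev rings
        for level in range(1, levels + 1):
            block = 2 ** level                 # coarser blocks farther out
            mask = dist >= radius * level      # this level's outer ring
            # Replace block-aligned tiles touched by the ring with their mean.
            for by in range(0, h, block):
                for bx in range(0, w, block):
                    tile = (slice(by, min(by + block, h)),
                            slice(bx, min(bx + block, w)))
                    if mask[tile].any():
                        out[tile] = image[tile].mean()
        return out

    img = np.arange(256, dtype=float).reshape(16, 16)
    fov = foveate(img, fovea=(8, 8), levels=3, radius=4)
    # Pixels at the fovea are untouched; the far corner is block-averaged.
    ```

    A real foveal pyramid would of course use an irregular graph structure rather than fixed square blocks; the sketch only conveys the resolution gradient around the attended point.
    
    
    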

    Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

    We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis. HTS can recognize text in an image and identify its 4-level hierarchical structure: characters, words, lines, and paragraphs. The proposed HTS is characterized by two novel components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier-curve polygons of text lines and an affinity matrix for paragraph grouping between detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and further merges them back into words. HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as geometric layout analysis tasks.
    Comment: Accepted to WACV 202

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and in applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.
    Comment: 10 pages, 19 figures

    ARTSCENE: A Neural System for Natural Scene Classification

    How do humans rapidly recognize a scene? How can neural models capture this biological competence to achieve state-of-the-art scene classification? The ARTSCENE neural system classifies natural scene photographs by using multiple spatial scales to efficiently accumulate evidence for gist and texture. ARTSCENE embodies a coarse-to-fine Texture Size Ranking Principle whereby spatial attention processes multiple scales of scenic information, ranging from global gist to local properties of textures. The model can incrementally learn and predict scene identity from gist information alone and can improve performance through selective attention to scenic textures of progressively smaller size. ARTSCENE discriminates 4 landscape scene categories (coast, forest, mountain, and countryside) with up to 91.58% correct on a test set, outperforms alternative models in the literature that use biologically implausible computations, and outperforms component systems that use either gist or texture information alone. Model simulations also show that adjacent textures form higher-order features that are also informative for scene recognition.
    National Science Foundation (NSF SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Fusion of gaze with hierarchical image segmentation for robust object detection

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
    Includes bibliographical references (p. 41).
    We present Flycatcher, a prototype system illustrating the idea of gaze-based image processing in the context of object segmentation for wearable photography. The prototype includes a wearable eye-tracking device that captures real-time eyetraces of a user, and a wearable video camera that captures first-person-perspective images of the user's visual environment. The system combines the deliberate eyetraces of the user with hierarchical image segmentation applied to scene images to achieve reliable object segmentation. In evaluations with certain classes of real-world images, fusion of gaze and image-segmentation information led to higher object-detection accuracy than either signal alone. Flycatcher may be integrated with assistive communication devices, enabling individuals with severe motor impairments to use eye control to communicate about objects in their environment. The system also represents a promising step toward an eye-driven interface for "copy and paste" visual memory augmentation in wearable computing applications.
    by Jeffrey M. Bartelma. M.Eng.