What do we perceive in a glance of a real-world scene?
What do we see when we glance at a natural scene, and how does this change as the glance becomes longer? We asked naive subjects to report in a free-form format what they saw when looking at briefly presented real-life photographs. Our subjects received no specific information as to the content of each stimulus; thus, our paradigm differs from previous studies in which subjects were cued before a picture was presented and/or were probed with multiple-choice questions. In the first stage, 90 novel grayscale photographs were foveally shown to a group of 22 native-English-speaking subjects. The presentation time was chosen at random from a set of seven possible durations (from 27 to 500 ms), and a perceptual mask immediately followed each photograph. After each presentation, subjects reported what they had just seen as completely and truthfully as possible. In the second stage, another group of naive individuals was instructed to score each of the descriptions produced by the subjects in the first stage; individual scores were assigned to more than a hundred different attributes. We show that within a single glance, much object- and scene-level information is perceived by human subjects. The richness of this perception, however, appears asymmetrical: subjects show a bias toward perceiving natural scenes as outdoor rather than indoor, and the reporting of sensory- or feature-level information (such as shading and shape) consistently precedes the reporting of semantic-level information. Once subjects recognize more semantic-level components of a scene, however, there is little evidence of any bias toward either scene-level or object-level recognition.
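The paradigm reduces to a simple trial schedule: each photograph is assigned a randomly drawn exposure duration, is immediately masked, and is followed by a free-form report. The Python sketch below illustrates that structure only; the seven listed durations are assumptions (the abstract gives just the 27-500 ms range), and all names in the code are hypothetical.

    import random

    # Hypothetical set of seven exposure durations (ms); the abstract states
    # only that they range from 27 to 500 ms, so the intermediate values here
    # are assumptions.
    PRESENTATION_TIMES_MS = [27, 40, 53, 67, 80, 107, 500]

    def build_trial_schedule(photo_ids, rng=random):
        """Assign each photograph a randomly drawn exposure duration,
        followed immediately by a perceptual mask and a free-form report,
        as in the paradigm described above."""
        schedule = []
        for photo in photo_ids:
            schedule.append({
                "photo": photo,
                "exposure_ms": rng.choice(PRESENTATION_TIMES_MS),
                "mask": True,            # mask always follows the stimulus
                "response": "free-form", # subjects describe what they saw
            })
        rng.shuffle(schedule)  # randomize trial order across the session
        return schedule

    # Example: a 90-photograph session, matching the abstract's stimulus count.
    trials = build_trial_schedule([f"img_{i:02d}" for i in range(90)])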
Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns
We introduce Deep Thermal Imaging, a new approach for close-range automatic
recognition of materials that helps people and ubiquitous technologies better
understand their proximal environment. Our approach uses a low-cost mobile
thermal camera integrated into a smartphone to capture thermal textures. A deep
neural network classifies these textures into material types. This approach
works effectively without the need for ambient light sources or direct contact
with materials. Furthermore, the use of a deep learning network removes the
need to handcraft the set of features for different materials. We evaluated the
performance of the system by training it to recognise 32 material types in both
indoor and outdoor environments. Our approach produced recognition accuracies
above 98% in 14,860 images of 15 indoor materials and above 89% in 26,584
images of 17 outdoor materials. We conclude by discussing its potential for
real-time use in HCI applications and future directions.
Comment: Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems
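As a rough illustration of the classification stage, here is a minimal PyTorch sketch of a CNN that maps single-channel thermal texture patches to material-type logits. The architecture, layer sizes, and names are placeholders chosen for clarity, not the network used in the paper.

    import torch
    import torch.nn as nn

    class ThermalTextureNet(nn.Module):
        """Minimal CNN sketch for classifying thermal texture patches into
        material types. Input: single-channel (thermal) images. This is an
        illustrative stand-in, not the paper's architecture."""
        def __init__(self, num_materials=32):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # pool to one feature vector per patch
            )
            self.classifier = nn.Linear(128, num_materials)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    # Example: a batch of four 64x64 thermal patches -> 32 material logits.
    logits = ThermalTextureNet()(torch.randn(4, 1, 64, 64))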
End-to-End Learning of Representations for Asynchronous Event-Based Data
Event cameras are vision sensors that record asynchronous streams of
per-pixel brightness changes, referred to as "events". They have appealing
advantages over frame-based cameras for computer vision, including high
temporal resolution, high dynamic range, and no motion blur. Due to the sparse,
non-uniform spatiotemporal layout of the event signal, pattern recognition
algorithms typically aggregate events into a grid-based representation and
subsequently process it by a standard vision pipeline, e.g., Convolutional
Neural Network (CNN). In this work, we introduce a general framework to convert
event streams into grid-based representations through a sequence of
differentiable operations. Our framework offers two main advantages: (i) it
allows learning the input event representation together with the
task-dedicated network in an end-to-end manner, and (ii) it lays out a
taxonomy that unifies the
majority of extant event representations in the literature and identifies novel
ones. Empirically, we show that our approach to learning the event
representation end-to-end yields an improvement of approximately 12% on optical
flow estimation and object recognition over state-of-the-art methods.
Comment: To appear at ICCV 2019
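To make the idea concrete, below is a hedged PyTorch sketch of one such differentiable conversion: each event's normalized timestamp passes through a small learned kernel that distributes the event's polarity across the temporal bins of a spatial grid, so the representation remains trainable end to end with the downstream network. The class name, bin count, and sensor resolution are illustrative assumptions, and the whole thing is a simplification rather than the paper's exact formulation.

    import torch
    import torch.nn as nn

    class LearnableEventGrid(nn.Module):
        """Sketch of a differentiable event-to-grid conversion: each event
        (x, y, t, polarity) contributes to B temporal bins through a small
        learned kernel applied to its normalized timestamp."""
        def __init__(self, bins=9, height=180, width=240):
            super().__init__()
            self.bins, self.H, self.W = bins, height, width
            # Learned temporal kernel: normalized timestamp -> weight per bin.
            self.kernel = nn.Sequential(
                nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, bins))

        def forward(self, events):
            # events: (N, 4) tensor of (x, y, t, polarity in {-1, +1})
            x, y = events[:, 0].long(), events[:, 1].long()
            t, p = events[:, 2], events[:, 3]
            t = (t - t.min()) / (t.max() - t.min() + 1e-9)   # normalize to [0, 1]
            w = self.kernel(t.unsqueeze(1)) * p.unsqueeze(1)  # (N, B) signed weights
            grid = torch.zeros(self.bins, self.H * self.W, device=events.device)
            flat = y * self.W + x                             # pixel index per event
            grid.index_add_(1, flat, w.t())  # scatter-add; differentiable in w
            return grid.view(self.bins, self.H, self.W)       # feed into a standard CNN

    # Example: 1000 random events -> a (9, 180, 240) grid representation.
    ev = torch.stack([torch.randint(0, 240, (1000,)).float(),
                      torch.randint(0, 180, (1000,)).float(),
                      torch.rand(1000),
                      torch.randint(0, 2, (1000,)).float() * 2 - 1], dim=1)
    rep = LearnableEventGrid()(ev)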
- …