27,337 research outputs found

    Event-based Vision: A Survey

    Get PDF
    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world

    Viewfinder: final activity report

    Get PDF
    The VIEW-FINDER project (2006-2009) is an 'Advanced Robotics' project that seeks to apply a semi-autonomous robotic system to inspect ground safety in the event of a fire. Its primary aim is to gather data (visual and chemical) in order to assist rescue personnel. A base station combines the gathered information with information retrieved from off-site sources. The project addresses key issues related to map building and reconstruction, interfacing local command information with external sources, human-robot interfaces and semi-autonomous robot navigation. The VIEW-FINDER system is a semi-autonomous; the individual robot-sensors operate autonomously within the limits of the task assigned to them, that is, they will autonomously navigate through and inspect an area. Human operators monitor their operations and send high level task requests as well as low level commands through the interface to any nodes in the entire system. The human interface has to ensure the human supervisor and human interveners are provided a reduced but good and relevant overview of the ground and the robots and human rescue workers therein

    Creating virtual models from uncalibrated camera views

    Get PDF
    The reconstruction of photorealistic 3D models from camera views is becoming an ubiquitous element in many applications that simulate physical interaction with the real world. In this paper, we present a low-cost, interactive pipeline aimed at non-expert users, that achieves 3D reconstruction from multiple views acquired with a standard digital camera. 3D models are amenable to access through diverse representation modalities that typically imply trade-offs between level of detail, interaction, and computational costs. Our approach allows users to selectively control the complexity of different surface regions, while requiring only simple 2D image editing operations. An initial reconstruction at coarse resolution is followed by an iterative refining of the surface areas corresponding to the selected regions

    A Model of the Ventral Visual System Based on Temporal Stability and Local Memory

    Get PDF
    The cerebral cortex is a remarkably homogeneous structure suggesting a rather generic computational machinery. Indeed, under a variety of conditions, functions attributed to specialized areas can be supported by other regions. However, a host of studies have laid out an ever more detailed map of functional cortical areas. This leaves us with the puzzle of whether different cortical areas are intrinsically specialized, or whether they differ mostly by their position in the processing hierarchy and their inputs but apply the same computational principles. Here we show that the computational principle of optimal stability of sensory representations combined with local memory gives rise to a hierarchy of processing stages resembling the ventral visual pathway when it is exposed to continuous natural stimuli. Early processing stages show receptive fields similar to those observed in the primary visual cortex. Subsequent stages are selective for increasingly complex configurations of local features, as observed in higher visual areas. The last stage of the model displays place fields as observed in entorhinal cortex and hippocampus. The results suggest that functionally heterogeneous cortical areas can be generated by only a few computational principles and highlight the importance of the variability of the input signals in forming functional specialization

    On Acquisition and Analysis of a Dataset Comprising of Gait, Ear and Semantic data

    No full text
    In outdoor scenarios such as surveillance where there is very little control over the environments, complex computer vision algorithms are often required for analysis. However constrained environments, such as walkways in airports where the surroundings and the path taken by individuals can be controlled, provide an ideal application for such systems. Figure 1.1 depicts an idealised constrained environment. The path taken by the subject is restricted to a narrow path and once inside is in a volume where lighting and other conditions are controlled to facilitate biometric analysis. The ability to control the surroundings and the flow of people greatly simplifes the computer vision task, compared to typical unconstrained environments. Even though biometric datasets with greater than one hundred people are increasingly common, there is still very little known about the inter and intra-subject variation in many biometrics. This information is essential to estimate the recognition capability and limits of automatic recognition systems. In order to accurately estimate the inter- and the intra- class variance, substantially larger datasets are required [40]. Covariates such as facial expression, headwear, footwear type, surface type and carried items are attracting increasing attention; although considering the potentially large impact on an individuals biometrics, large trials need to be conducted to establish how much variance results. This chapter is the first description of the multibiometric data acquired using the University of Southampton's Multi-Biometric Tunnel [26, 37]; a biometric portal using automatic gait, face and ear recognition for identification purposes. The tunnel provides a constrained environment and is ideal for use in high throughput security scenarios and for the collection of large datasets. We describe the current state of data acquisition of face, gait, ear, and semantic data and present early results showing the quality and range of data that has been collected. The main novelties of this dataset in comparison with other multi-biometric datasets are: 1. gait data exists for multiple views and is synchronised, allowing 3D reconstruction and analysis; 2. the face data is a sequence of images allowing for face recognition in video; 3. the ear data is acquired in a relatively unconstrained environment, as a subject walks past; and 4. the semantic data is considerably more extensive than has been available previously. We shall aim to show the advantages of this new data in biometric analysis, though the scope for such analysis is considerably greater than time and space allows for here

    Towards binocular active vision in a robot head system

    Get PDF
    This paper presents the first results of an investigation and pilot study into an active, binocular vision system that combines binocular vergence, object recognition and attention control in a unified framework. The prototype developed is capable of identifying, targeting, verging on and recognizing objects in a highly-cluttered scene without the need for calibration or other knowledge of the camera geometry. This is achieved by implementing all image analysis in a symbolic space without creating explicit pixel-space maps. The system structure is based on the ‘searchlight metaphor’ of biological systems. We present results of a first pilot investigation that yield a maximum vergence error of 6.4 pixels, while seven of nine known objects were recognized in a high-cluttered environment. Finally a “stepping stone” visual search strategy was demonstrated, taking a total of 40 saccades to find two known objects in the workspace, neither of which appeared simultaneously within the Field of View resulting from any individual saccade
    corecore