2,075 research outputs found

    Event-Driven Stereo Visual Tracking Algorithm to Solve Object Occlusion

    Get PDF
    Object tracking is a major problem for many computer vision applications, but it continues to be computationally expensive. The use of bio-inspired neuromorphic event-driven dynamic vision sensors (DVSs) has heralded new methods for vision processing, exploiting reduced amount of data and very precise timing resolutions. Previous studies have shown these neural spiking sensors to be well suited to implementing singlesensor object tracking systems, although they experience difficulties when solving ambiguities caused by object occlusion. DVSs have also performed well in 3-D reconstruction in which event matching techniques are applied in stereo setups. In this paper, we propose a new event-driven stereo object tracking algorithm that simultaneously integrates 3-D reconstruction and cluster tracking, introducing feedback information in both tasks to improve their respective performances. This algorithm, inspired by human vision, identifies objects and learns their position and size in order to solve ambiguities. This strategy has been validated in four different experiments where the 3-D positions of two objects were tracked in a stereo setup even when occlusion occurred. The objects studied in the experiments were: 1) two swinging pens, the distance between which during movement was measured with an error of less than 0.5%; 2) a pen and a box, to confirm the correctness of the results obtained with a more complex object; 3) two straws attached to a fan and rotating at 6 revolutions per second, to demonstrate the high-speed capabilities of this approach; and 4) two people walking in a real-world environment.Ministerio de Economía y Competitividad TEC2012-37868-C04-01Ministerio de Economía y Competitividad TEC2015-63884-C2-1-PJunta de Andalucía TIC-609

    DART: Distribution Aware Retinal Transform for Event-based Cameras

    Full text link
    We introduce a generic visual descriptor, termed as distribution aware retinal transform (DART), that encodes the structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection and feature matching: (1) The DART features are directly employed as local descriptors in a bag-of-features classification framework and testing is carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101). (2) Extending the classification system, tracking is demonstrated using two key novelties: (i) For overcoming the low-sample problem for the one-shot learning of a binary classifier, statistical bootstrapping is leveraged with online learning; (ii) To achieve tracker robustness, the scale and rotation equivariance property of the DART descriptors is exploited for the one-shot learning. (3) To solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting. The detection scheme is then combined with the tracker to result in a high intersection-over-union score with augmented ground truth annotations on the publicly available event camera dataset. (4) Finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which has not been explicitly tackled in the event-based vision domain.Comment: 12 pages, revision submitted to TPAMI in Nov 201

    Disparity map generation based on trapezoidal camera architecture for multiview video

    Get PDF
    Visual content acquisition is a strategic functional block of any visual system. Despite its wide possibilities, the arrangement of cameras for the acquisition of good quality visual content for use in multi-view video remains a huge challenge. This paper presents the mathematical description of trapezoidal camera architecture and relationships which facilitate the determination of camera position for visual content acquisition in multi-view video, and depth map generation. The strong point of Trapezoidal Camera Architecture is that it allows for adaptive camera topology by which points within the scene, especially the occluded ones can be optically and geometrically viewed from several different viewpoints either on the edge of the trapezoid or inside it. The concept of maximum independent set, trapezoid characteristics, and the fact that the positions of cameras (with the exception of few) differ in their vertical coordinate description could very well be used to address the issue of occlusion which continues to be a major problem in computer vision with regards to the generation of depth map

    Event-based Vision: A Survey

    Get PDF
    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world

    EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Dynamic Vision Sensors

    Full text link
    As an alternative sensing paradigm, dynamic vision sensors (DVS) have been recently explored to tackle scenarios where conventional sensors result in high data rate and processing time. This paper presents a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic sensor, thereby exploiting the sparse DVS output in a low-power setting for traffic monitoring. Specifically, we propose a hardware efficient processing pipeline that optimizes memory and computational needs that enable long-term battery powered usage for IoT applications. To exploit the background removal property of a static DVS, we propose an event-based binary image creation that signals presence or absence of events in a frame duration. This reduces memory requirement and enables usage of simple algorithms like median filtering and connected component labeling for denoise and region proposal respectively. To overcome the fragmentation issue, a YOLO inspired neural network based detector and classifier to merge fragmented region proposals has been proposed. Finally, a new overlap based tracker was implemented, exploiting overlap between detections and tracks is proposed with heuristics to overcome occlusion. The proposed pipeline is evaluated with more than 5 hours of traffic recording spanning three different locations on two different neuromorphic sensors (DVS and CeleX) and demonstrate similar performance. Compared to existing event-based feature trackers, our method provides similar accuracy while needing approx 6 times less computes. To the best of our knowledge, this is the first time a stationary DVS based traffic monitoring solution is extensively compared to simultaneously recorded RGB frame-based methods while showing tremendous promise by outperforming state-of-the-art deep learning solutions.Comment: 16 pages, 13 figure

    Bio-Inspired Stereo Vision Calibration for Dynamic Vision Sensors

    Get PDF
    Many advances have been made in the eld of computer vision. Several recent research trends have focused on mimicking human vision by using a stereo vision system. In multi-camera systems, a calibration process is usually implemented to improve the results accuracy. However, these systems generate a large amount of data to be processed; therefore, a powerful computer is required and, in many cases, this cannot be done in real time. Neuromorphic Engineering attempts to create bio-inspired systems that mimic the information processing that takes place in the human brain. This information is encoded using pulses (or spikes) and the generated systems are much simpler (in computational operations and resources), which allows them to perform similar tasks with much lower power consumption, thus these processes can be developed over specialized hardware with real-time processing. In this work, a bio-inspired stereovision system is presented, where a calibration mechanism for this system is implemented and evaluated using several tests. The result is a novel calibration technique for a neuromorphic stereo vision system, implemented over specialized hardware (FPGA - Field-Programmable Gate Array), which allows obtaining reduced latencies on hardware implementation for stand-alone systems, and working in real time.Ministerio de Economía y Competitividad TEC2016-77785-PMinisterio de Economía y Competitividad TIN2016-80644-

    Motion analysis report

    Get PDF
    Human motion analysis is the task of converting actual human movements into computer readable data. Such movement information may be obtained though active or passive sensing methods. Active methods include physical measuring devices such as goniometers on joints of the body, force plates, and manually operated sensors such as a Cybex dynamometer. Passive sensing de-couples the position measuring device from actual human contact. Passive sensors include Selspot scanning systems (since there is no mechanical connection between the subject's attached LEDs and the infrared sensing cameras), sonic (spark-based) three-dimensional digitizers, Polhemus six-dimensional tracking systems, and image processing systems based on multiple views and photogrammetric calculations
    corecore