Event-Driven Stereo Visual Tracking Algorithm to Solve Object Occlusion
Object tracking is a major problem for many computer
vision applications, but it continues to be computationally
expensive. The use of bio-inspired neuromorphic event-driven
dynamic vision sensors (DVSs) has heralded new methods for
vision processing, exploiting a reduced amount of data and very
precise timing resolution. Previous studies have shown these
neural spiking sensors to be well suited to implementing single-sensor
object tracking systems, although they have difficulty
resolving ambiguities caused by object occlusion.
DVSs have also performed well in 3-D reconstruction in which
event matching techniques are applied in stereo setups. In this
paper, we propose a new event-driven stereo object tracking
algorithm that simultaneously integrates 3-D reconstruction
and cluster tracking, introducing feedback information in both
tasks to improve their respective performances. This algorithm,
inspired by human vision, identifies objects and learns their
position and size in order to solve ambiguities. This strategy
has been validated in four different experiments where the
3-D positions of two objects were tracked in a stereo setup even
when occlusion occurred. The objects studied in the experiments
were: 1) two swinging pens, the distance between which during
movement was measured with an error of less than 0.5%;
2) a pen and a box, to confirm the correctness of the results
obtained with a more complex object; 3) two straws attached to
a fan and rotating at 6 revolutions per second, to demonstrate
the high-speed capabilities of this approach; and 4) two people
walking in a real-world environment.
Ministerio de Economía y Competitividad TEC2012-37868-C04-01; Ministerio de Economía y Competitividad TEC2015-63884-C2-1-P; Junta de Andalucía TIC-609
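The following is a rough illustration of single-sensor, event-driven cluster tracking. It is not the authors' stereo algorithm, which also feeds 3-D reconstruction back into tracking. The Python sketch assigns each event to the nearest cluster within a gating radius and then nudges that cluster's position and learned size toward the event. The names `Cluster` and `track_events` are hypothetical, as are the parameters `alpha`, `gate` and `min_size`. Events are assumed to be pre-filtered `(x, y)` pixel coordinates.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical per-object state: centre (x, y) and a learned size, in pixels."""
    x: float
    y: float
    size: float
    n_events: int = 0


def track_events(events, clusters, alpha=0.05, gate=2.0, min_size=1.0):
    """Update clusters event by event (illustrative sketch, not the paper's method).

    Each (x, y) event goes to the nearest cluster whose gate (gate * size)
    contains it; that cluster's centre and size move toward the event by a
    fraction `alpha`. Events outside every gate are ignored here; a fuller
    tracker would use them to spawn new clusters.
    """
    for ex, ey in events:
        best, best_d = None, math.inf
        for c in clusters:
            d = math.hypot(ex - c.x, ey - c.y)
            if d <= gate * c.size and d < best_d:
                best, best_d = c, d
        if best is None:
            continue
        # Exponential moving average of the cluster centre.
        best.x += alpha * (ex - best.x)
        best.y += alpha * (ey - best.y)
        # Learned size tracks the typical event distance, floored at min_size.
        best.size = max(min_size, best.size + alpha * (best_d - best.size))
        best.n_events += 1
    return clusters
```

With two well-separated clusters, events near one of them move only that cluster. Resolving the case where two clusters overlap during occlusion is exactly where the paper's stereo feedback comes in.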
DART: Distribution Aware Retinal Transform for Event-based Cameras
We introduce a generic visual descriptor, termed distribution aware
retinal transform (DART), that encodes the structural context using log-polar
grids for event cameras. The DART descriptor is applied to four different
problems, namely object classification, tracking, detection and feature
matching: (1) The DART features are directly employed as local descriptors in a
bag-of-features classification framework and testing is carried out on four
standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS,
NCaltech-101). (2) Extending the classification system, tracking is
demonstrated using two key novelties: (i) to overcome the low-sample problem
for the one-shot learning of a binary classifier, statistical bootstrapping is
leveraged with online learning; (ii) to achieve tracker robustness, the scale
and rotation equivariance property of the DART descriptors is exploited for the
one-shot learning. (3) To solve the long-term object tracking problem, an
object detector is designed using the principle of cluster majority voting. The
detection scheme is then combined with the tracker, yielding a high
intersection-over-union score with augmented ground truth annotations on the
publicly available event camera dataset. (4) Finally, the event context encoded
by DART greatly simplifies the feature correspondence problem, especially for
spatio-temporal slices far apart in time, a problem that has not been explicitly tackled
in the event-based vision domain.
Comment: 12 pages, revision submitted to TPAMI in Nov 201
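Below is a minimal sketch of the log-polar binning that underlies a DART-style descriptor. It assumes events are plain `(x, y)` coordinates. The function name `log_polar_histogram` and the defaults for rings, wedges and radius range are hypothetical. The sketch only counts events per bin; DART's own distribution-aware encoding is not reproduced.

```python
import math


def log_polar_histogram(center, events, n_rings=4, n_wedges=8, r_min=1.0, r_max=16.0):
    """Count events in a log-polar grid centred on `center` (illustrative sketch).

    Rings are uniform in log(radius) between r_min and r_max; wedges are
    uniform in angle. Events closer than r_min or at/beyond r_max are ignored.
    Returns an n_rings x n_wedges list of counts.
    """
    cx, cy = center
    hist = [[0] * n_wedges for _ in range(n_rings)]
    log_min = math.log(r_min)
    log_span = math.log(r_max) - log_min
    for ex, ey in events:
        dx, dy = ex - cx, ey - cy
        r = math.hypot(dx, dy)
        if r < r_min or r >= r_max:
            continue
        # Ring index from log-radius; clamp guards against float rounding at r_max.
        ring = min(int((math.log(r) - log_min) / log_span * n_rings), n_rings - 1)
        # Wedge index from angle in [0, 2*pi); modulo guards the upper edge.
        theta = math.atan2(dy, dx) % (2 * math.pi)
        wedge = int(theta / (2 * math.pi) * n_wedges) % n_wedges
        hist[ring][wedge] += 1
    return hist
```

The bins are uniform in angle and in log-radius. Rotating the event pattern about the centre by one wedge width therefore shifts the counts by one wedge column. Scaling it by the ring ratio shifts them by one ring, apart from events that leave the radius range. This is the kind of rotation and scale equivariance the abstract refers to.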
Disparity map generation based on trapezoidal camera architecture for multiview video
Visual content acquisition is a strategic functional block of any visual system. Despite the wide range of possible
configurations, arranging cameras to acquire good-quality visual content for multi-view video
remains a major challenge. This paper presents the mathematical description of the trapezoidal camera
architecture and the relationships that facilitate determining camera positions for visual content
acquisition in multi-view video and for depth map generation. The strength of the Trapezoidal Camera
Architecture is that it allows an adaptive camera topology in which points within the scene, especially
occluded ones, can be viewed optically and geometrically from several different viewpoints, either on the
edge of the trapezoid or inside it. The concept of the maximum independent set, the characteristics of the trapezoid, and
the fact that the camera positions (with a few exceptions) differ in their vertical coordinates
could be used to address occlusion, which remains a major
problem in computer vision for depth map generation.
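The abstract does not state the trapezoidal relationships themselves. The sketch below therefore shows only the generic final step of depth map generation for a rectified camera pair. By similar triangles, depth is Z = f·B/d, where f is the focal length in pixels, B is the baseline and d is the disparity in pixels. Two choices here are assumptions: the function name `disparity_to_depth`, and marking non-positive disparities (for example, unmatched or occluded pixels) as `None`.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (rows of pixel disparities) into depth in metres.

    Assumes a rectified camera pair with focal length `focal_px` (pixels) and
    baseline `baseline_m` (metres), so that Z = f * B / d by similar triangles.
    Non-positive disparities (no valid match, e.g. an occluded pixel) map to None.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity
    ]
```

Halving the disparity doubles the estimated depth. Small disparity errors therefore matter most for distant points, and occluded pixels with no match leave holes that multi-view arrangements aim to fill.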
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high-speed, and high-dynamic-range settings. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras, covering their working principle, the actual sensors that are
available, and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
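The per-pixel brightness-change principle described above is commonly idealized as follows. A pixel fires an event whenever its log brightness has changed by a contrast threshold C since its last event, and the event's polarity is the sign of that change. The sketch below applies this model to one pixel's sampled intensity. The `Event` tuple, the `simulate_pixel` function and the 0.25 default threshold are illustrative assumptions. Events are stamped at the sample time rather than interpolated between samples.

```python
import math
from typing import NamedTuple


class Event(NamedTuple):
    t: float
    x: int
    y: int
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def simulate_pixel(samples, x, y, contrast=0.25):
    """Idealized event generation for one pixel (illustrative sketch).

    samples: list of (t, intensity) pairs with intensity > 0, sorted by t.
    An event is emitted each time log intensity moves by `contrast` away from
    the reference level set at the previous event.
    """
    events = []
    ref = math.log(samples[0][1])  # reference log intensity
    for t, intensity in samples[1:]:
        log_i = math.log(intensity)
        # A large change can cross the threshold several times in one sample.
        while log_i - ref >= contrast:
            ref += contrast
            events.append(Event(t, x, y, +1))
        while ref - log_i >= contrast:
            ref -= contrast
            events.append(Event(t, x, y, -1))
    return events
```

A pixel whose brightness stays constant produces no output at all. This is the source of the sparse data and the background-suppression behaviour exploited by the stationary-sensor work below.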
EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Dynamic Vision Sensors
As an alternative sensing paradigm, dynamic vision sensors (DVS) have been
recently explored to tackle scenarios where conventional sensors result in high
data rate and processing time. This paper presents a hybrid event-frame
approach for detecting and tracking objects recorded by a stationary
neuromorphic sensor, thereby exploiting the sparse DVS output in a low-power
setting for traffic monitoring. Specifically, we propose a hardware-efficient
processing pipeline that optimizes memory and computational needs, enabling
long-term battery-powered usage for IoT applications. To exploit the background
removal property of a static DVS, we propose creating an event-based binary image
that signals the presence or absence of events within a frame duration. This
reduces memory requirements and enables the use of simple algorithms such as median
filtering and connected component labeling for denoising and region proposal,
respectively. To overcome the fragmentation issue, we propose a YOLO-inspired neural
network-based detector and classifier that merges fragmented region proposals.
Finally, a new overlap-based tracker is proposed that exploits the
overlap between detections and tracks, with heuristics to overcome
occlusion. The proposed pipeline is evaluated on more than 5 hours of traffic
recordings spanning three different locations and two different neuromorphic
sensors (DVS and CeleX), showing similar performance on both. Compared to
existing event-based feature trackers, our method provides similar accuracy
while requiring approximately 6 times less computation. To the best of our knowledge, this
is the first time a stationary DVS-based traffic monitoring solution has been
extensively compared to simultaneously recorded RGB frame-based methods, and it
shows tremendous promise by outperforming state-of-the-art deep learning
solutions.
Comment: 16 pages, 13 figures
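The frame-based stages of this pipeline can be sketched in plain Python under several simplifying assumptions. Events are `(t, x, y)` tuples, and the binary frame is 1 wherever at least one event fell in the frame window. A 3x3 median filter, which on a binary image is a majority vote, removes isolated noise. 4-connected component labeling then yields bounding-box region proposals. Finally, a greedy intersection-over-union matcher stands in for the overlap-based tracker. All function names and thresholds are hypothetical. The YOLO-inspired merging network and the paper's occlusion heuristics are omitted.

```python
from collections import deque


def events_to_binary_frame(events, width, height, t_start, frame_duration):
    """Binary image: 1 where at least one event fell in [t_start, t_start + frame_duration)."""
    frame = [[0] * width for _ in range(height)]
    for t, x, y in events:
        if t_start <= t < t_start + frame_duration:
            frame[y][x] = 1
    return frame


def median_filter_3x3(frame):
    """3x3 median of a binary image = majority vote (>= 5 of 9); pixels outside count as 0."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = sum(
                frame[yy][xx]
                for yy in range(y - 1, y + 2)
                for xx in range(x - 1, x + 2)
                if 0 <= yy < h and 0 <= xx < w
            )
            out[y][x] = 1 if votes >= 5 else 0
    return out


def connected_components(frame):
    """4-connected labeling by breadth-first search; returns inclusive boxes (x0, y0, x1, y1)."""
    h, w = len(frame), len(frame[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not frame[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and frame[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def iou(a, b):
    """Intersection-over-union of two inclusive pixel boxes."""
    ix = min(a[2], b[2]) - max(a[0], b[0]) + 1
    iy = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if ix <= 0 or iy <= 0:
        return 0.0
    inter = ix * iy
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def update_tracks(tracks, detections, next_id, min_iou=0.3):
    """Greedy overlap tracker: each track takes its best-overlapping unused detection.

    Unmatched tracks are dropped and unmatched detections start new tracks.
    """
    updated, unused = {}, list(detections)
    for track_id, box in tracks.items():
        best = max(unused, key=lambda d: iou(box, d), default=None)
        if best is not None and iou(box, best) >= min_iou:
            updated[track_id] = best
            unused.remove(best)
    for det in unused:
        updated[next_id] = det
        next_id += 1
    return updated, next_id
```

Each stage touches a binary frame rather than individual events, which keeps memory small. That matches the motivation the abstract gives for the binary-image representation.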
Bio-Inspired Stereo Vision Calibration for Dynamic Vision Sensors
Many advances have been made in the field of computer vision. Several recent research trends
have focused on mimicking human vision by using a stereo vision system. In multi-camera systems, a
calibration process is usually implemented to improve the accuracy of the results. However, these systems generate
a large amount of data to be processed; therefore, a powerful computer is required and, in many cases,
this processing cannot be done in real time. Neuromorphic Engineering attempts to create bio-inspired systems that
mimic the information processing that takes place in the human brain. In these systems, information is encoded using
pulses (or spikes), and the resulting systems are much simpler in terms of computational operations and resources.
This allows them to perform similar tasks with much lower power consumption, so these processes
can be implemented on specialized hardware with real-time processing. In this work, a bio-inspired stereo vision
system is presented, and a calibration mechanism for this system is implemented and evaluated
using several tests. The result is a novel calibration technique for a neuromorphic stereo vision system,
implemented on specialized hardware (FPGA, Field-Programmable Gate Array), which achieves
reduced latencies in hardware implementations for stand-alone systems and works in real time.
Ministerio de Economía y Competitividad TEC2016-77785-P; Ministerio de Economía y Competitividad TIN2016-80644-
Motion analysis report
Human motion analysis is the task of converting actual human movements into computer-readable data. Such movement information may be obtained through active or passive sensing methods. Active methods include physical measuring devices such as goniometers on joints of the body, force plates, and manually operated sensors such as a Cybex dynamometer. Passive sensing decouples the position measuring device from actual human contact. Passive sensors include Selspot scanning systems (since there is no mechanical connection between the subject's attached LEDs and the infrared sensing cameras), sonic (spark-based) three-dimensional digitizers, Polhemus six-dimensional tracking systems, and image processing systems based on multiple views and photogrammetric calculations.
- …