35 research outputs found
Stereo Matching in Address-Event-Representation (AER) Bio-Inspired Binocular Systems in a Field-Programmable Gate Array (FPGA)
In stereo-vision processing, the image-matching step is essential for results, although it
involves a very high computational cost. Moreover, the more information is processed, the more time
is spent by the matching algorithm, and the more inefficient it is. Spike-based processing is a relatively
new approach that implements processing methods by manipulating spikes one by one at the time
they are transmitted, like a human brain. The mammal nervous system can solve much more complex
problems, such as visual recognition by manipulating neuron spikes. The spike-based philosophy
for visual information processing based on the neuro-inspired address-event-representation (AER)
is currently achieving very high performance. The aim of this work was to study the viability of a
matching mechanism in stereo-vision systems, using AER codification and its implementation in
a field-programmable gate array (FPGA). Some studies have been done before in an AER system
with monitored data using a computer; however, this kind of mechanism has not been implemented
directly on hardware. To this end, an epipolar geometry basis applied to AER systems was studied
and implemented, with other restrictions, in order to achieve good results in a real-time scenario.
The results and conclusions are shown, and the viability of its implementation is proven. Ministerio de Economía y Competitividad TEC2016-77785-
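The matching idea in the abstract above can be illustrated with a small sketch. It assumes rectified sensors, so that the epipolar constraint reduces to "same row", and it adds polarity, time-window, and disparity-range restrictions. The `Event` type, the function name, and the default thresholds are hypothetical; the abstract does not specify the restrictions actually used, and this is software, not the FPGA design.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """One address-event: hypothetical layout, not the paper's format."""
    t_us: int      # timestamp in microseconds
    x: int         # column address
    y: int         # row address
    polarity: int  # +1 (brighter) or -1 (darker)


def match_events(left: List[Event], right: List[Event],
                 max_dt_us: int = 1000,
                 max_disparity: int = 20) -> List[Tuple[Event, Event, int]]:
    """Pair each left event with the closest-in-time right event that
    lies on the same row (assumed rectified epipolar line), has the same
    polarity, and falls within the time window and disparity range."""
    matches = []
    used = set()  # each right event may be matched at most once
    for ev_l in left:
        best_index, best_dt = None, None
        for i, ev_r in enumerate(right):
            if i in used:
                continue
            if ev_r.y != ev_l.y or ev_r.polarity != ev_l.polarity:
                continue
            disparity = ev_l.x - ev_r.x
            if not 0 <= disparity <= max_disparity:
                continue
            dt = abs(ev_l.t_us - ev_r.t_us)
            if dt > max_dt_us:
                continue
            if best_dt is None or dt < best_dt:
                best_index, best_dt = i, dt
        if best_index is not None:
            used.add(best_index)
            ev_r = right[best_index]
            matches.append((ev_l, ev_r, ev_l.x - ev_r.x))
    return matches
```

The brute-force inner loop is chosen for readability only; a hardware implementation would process events as they arrive rather than scanning stored lists.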
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
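The working principle described above (per-pixel brightness changes reported as events carrying time, location, and sign) can be sketched with a simplified model. The sketch assumes that a pixel fires when its log intensity has moved by at least a contrast threshold since its last event; the function names, the threshold value, and the frame-based input are illustrative assumptions, and a real sensor works asynchronously and may emit several events per pixel between two samples.

```python
import math

EPS = 1e-3  # small offset to avoid log(0); an arbitrary illustrative choice


def make_reference(frame):
    """Store the log intensity of every pixel as the initial reference."""
    return [[math.log(value + EPS) for value in row] for row in frame]


def generate_events(log_ref, frame, t_us, threshold=0.2):
    """Return (t_us, x, y, polarity) for each pixel whose log intensity
    changed by at least `threshold` since its reference was last updated.
    The reference is updated in place for pixels that fire."""
    events = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            log_value = math.log(value + EPS)
            diff = log_value - log_ref[y][x]
            if abs(diff) >= threshold:
                polarity = 1 if diff > 0 else -1
                events.append((t_us, x, y, polarity))
                log_ref[y][x] = log_value
    return events
```

Pixels whose brightness does not change produce no output at all, which is where the sparse, low-latency character of the event stream comes from.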
Low Latency Rendering with Dataflow Architectures
The research presented in this thesis concerns latency in VR and synthetic environments. Latency is the end-to-end delay experienced by the user of an interactive computer system, between their physical actions and the perceived response to these actions. Latency is a product of the various processing, transport and buffering delays present in any current computer system. For many computer-mediated applications, latency can be distracting, but it is not critical to the utility of the application. Synthetic environments, on the other hand, attempt to facilitate direct interaction with a digitised world. Direct interaction here implies the formation of a sensorimotor loop between the user and the digitised world - that is, the user makes predictions about how their actions affect the world, and sees these predictions realised. By facilitating the formation of this loop, the synthetic environment allows users to directly sense the digitised world, rather than the interface, and induce perceptions, such as that of the digital world existing as a distinct physical place. This has many applications for knowledge transfer and efficient interaction through the use of enhanced communication cues. The complication is that the formation of the sensorimotor loop that underpins this is highly dependent on the fidelity of the virtual stimuli, including latency. The main research questions we ask are how the characteristics of dataflow computing can be leveraged to improve the temporal fidelity of the visual stimuli, and what implications this has for other aspects of the fidelity. Secondarily, we ask what effects latency itself has on user interaction. We test the effects of latency on physical interaction at levels previously hypothesized but unexplored. We also test for a previously unconsidered effect of latency on higher level cognitive functions.
To do this, we create prototype image generators for interactive systems and virtual reality, using dataflow computing platforms. We integrate these into real interactive systems to gain practical experience of the real perceptible benefits of alternative rendering approaches, and of their implications when they are subject to the constraints of real systems. We quantify the differences of our systems compared with traditional systems using latency and objective image fidelity measures. We use our novel systems to perform user studies into the effects of latency. Our high performance apparatuses allow experimentation at latencies lower than previously tested in comparable studies. The low latency apparatuses are designed to minimise what is currently the largest delay in traditional rendering pipelines and we find that the approach is successful in this respect. Our 3D low latency apparatus achieves lower latencies and higher fidelities than traditional systems. The conditions under which it can do this are highly constrained, however. We do not foresee dataflow computing shouldering the bulk of the rendering workload in the future but rather facilitating the augmentation of the traditional pipeline with a very high speed local loop. This may be an image distortion stage or otherwise. Our latency experiments revealed that many predictions about the effects of low latency should be re-evaluated and that experimenting in this range requires great care
EDFLOW: Event Driven Optical Flow Camera with Keypoint Detection and Adaptive Block Matching
Event cameras such as the Dynamic Vision Sensor (DVS) are useful because of their low latency, sparse output, and high dynamic range. In this paper, we propose a DVS+FPGA camera platform and use it to demonstrate the hardware implementation of event-based corner keypoint detection and adaptive block-matching optical flow. To adapt the sample rate dynamically, events are accumulated in event slices using the area event count slice exposure method. The area event count is feedback controlled by the average optical flow matching distance. Corners are detected by streaks of accumulated events on event slice rings of radius 3 and 4 pixels. Corner detection takes about 6 clock cycles (16 MHz event rate at the 100 MHz clock frequency). At the corners, flow vectors are computed in 100 clock cycles (1 MHz event rate). The multiscale block match size is 25x25 pixels and the flow vectors span up to 30-pixel match distance. The FPGA processes the sum-of-absolute distance block matching at 123 GOp/s, the equivalent of 1230 Op/clock cycle. EDFLOW is several times more accurate on MVSEC drone and driving optical flow benchmarking sequences than the previous best DVS FPGA optical flow implementation, and achieves similar accuracy to the CNN-based EV-Flownet, although it burns about 100 times less power. The EDFLOW design and benchmarking videos are available at https://sites.google.com/view/edflow21/home
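The rates quoted in the abstract above follow from the 100 MHz clock by simple division. The short sketch below reproduces that arithmetic; the function and constant names are illustrative and assume one event is handled every fixed number of cycles.

```python
CLOCK_HZ = 100e6  # clock frequency stated in the abstract


def event_rate_hz(cycles_per_event, clock_hz=CLOCK_HZ):
    """Events per second if one event is processed every `cycles_per_event` cycles."""
    return clock_hz / cycles_per_event


def ops_per_cycle(ops_per_second, clock_hz=CLOCK_HZ):
    """Operations completed in each clock cycle for a given throughput."""
    return ops_per_second / clock_hz


corner_rate = event_rate_hz(6)      # about 16.7e6, quoted as 16 MHz
flow_rate = event_rate_hz(100)      # 1e6, quoted as 1 MHz
sad_parallelism = ops_per_cycle(123e9)  # 1230 operations per cycle
```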
Real-time Visual Flow Algorithms for Robotic Applications
Vision offers important sensor cues to modern robotic platforms.
Applications such as control of aerial vehicles, visual servoing,
simultaneous localization and mapping, navigation and more
recently, learning, are examples where visual information is
fundamental to accomplish tasks. However, the use of computer
vision algorithms carries the computational cost of extracting
useful information from the stream of raw pixel data. The most
sophisticated algorithms use complex mathematical formulations
leading typically to computationally expensive, and consequently,
slow implementations. Even with modern computing resources,
high-speed and high-resolution video feed can only be used for
basic image processing operations. For a vision algorithm to be
integrated on a robotic system, the output of the algorithm
should be provided in real time, that is, at least at the same
frequency as the control logic of the robot. With robotic
vehicles becoming more dynamic and ubiquitous, this places higher
requirements on the vision processing pipeline.
This thesis addresses the problem of estimating dense visual flow
information in real time. The contributions of this work are
threefold. First, it introduces a new filtering algorithm for the
estimation of dense optical flow at frame rates as fast as 800 Hz
for 640x480 image resolution. The algorithm follows an
update-prediction architecture to estimate dense optical flow
fields incrementally over time. A fundamental component of the
algorithm is the modeling of the spatio-temporal evolution of the
optical flow field by means of partial differential equations.
Numerical predictors can implement such PDEs to propagate the current
flow estimate forward in time. Experimental validation of
the algorithm is provided using a high-speed ground-truth image
dataset as well as real-life video data at 300 Hz.
The second contribution is a new type of visual flow named
structure flow. Mathematically, structure flow is the
three-dimensional scene flow scaled by the inverse depth at each
pixel in the image. Intuitively, it is the complete velocity
field associated with image motion, including both optical flow
and scale-change or apparent divergence of the image. Analogously
to optic flow, structure flow provides a robotic vehicle with
perception of the motion of the environment as seen by the
camera. However, structure flow encodes the full 3D image motion
of the scene whereas optic flow only encodes the component on the
image plane. An algorithm to estimate structure flow from image
and depth measurements is proposed based on the same filtering
idea used to estimate optical flow.
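The definition of structure flow given above (scene flow scaled by the inverse depth at each pixel) can be written as a one-line computation. The sketch below only illustrates that definition, not the thesis's filtering algorithm; the function name is hypothetical, and the exact meaning of "depth" (distance along the viewing ray or along the optical axis) is left as an assumption.

```python
def structure_flow(scene_flow, depth):
    """Scale a 3D scene-flow vector (vx, vy, vz) by the inverse depth
    of the pixel it belongs to. `depth` must be positive."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    vx, vy, vz = scene_flow
    return (vx / depth, vy / depth, vz / depth)


# A point moving at (0.2, -0.4, 1.0) units per second, seen at depth 2.0
example = structure_flow((0.2, -0.4, 1.0), 2.0)
```

The scaling means that the same 3D motion yields a larger structure flow when the point is close to the camera.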
The final contribution is the spherepix data structure for
processing spherical images. This data structure is the numerical
back-end used for the real-time implementation of the structure
flow filter. It consists of a set of overlapping patches covering
the surface of the sphere. Each individual patch approximately
holds properties such as orthogonality and equidistance of
points, thus allowing efficient implementations of low-level
classical 2D convolution based image processing routines such as
Gaussian filters and numerical derivatives.
These algorithms are implemented on GPU hardware and can be
integrated into future Robotic Embedded Vision systems to provide
fast visual information to robotic vehicles
A Positional Timewarp Accelerator for Mobile Virtual Reality Devices
Mobile virtual reality devices are becoming more common, and yet their performance is still too low to be considered ideal. Frame rate and latency are two of the most important areas that should improve in order to provide a high-quality virtual reality experience. Meanwhile, positional tracking is improving the immersive experience of new mobile virtual reality devices by allowing users to physically move about in space and see their corresponding view matched in the virtual world. Timewarping is a technique that can improve the perceived latency and frame rate of virtual reality systems, but the positional variant of timewarping has proven to be difficult to implement on mobile devices due to the performance demands. A depth-informed positional time warp cannot be fully parallelized due to the depth test required for each pixel or group of pixels. This thesis proposes a positional timewarp hardware accelerator for mobile devices. The accelerator accepts a rendered frame and depth image and produces an updated frame corresponding to the user’s head position and orientation. The accelerator is compatible with existing deferred rendering engines for minimal modification of the software structure. Its execution time is directly proportional to the image resolution and is agnostic of the scene complexity. The accelerator’s size can be adjusted to meet the latency requirement for a given image resolution. It can be integrated into a system-on-chip or fabricated as a separate chip. Three examples are designed and simulated to show the performance potential of this accelerator architecture. The designs provide latencies of 15.43 ms, 11.58 ms and 9.27 ms for frame rates of 64.7, 86.4 and 107.9 frames per second, respectively. Although the visual side effects may not be few enough to disregard the GPU’s frame rate completely, the accelerator can still improve the end-to-end positional latency and is also capable of substituting the GPU in the case of dropped frames
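The role of the per-pixel depth test mentioned above can be illustrated on a single scanline. The sketch below is a generic forward warp for a sideways head translation, not the accelerator design from the thesis: each pixel is shifted by an amount inversely proportional to its depth, and when two pixels land on the same target the nearer one wins. The function name and the `shift_px_at_unit_depth` parameter are assumptions made for illustration.

```python
def positional_timewarp_scanline(colors, depths, shift_px_at_unit_depth):
    """Forward-warp one scanline of a rendered frame using its depth image.

    Each source pixel moves by shift_px_at_unit_depth / depth pixels
    (parallax). A depth test resolves collisions, and target pixels that
    receive no source pixel stay None (holes from disocclusion).
    """
    width = len(colors)
    out_colors = [None] * width
    out_depths = [float("inf")] * width
    for x in range(width):
        new_x = x + round(shift_px_at_unit_depth / depths[x])
        # Depth test: write only if this pixel is nearer than what is stored.
        if 0 <= new_x < width and depths[x] < out_depths[new_x]:
            out_colors[new_x] = colors[x]
            out_depths[new_x] = depths[x]
    return out_colors, out_depths


# A near pixel "a" (depth 1) in front of a far background (depth 10)
warped, warped_depths = positional_timewarp_scanline(
    ["a", "b", "c", "d"], [1.0, 10.0, 10.0, 10.0], 2.0)
```

In this example the near pixel moves two positions and covers a background pixel, while the position it vacated becomes a hole. Because several source pixels can target the same output location, the writes cannot be issued independently, which is the parallelization difficulty the abstract refers to.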
Event-Driven Technologies for Reactive Motion Planning: Neuromorphic Stereo Vision and Robot Path Planning and Their Application on Parallel Hardware
Robotics is increasingly becoming a key factor in technological progress. Despite impressive advances in recent decades, mammalian brains still outperform even the most capable machines in vision and motion planning. Industrial robots are very fast and precise, but their planning algorithms are not capable enough for highly dynamic environments such as those required for human-robot collaboration (HRC). Without fast and adaptive motion planning, safe HRC cannot be guaranteed. Neuromorphic technologies, including visual sensors and hardware chips, operate asynchronously and thus process spatio-temporal information very efficiently. Event-based visual sensors in particular already outperform conventional synchronous cameras in many applications. Event-based methods therefore have great potential to enable faster and more energy-efficient algorithms for motion control in HRC. This thesis presents an approach to flexible reactive motion control of a robot arm. Exteroception is achieved through event-based stereo vision, and path planning is implemented in a neural representation of the configuration space. The multi-view 3D reconstruction is evaluated through a qualitative analysis in simulation and transferred to a stereo system of event-based cameras. A demonstrator with an industrial robot is used to evaluate the reactive, collision-free online planning. It is also used for a comparative study against sample-based planners. This is complemented by a benchmark of parallel hardware solutions, with path planning in robotics chosen as the test scenario. The results show that the proposed neural solutions are an effective way to realize robot control for dynamic scenarios. This work lays a foundation for neural solutions in adaptive manufacturing processes, including in collaboration with humans, without sacrificing speed or safety. It thereby paves the way for integrating brain-inspired hardware and algorithms into industrial robotics and HRC
Biological Vision Inspired Systems in Biomedical Applications
This Master of Philosophy thesis presents two potential biomedical applications of an event-based camera, also known as a neuromorphic vision system (camera) or silicon retina vision sensor. Event-based cameras have drawn significant interest due to their advantages over traditional cameras, including low latency, high data throughput, high dynamic range, and low power consumption. Hence, researchers are actively seeking potential applications of event-based cameras.
Flow cytometry, a highly effective technology renowned for its rapid analysis of cells or particles suspended in a solution, has been extensively utilized across diverse disciplines. These include immunology, virology, molecular biology, cancer biology, and infectious disease monitoring. Conventional imaging flow cytometers generally suffer from motion blur, low dynamic range, and trade-offs between the frame rate (speed) and image resolution. In this thesis, we conducted a feasibility study with algorithmic results to propose an event-based high-throughput flow cytometer.
Navigation devices capable of guiding blind or vision-impaired people have remained a challenge over the past decade. Possible reasons include limited data throughput, undesirable user feedback, and power consumption requirements. Hence, we propose a proof-of-concept blind navigation system with an event-based camera
Multi-Frame Rate Rendering
Multi-frame rate rendering is a parallel rendering technique that renders interactive parts of a scene on one graphics card while the rest of the scene is rendered asynchronously on a second graphics card. The resulting color and depth images of both render processes are composited, by optical superposition or digital composition, and displayed. The results of a user study confirm that multi-frame rate rendering can significantly improve the interaction performance. Multi-frame rate rendering is naturally implemented on a graphics cluster. With the recent availability of multiple graphics cards in standalone systems the method can also be implemented on a single computer system where memory bandwidth is much higher compared to off-the-shelf networking technology. This decreases overall latency and further improves interactivity. Multi-frame rate rendering was also investigated on a single graphics processor by interleaving the rendering streams for the interactive elements and the rest of the scene. This approach enables the use of multi-frame rate rendering on low-end graphics systems such as laptops, mobile phones, and PDAs. Advanced multi-frame rate rendering techniques reduce the limitations of the basic approach. The interactive manipulation of light sources and their parameters affects the entire scene. A multi-GPU deferred shading method is presented that splits the rendering task into a rasterization and lighting pass and assigns the passes to the appropriate image generators such that light manipulations at high frame rates become possible. A parallel volume rendering technique allows the manipulation of objects inside a translucent volume at high frame rates. This approach is useful for example in medical applications, where small probes need to be positioned inside a computed-tomography image. 
Due to the asynchronous nature of multi-frame rate rendering, artifacts may occur during migration of objects from the slow to the fast graphics card, and vice versa. Proper state management, combined with prediction techniques, makes it possible to avoid these artifacts almost completely. Multi-frame rate rendering significantly improves the interactive manipulation of objects and lighting effects. This leads to a considerable increase in the size of 3D scenes that can be manipulated compared to conventional methods
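The digital composition step described in this abstract, where the color and depth images of the two render processes are combined, can be sketched as a per-pixel depth comparison. This is a minimal illustration assuming both images have the same resolution and that smaller depth values are nearer to the viewer; the function name and the list-of-rows image layout are hypothetical, and the thesis's own compositing may differ.

```python
def composite(color_a, depth_a, color_b, depth_b):
    """Combine two rendered images pixel by pixel, keeping whichever
    sample is nearer to the viewer (smaller depth). Ties go to image A."""
    return [
        [ca if da <= db else cb
         for ca, da, cb, db in zip(row_ca, row_da, row_cb, row_db)]
        for row_ca, row_da, row_cb, row_db
        in zip(color_a, depth_a, color_b, depth_b)
    ]


# Image A: interactive object (fast renderer); image B: rest of the scene
result = composite([["A", "A", "A"]], [[1.0, 5.0, 2.0]],
                   [["B", "B", "B"]], [[3.0, 2.0, 2.0]])
```

Because the comparison only needs the latest available image from each renderer, the two render processes can run at different frame rates and still be combined for display.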