8 research outputs found

    Instantaneous Stereo Depth Estimation of Real-World Stimuli with a Neuromorphic Stereo-Vision Setup

    The stereo-matching problem, i.e., matching corresponding features in two different views to reconstruct depth, is efficiently solved in biology. Yet, it remains the computational bottleneck for classical machine vision approaches. By exploiting the properties of event cameras, recently proposed Spiking Neural Network (SNN) architectures for stereo vision have the potential to simplify the stereo-matching problem. Several solutions that combine event cameras with spike-based neuromorphic processors already exist. However, they are either simulated on digital hardware or tested on simplified stimuli. In this work, we use the Dynamic Vision Sensor 3D Human Pose Dataset (DHP19) to validate, with real-world data, a brain-inspired event-based stereo-matching architecture implemented on a mixed-signal neuromorphic processor. Our experiments show that this SNN architecture, composed of coincidence detectors and disparity-sensitive neurons, is able to provide a coarse estimate of the input disparity instantaneously, thereby detecting the presence of a stimulus moving in depth in real time.
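
The coincidence-detection idea behind such architectures can be illustrated in plain software. The following is a minimal sketch under stated assumptions, not the authors' network or the behaviour of the mixed-signal chip: it assumes rectified cameras (so matching events share a row), represents events as hypothetical `Event(t, x, y)` tuples with timestamps in microseconds, and replaces the disparity-sensitive neurons with a simple vote count followed by winner-take-all.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    # Hypothetical event format; polarity is omitted for brevity.
    t: float  # timestamp in microseconds
    x: int    # pixel column
    y: int    # pixel row (same row in both cameras after rectification)


def coincidence_disparities(left: List[Event], right: List[Event],
                            max_disparity: int, window_us: float) -> Counter:
    """Count coincidences per candidate disparity d: a left event at (x, y) and a
    right event at (x - d, y) that occur within window_us of each other."""
    right_by_row: Dict[int, List[Event]] = {}
    for ev in right:
        right_by_row.setdefault(ev.y, []).append(ev)
    votes: Counter = Counter()
    for l_ev in left:
        for r_ev in right_by_row.get(l_ev.y, []):
            d = l_ev.x - r_ev.x
            if 0 <= d <= max_disparity and abs(l_ev.t - r_ev.t) <= window_us:
                votes[d] += 1  # the coincidence detector tuned to d "fires"
    return votes


def estimate_disparity(left: List[Event], right: List[Event],
                       max_disparity: int = 10,
                       window_us: float = 1000.0) -> Optional[int]:
    """Winner-take-all over the per-disparity vote counts (None if no coincidences)."""
    votes = coincidence_disparities(left, right, max_disparity, window_us)
    if not votes:
        return None
    return max(votes, key=votes.get)
```

For example, five left events at `x = 20..24` and five right events at `x = 15..19`, each right event lagging its partner by 100 µs and successive pairs 2 ms apart, only coincide at disparity 5, so `estimate_disparity` returns 5. A real SNN computes the same kind of match implicitly through synaptic timing rather than by looping over event pairs.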

    Signal processing architectures for automotive high-resolution MIMO radar systems

    To date, the digital signal processing for automotive radar sensors has been handled efficiently by general-purpose signal processors and microcontrollers. However, increasing resolution requirements for automated driving on the one hand, and rapidly growing numbers of manufactured sensors on the other, may provoke a paradigm change in the near future. The design and development of highly specialized hardware accelerators could become a viable option, at least for the most demanding processing steps with data rates of several gigabits per second. In this work, application-specific signal processing architectures for future high-resolution multiple-input multiple-output (MIMO) radar sensors are designed, implemented, investigated and optimized. The focus is on real-time performance, so that even sophisticated algorithms can be computed sufficiently fast. The full processing chain, from the received baseband signals to a list of detections, is considered; it comprises three major steps: spectrum analysis, target detection and direction-of-arrival estimation. The developed architectures are further implemented on a field-programmable gate array (FPGA), and key metrics such as resource consumption, power dissipation and data throughput are evaluated and compared with other examples from the literature. A substantial dataset, based on more than 3600 different parametrizations and variants, has been established with the help of a model-based design space exploration and is provided as part of this work. Finally, an experimental radar sensor has been built and used under real-world conditions to verify the effectiveness of the proposed signal processing architectures.
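
The target-detection step of such a chain is commonly illustrated with a cell-averaging CFAR (constant false alarm rate) detector. The sketch below is a software reference only, assuming a 1-D square-law power profile (for example one range-FFT output) and hypothetical parameter names; the abstract does not state which CFAR variant the thesis implements on the FPGA. The scale factor uses the standard CA-CFAR relation alpha = N * (Pfa^(-1/N) - 1) for N training cells and exponentially distributed noise power.

```python
from typing import List, Sequence


def ca_cfar_scale(num_train: int, pfa: float) -> float:
    """Threshold factor for CA-CFAR with num_train training cells and a desired
    probability of false alarm pfa (square-law detector, exponential noise power)."""
    return num_train * (pfa ** (-1.0 / num_train) - 1.0)


def ca_cfar(power: Sequence[float], num_train: int = 8, num_guard: int = 2,
            scale: float = 5.0) -> List[int]:
    """Return indices of cells whose power exceeds scale * (mean of training cells).

    num_train training cells are split evenly on both sides of the cell under
    test, separated from it by num_guard guard cells on each side. Cells too close
    to the edges to have a full window are skipped."""
    if num_train % 2:
        raise ValueError("num_train must be even")
    half = num_train // 2 + num_guard
    detections = []
    for i in range(half, len(power) - half):
        lead = power[i - half: i - num_guard]           # training cells before
        lag = power[i + num_guard + 1: i + half + 1]     # training cells after
        noise = (sum(lead) + sum(lag)) / (len(lead) + len(lag))
        if power[i] > scale * noise:
            detections.append(i)
    return detections
```

With a flat noise floor of 1.0 over 64 cells and two targets of power 50.0 and 30.0 at cells 20 and 40, `ca_cfar` returns `[20, 40]`. In hardware the training-cell sums are typically maintained as a sliding window rather than recomputed per cell, which is one reason CFAR maps well onto streaming FPGA architectures.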

    About the development of visual search algorithms and their hardware implementations

    2015-2016. The main goal of my work is to exploit the benefits of a hardware implementation of a 3D visual search pipeline. The term visual search refers to the task of searching for objects in the environment starting from a representation of the real world. Object recognition today is mainly based on scene descriptors, i.e., unique descriptions of salient points in the data. For years this task has traditionally been implemented using plain images only: an image descriptor is a feature vector used to describe a position in the image. Matching descriptors across different views of the same scene should allow the same point to be found from different angles; therefore, a good descriptor should be robust to changes in scene luminosity, camera affine transformations (rotation, scale and translation), camera noise and object affine transformations. Clearly, 2D images cannot provide robustness to changes in projective space: for example, if the object is rotated about the camera's up axis, its 2D projection changes dramatically. For this reason, alongside 2D descriptors, many techniques have been proposed to solve the projective transformation problem using 3D descriptors, which capture the shape of objects and hence the real appearance of their surfaces. This category of descriptors relies on 3D point clouds and disparity maps to build a reliable feature vector that is invariant to projective transformations. More sophisticated techniques are needed to obtain the 3D representation of the scene and, if necessary, the texture of the 3D model, and these techniques are naturally more computationally intensive than simple image capture. The field of 3D model acquisition is very broad; two main categories can be distinguished: active and passive methods. The active category includes special devices that obtain 3D information by projecting a light pattern.
Generally, an infrared projector is coupled with a camera: while the projector casts a well-known, fixed pattern, the camera captures the pattern's reflection on the surfaces, and the distortion of the pattern gives the precise depth of every point in the scene. These kinds of sensors are, of course, expensive and not very efficient in terms of power consumption, since a lot of power is spent projecting light, and the use of lasers also imposes eye-safety limits on frame rate and transmitted power. Another way to obtain 3D models is passive stereo vision, which requires two (or more) cameras that only acquire the appearance of the scene. Using the two (or more) images as input to a stereo-matching algorithm, it is possible to reconstruct the 3D world. Since this task requires more computational resources, hardware acceleration can give a large performance boost over a pure software approach. In this work I explore the principal steps of a visual search pipeline composed of a 3D vision system and a 3D description system. Both systems take advantage of a parallelized architecture prototyped in RTL and implemented on an FPGA platform. This is a huge research field, and in this work I try to explain the reasons for all the choices I made in my implementation, e.g., the chosen algorithms, the heuristics applied to accelerate performance, and the selected device. The first chapter explains the visual search problem, showing the main components required by a visual search pipeline. Then I present the implemented architecture for a stereo vision system based on a bioinformatics-inspired approach, where the final system can process up to 30 fps at 1024 × 768 pixels.
After that, a method for boosting the performance of 3D descriptors is presented, and the last chapter presents the final architecture for the SHOT descriptor on FPGA. [edited by author]
XV n.s.
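
The passive stereo principle described above (match points along corresponding rows, then triangulate) can be shown on a single scanline. This is a minimal sketch assuming rectified images, a sum-of-absolute-differences (SAD) window cost and hypothetical function names; SAD block matching is swapped in here as a standard baseline and is not the bioinformatics-inspired matcher used in the thesis. Depth follows the usual rectified-stereo relation Z = f * B / d, with focal length f in pixels, baseline B in metres and disparity d in pixels.

```python
from typing import Optional, Sequence


def sad_disparity(left_row: Sequence[int], right_row: Sequence[int], x: int,
                  half_win: int = 2, max_disp: int = 16) -> Optional[int]:
    """Disparity d in [0, max_disp] minimizing the SAD between a window around
    left_row[x] and a window around right_row[x - d] (rectified scanlines)."""
    if x - half_win < 0 or x + half_win >= len(left_row):
        raise ValueError("window around x must lie inside the left row")
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        if x - d - half_win < 0:
            break  # the right window would fall off the image edge
        cost = sum(abs(left_row[x + k] - right_row[x - d + k])
                   for k in range(-half_win, half_win + 1))
        if cost < best_cost:  # strict '<' keeps the smallest d on ties
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulated depth Z = f * B / d for a rectified pair (infinite if d <= 0)."""
    if d_px <= 0:
        return float("inf")
    return focal_px * baseline_m / d_px
```

If the same five-pixel texture appears at columns 10 to 14 in the right row and 14 to 18 in the left row, `sad_disparity(left_row, right_row, 16)` returns 4, and with an assumed 800-pixel focal length and 0.1 m baseline the depth is 20 m. The per-pixel window sums are independent of one another, which is what makes this class of algorithm attractive for the kind of parallel RTL pipeline the thesis targets.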

    A cyclopean perspective on mouse visual cortex

    Human factors in the perception of stereoscopic images

    Research into stereoscopic displays is largely divided into how stereo 3D content looks, a field concerned with distortion, and how such content feels to the viewer, that is, comfort. However, these measures are seldom presented together. Both comfortable displays with unacceptable 3D and uncomfortable displays with great 3D are undesirable, and these two scenarios can make conclusions based on research into these measures moot and impractical. Furthermore, there is a consensus that more disparity correlates directly with greater viewer discomfort. The experiments in this dissertation challenge this notion and argue for a more nuanced view related to acquisition factors such as interaxial distance (IA) and post-processing in the form of horizontal image translation (HIT). Specifically, this research seeks to measure tolerance limits for viewing comfort and perceptual distortion across different camera separations. In the experiments, HIT and IA were varied together. Following Banks et al. (2009), our stimuli were simple stereoscopic hinges, and we measured the perceived angle as a function of camera separation. We compared the predictions of a ray-tracing model with the perceived 3D shape obtained psychophysically. Participants judged the angles of 250 hinges at different camera separations (IA and HIT remained linked across a 20 to 100 mm range, and hinge angles ranged between 50° and 130°). Comfort data were also obtained for each trial using a five-point Likert scale. Stimuli were presented under orthoscopic conditions, with screen and observer field of view (FOV) matched at 45°. The 3D hinge and experimental parameters were run across three distinct series of experiments.
The first series replicated a typical laboratory scenario in which screen position was unchanged (Experiment I), the second presented scenarios representative of real-world applications for a single viewer (Experiments II, III, and IV), and the last presented real-world applications for multiple viewers (Experiment V). While the laboratory scenario revealed that viewer comfort was greatest when a virtual hinge was placed on the screen plane, the single-viewer experiments revealed that into-the-screen stereo stimuli were judged flatter, while out-of-screen content was perceived more veridically. The multi-viewer scenario revealed a marked decline in comfort for off-axis viewing, but no commensurate effect on distortion; importantly, hinge angles were judged to be the same regardless of off-axis viewing for angles of up to 45°. More specifically, the main results are as follows. 1) Increased viewing distance enhances viewer comfort for stereoscopic perception. 2) The amount of disparity present was not correlated with comfort, and comfort was not correlated with angular distortion. 3) Distortion is affected by hinge placement on-screen; there is a significant effect on comfort only when the camera separation is 60 mm. 4) There is a perceptual bias related to depth orientation relative to the screen: into-the-screen stimuli were judged flatter than out-of-screen stimuli. 5) Perceived distortion was not affected by oblique viewing, and oblique viewing did not affect perceived comfort. In conclusion, the laboratory experiment highlights the limitations of extrapolating from a controlled empirical stimulus to a less controlled “real world” environment. The typical usage scenarios consistently reveal no correlation between the amount of screen disparity (parallax) in the stimulus and the comfort rating.
The final usage scenario reveals a perceptual constancy under off-axis viewing conditions for angles of up to 45°, which, as reported, is not reflected by a typical ray-tracing model. Stereoscopic presentation with non-orthoscopic HIT may give comfortable 3D. However, there is good reason to believe that this 3D is not perceived veridically. Comfortable 3D is often incorrectly converged because of differences between the distances specified by disparity and by monocular cues. This conflict between monocular and stereo cues in the presentation of S3D content leads to a loss of veridicality, i.e., a perception of flatness. Therefore, correct HIT is recommended as the starting point for creating realistic and comfortable 3D, and the data show this factor to be far more important than limiting screen disparity (i.e., parallax). Based on these findings, this study proposes a predictive model of stereoscopic space for 3D content generators who require flexibility in acquisition parameters. This is important because no data exist for viewing conditions in which the acquisition parameters are changed.
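
The ray-tracing predictions mentioned above rest on simple viewing geometry: the two lines of sight through a pair of screen points intersect at a predicted depth. A minimal sketch, assuming a viewer centred in front of the screen at distance V, eye separation e (0.065 m is an assumed default) and screen parallax p measured as right-image position minus left-image position (positive means uncrossed, behind the screen), gives Z = V * e / (e - p). This is not the dissertation's model, which additionally handles hinge angles, camera separation and off-axis viewers; the HIT helper below also assumes the sign convention that a positive shift pushes content further behind the screen.

```python
from typing import List, Sequence


def perceived_distance(parallax_m: float, viewing_distance_m: float,
                       eye_sep_m: float = 0.065) -> float:
    """Distance from the viewer at which the two lines of sight intersect.

    Zero parallax lands on the screen plane, positive (uncrossed) parallax behind
    it, negative (crossed) parallax in front of it. Parallax >= eye separation
    makes the rays parallel or divergent, so no finite intersection exists."""
    if parallax_m >= eye_sep_m:
        return float("inf")
    return viewing_distance_m * eye_sep_m / (eye_sep_m - parallax_m)


def apply_hit(parallaxes_m: Sequence[float], shift_m: float) -> List[float]:
    """Horizontal image translation adds the same offset to every screen parallax
    (assumed sign convention: positive shift moves content behind the screen)."""
    return [p + shift_m for p in parallaxes_m]
```

At a 2 m viewing distance, zero parallax is predicted at 2 m, half the eye separation of uncrossed parallax at 4 m, and one full eye separation of crossed parallax at 1 m. Because HIT adds a constant to every parallax while depth depends non-linearly on parallax, the same shift changes predicted depth intervals differently in front of and behind the screen, which is one reason HIT interacts with perceived shape.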

    Neural architectures for stereo vision

    Stereoscopic vision delivers a sense of depth based on binocular information but additionally acts as a mechanism for achieving correspondence between patterns arriving at the left and right eyes. We quantitatively analyse the cortical architecture for stereoscopic vision in two areas of macaque visual cortex. For primary visual cortex V1, the result is consistent with a module that is isotropic in cortical space, with a diameter of at least 3 mm in surface extent. This implies that the module for stereo is larger than the repeat distance between ocular dominance columns in V1. By contrast, in the extrastriate cortical area V5/MT, which has a specialized architecture for stereo depth, the module for the representation of stereo is about 1 mm in surface extent, so the representation of stereo in V5/MT is more compressed than in V1 in terms of the neural wiring of the neocortex. The surface extent estimated for stereo in V5/MT is consistent with measurements of its specialized domains for binocular disparity. Within V1, we suggest that long-range horizontal anatomical connections form functional modules that serve both binocular and monocular pattern recognition: this common function may explain the distortion and disruption of monocular pattern vision observed in amblyopia.

    A Spike-Based Neuromorphic Architecture of Stereo Vision

    The problem of finding stereo correspondences in binocular vision is solved effortlessly in nature, and yet it is still a critical bottleneck for artificial machine vision systems. As temporal information is a crucial feature in this process, the advent of event-based vision sensors and dedicated event-based processors promises an effective approach to solving the stereo matching problem. Indeed, event-based neuromorphic hardware provides an optimal substrate for fast, asynchronous computation that can make explicit use of precise temporal coincidences. However, although several biologically inspired solutions have already been proposed, the performance benefits of combining event-based sensing with asynchronous and parallel computation are yet to be explored. Here we present a hardware spike-based stereo-vision system that leverages the advantages of brain-inspired neuromorphic computing by interfacing two event-based vision sensors to an event-based mixed-signal analog/digital neuromorphic processor. We describe a prototype interface designed to enable the emulation of a stereo-vision system on neuromorphic hardware, and we quantify the stereo matching performance with two datasets. Our results provide a path toward the realization of low-latency, end-to-end event-based neuromorphic architectures for stereo vision.
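
How a spiking neuron can "make explicit use of precise temporal coincidences" is easiest to see with a leaky integrate-and-fire (LIF) unit that receives one input from each eye. The sketch below is an event-driven software approximation with hypothetical parameter values (weight, time constant, threshold); it does not model the analog circuits of the mixed-signal processor described in the abstract.

```python
import math
from typing import Iterable, List


def lif_coincidence(spike_times_ms: Iterable[float], weight: float = 0.6,
                    tau_ms: float = 2.0, threshold: float = 1.0) -> List[float]:
    """Event-driven leaky integrate-and-fire neuron.

    Each input spike adds `weight` to the membrane potential, which decays
    exponentially with time constant tau_ms between inputs. The neuron emits an
    output spike (and resets to zero) whenever the potential reaches threshold.
    Returns the output spike times in ms."""
    v = 0.0
    last_t = None
    out: List[float] = []
    for t in sorted(spike_times_ms):
        if last_t is not None:
            v *= math.exp(-(t - last_t) / tau_ms)  # leak since the previous input
        v += weight
        last_t = t
        if v >= threshold:
            out.append(t)
            v = 0.0  # reset after firing
    return out
```

With these assumed values a single input (0.6) never reaches threshold, and two inputs fire the neuron only if they arrive within 2 * ln(1.5), roughly 0.81 ms, of each other: `lif_coincidence([10.0, 10.5])` returns `[10.5]`, while `lif_coincidence([10.0, 15.0])` returns `[]`. The membrane time constant therefore sets the coincidence window, which is the property that event-based stereo architectures exploit.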