StereoSpike: Depth Learning with a Spiking Neural Network
Depth estimation is an important computer vision task, useful in particular
for navigation in autonomous vehicles, or for object manipulation in robotics.
Here we solved it using an end-to-end neuromorphic approach, combining two
event-based cameras and a Spiking Neural Network (SNN) with a slightly modified
U-Net-like encoder-decoder architecture, which we named StereoSpike. More
specifically, we used the Multi Vehicle Stereo Event Camera Dataset (MVSEC). It
provides a depth ground-truth, which was used to train StereoSpike in a
supervised manner, using surrogate gradient descent. We propose a novel readout
paradigm to obtain a dense analog prediction -- the depth of each pixel -- from
the spikes of the decoder. We demonstrate that this architecture generalizes
very well, even better than its non-spiking counterparts, leading to
state-of-the-art test accuracy. To the best of our knowledge, it is the first
time that such a large-scale regression problem is solved by a fully spiking
network. Finally, we show that low firing rates (<10%) can be obtained via
regularization, with a minimal cost in accuracy. This means that StereoSpike
could be efficiently implemented on neuromorphic chips, opening the door to
low-power and real-time embedded systems.
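Since the abstract mentions supervised training with surrogate gradient descent, here is a minimal sketch of how a spiking layer with a surrogate gradient might be written in PyTorch. The fast-sigmoid surrogate, the `SpikeFn`/`lif_step` names, and all constants are illustrative assumptions, not the authors' implementation.

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a fast-sigmoid surrogate gradient (illustrative)."""

    @staticmethod
    def forward(ctx, membrane_potential, threshold=1.0):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Smooth stand-in for the Heaviside derivative, so gradients can flow.
        scale = 10.0
        grad = grad_output / (scale * (membrane_potential - ctx.threshold).abs() + 1.0) ** 2
        return grad, None

def lif_step(x, mem, beta=0.9, threshold=1.0):
    """One leaky integrate-and-fire step: leak, integrate input, spike, soft-reset."""
    mem = beta * mem + x
    spikes = SpikeFn.apply(mem, threshold)
    mem = mem - spikes * threshold
    return spikes, mem
```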
RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions
Depth estimation from monocular images is pivotal for real-world visual
perception systems. While current learning-based depth estimation models train
and test on meticulously curated data, they often overlook out-of-distribution
(OoD) situations. Yet, in practical settings -- especially safety-critical ones
like autonomous driving -- common corruptions can arise. Addressing this
oversight, we introduce a comprehensive robustness test suite, RoboDepth,
encompassing 18 corruptions spanning three categories: i) weather and lighting
conditions; ii) sensor failures and movement; and iii) data processing
anomalies. We subsequently benchmark 42 depth estimation models across indoor
and outdoor scenes to assess their resilience to these corruptions. Our
findings underscore that, in the absence of a dedicated robustness evaluation
framework, many leading depth estimation models may be susceptible to typical
corruptions. We delve into design considerations for crafting more robust depth
estimation models, touching upon pre-training, augmentation, modality, model
capacity, and learning paradigms. We anticipate our benchmark will establish a
foundational platform for advancing robust OoD depth estimation.
Comment: NeurIPS 2023; 45 pages, 25 figures, 13 tables; Code at
https://github.com/ldkong1205/RoboDept
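A minimal sketch of what a corruption-robustness evaluation loop could look like, assuming a corruption taxonomy with the three categories named in the abstract; the concrete corruption names, the `apply_corruption` and `metric` interfaces, and the five severity levels are assumptions for illustration, not the released RoboDepth code.

```python
import numpy as np

# Illustrative taxonomy mirroring the three categories described in the abstract.
CORRUPTIONS = {
    "weather_and_lighting": ["fog", "snow", "low_light"],
    "sensor_failure_and_motion": ["motion_blur", "defocus_blur", "pixel_dropout"],
    "data_processing": ["jpeg_compression", "gaussian_noise", "quantization"],
}

def evaluate_robustness(model, images, gt_depths, apply_corruption, metric):
    """Average a depth-error metric over corrupted copies of a test set.

    `apply_corruption(image, name, severity)` and `metric(pred, gt)` are
    assumed callables supplied by the caller.
    """
    scores = {}
    for category, names in CORRUPTIONS.items():
        for name in names:
            per_severity = []
            for severity in range(1, 6):  # severity levels 1-5 (assumed)
                errors = [
                    metric(model(apply_corruption(img, name, severity)), gt)
                    for img, gt in zip(images, gt_depths)
                ]
                per_severity.append(float(np.mean(errors)))
            scores[f"{category}/{name}"] = float(np.mean(per_severity))
    return scores
```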
tRNS boosts visual perceptual learning in participants with bilateral macular degeneration
Perceptual learning (PL) has shown promise in enhancing residual visual functions in patients with age-related macular degeneration (MD); however, it requires prolonged training, and evidence of generalization to untrained visual functions is limited. Recent studies suggest that combining transcranial random noise stimulation (tRNS) with perceptual learning produces faster and larger visual improvements in participants with normal vision. Thus, this approach might hold the key to improving PL effects in MD. To test this, we trained two groups of MD participants on a contrast detection task with (n = 5) or without (n = 7) concomitant occipital tRNS. The training consisted of a lateral masking paradigm in which the participant had to detect a central low-contrast Gabor target. Transfer tasks, including contrast sensitivity, near and far visual acuity, and visual crowding, were measured at pre-, mid-, and post-tests. Combining tRNS and perceptual learning led to greater improvements in the trained task, evidenced by a larger increment in contrast sensitivity and reduced inhibition at the shortest target-to-flankers distance. The overall amount of transfer was similar between the two groups. These results suggest that coupling tRNS and perceptual learning has promising potential as a clinical rehabilitation strategy to improve vision in MD patients.
Optical flow estimation from event-based cameras and spiking neural networks
Event-based cameras are attracting growing interest within the computer vision community. These sensors operate with asynchronous pixels, emitting events, or 'spikes', when the luminance change at a given pixel since the last event surpasses a certain threshold. Thanks to their inherent qualities, such as their low power consumption, low latency, and high dynamic range, they seem particularly tailored to applications with challenging temporal constraints and safety requirements. Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs), since the coupling of an asynchronous sensor with neuromorphic hardware can yield real-time systems with minimal power requirements. In this work, we seek to develop one such system, using both event sensor data from the DSEC dataset and spiking neural networks to estimate optical flow for driving scenarios. We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations. To do so, we encourage both a minimal norm for the error vector and a minimal angle between the ground-truth and predicted flow, training our model with back-propagation using a surrogate gradient. In addition, the use of 3D convolutions allows us to capture the dynamic nature of the data by increasing the temporal receptive fields. Upsampling after each decoding stage ensures that each decoder's output contributes to the final estimation. Thanks to separable convolutions, we have been able to develop a lightweight model (compared to competitors) that can nonetheless yield reasonably accurate optical flow estimates.
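The abstract states that training penalizes both the norm of the error vector and the angle between ground-truth and predicted flow. A minimal sketch of such a combined loss in PyTorch is shown below; the `angle_weight` coefficient and the exact weighting scheme are assumptions, since the abstract does not specify how the two terms are combined.

```python
import torch
import torch.nn.functional as F

def flow_loss(pred_flow, gt_flow, angle_weight=0.5, eps=1e-8):
    """Combine endpoint error (norm of the error vector) with an angular term.

    Flows are (B, 2, H, W) tensors of (u, v) components; `angle_weight` is an
    illustrative hyperparameter, not a value from the paper.
    """
    # Endpoint error: L2 norm of the per-pixel error vector.
    epe = torch.norm(pred_flow - gt_flow, dim=1).mean()

    # Angular term: 1 - cosine similarity between predicted and ground-truth flow.
    cos_sim = F.cosine_similarity(pred_flow, gt_flow, dim=1, eps=eps)
    angular = (1.0 - cos_sim).mean()

    return epe + angle_weight * angular
```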
Sub-optimality of the early visual system explained through biologically plausible plasticity
The early visual cortex is the site of crucial pre-processing for more complex, biologically relevant computations that drive perception and, ultimately, behaviour. This pre-processing is often viewed as an optimisation which enables the most efficient representation of visual input. However, measurements in monkey and cat suggest that receptive fields in the primary visual cortex are often noisy, blobby, and symmetrical, making them sub-optimal for operations such as edge detection. We propose that this sub-optimality occurs because the receptive fields do not emerge through a global minimisation of the generative error, but through locally operating biological mechanisms such as spike-timing-dependent plasticity. Using an orientation discrimination paradigm, we show that, while sub-optimal, such models offer a much better description of biology at multiple levels: single-cell, population coding, and perception. Taken together, our results underline the need to carefully consider the distinction between information-theoretic and biological notions of optimality in early sensory populations.
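For readers unfamiliar with the locally operating mechanism named above, here is a minimal sketch of a pairwise spike-timing-dependent plasticity (STDP) update. The exponential window form is standard, but the time constants, learning rates, and the `stdp_update` interface are illustrative defaults, not the values used in the paper's receptive-field model.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Pairwise STDP weight update for one pre/post spike pair (times in ms).

    Pre-before-post strengthens the synapse; post-before-pre weakens it,
    with exponentially decaying influence of the spike-time difference.
    """
    dt = t_post - t_pre
    if dt > 0:       # pre fired before post: potentiation
        dw = a_plus * np.exp(-dt / tau_plus)
    else:            # post fired before (or with) pre: depression
        dw = -a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w + dw, w_min, w_max))
```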