
    End-to-End Learning of Representations for Asynchronous Event-Based Data

    Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods. Comment: To appear at ICCV 2019
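
    As a concrete illustration, the sketch below builds one representation from this taxonomy, a voxel grid with bilinear interpolation along the time axis, from a raw event stream. The array layout of the events and the fixed interpolation kernel are assumptions for illustration; the paper's contribution is to learn the kernel end-to-end rather than fixing it by hand.

    import numpy as np

    def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
        """Accumulate polarity-weighted events into a (num_bins, H, W) voxel grid.

        x, y: integer pixel coordinates; t: timestamps; p: polarities (positive/negative).
        """
        grid = np.zeros((num_bins, height, width), dtype=np.float32)
        # Normalize timestamps to the range [0, num_bins - 1].
        t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
        left = np.floor(t_norm).astype(int)
        right = np.minimum(left + 1, num_bins - 1)
        w_right = t_norm - left            # bilinear weight toward the later temporal bin
        w_left = 1.0 - w_right
        pol = np.where(p > 0, 1.0, -1.0)   # signed polarity contribution
        np.add.at(grid, (left, y, x), pol * w_left)
        np.add.at(grid, (right, y, x), pol * w_right)
        return grid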

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low-latency, high-speed, and high-dynamic-range applications. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
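
    The working principle described above can be reduced to a toy model: a pixel emits an event whenever its log-brightness has changed by more than a contrast threshold since the last event it emitted. The sketch below assumes an idealized, noise-free sensor and a hypothetical threshold value C; real sensors add noise, refractory periods, and per-pixel threshold mismatch.

    def generate_events(log_intensity, timestamps, x, y, C=0.2):
        """Emit (t, x, y, polarity) events for one pixel's log-intensity trace."""
        events = []
        ref = log_intensity[0]                  # reference level at the last event
        for L, t in zip(log_intensity[1:], timestamps[1:]):
            while abs(L - ref) >= C:
                polarity = 1 if L > ref else -1
                events.append((t, x, y, polarity))
                ref += polarity * C             # step the reference by one threshold
        return events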

    On Neuromechanical Approaches for the Study of Biological Grasp and Manipulation

    Biological and robotic grasp and manipulation are undeniably similar at the level of mechanical task performance. However, their underlying fundamental biological vs. engineering mechanisms are, by definition, dramatically different and can even be antithetical. Even our approach to each is diametrically opposite: inductive science for the study of biological systems vs. engineering synthesis for the design and construction of robotic systems. The past 20 years have seen several conceptual advances in both fields and the quest to unify them. Chief among them is the reluctant recognition that their underlying fundamental mechanisms may actually share limited common ground, while exhibiting many fundamental differences. This recognition is particularly liberating because it allows us to resolve and move beyond multiple paradoxes and contradictions that arose from the initial reasonable assumption of a large common ground. Here, we begin by introducing the perspective of neuromechanics, which emphasizes that real-world behavior emerges from the intimate interactions among the physical structure of the system, the mechanical requirements of a task, the feasible neural control actions to produce it, and the ability of the neuromuscular system to adapt through interactions with the environment. This allows us to articulate a succinct overview of a few salient conceptual paradoxes and contradictions regarding under-determined vs. over-determined mechanics, under- vs. over-actuated control, prescribed vs. emergent function, learning vs. implementation vs. adaptation, prescriptive vs. descriptive synergies, and optimal vs. habitual performance. We conclude by presenting open questions and suggesting directions for future research. We hope this frank assessment of the state-of-the-art will encourage and guide these communities to continue to interact and make progress in these important areas.

    Multi-scale Evolutionary Neural Architecture Search for Deep Spiking Neural Networks

    Spiking Neural Networks (SNNs) have received considerable attention not only for their superior energy efficiency with discrete signal processing, but also for their natural suitability for integrating multi-scale biological plasticity. However, most SNNs directly adopt the structure of well-established DNNs, and Neural Architecture Search (NAS) is rarely used to automatically design architectures for SNNs. The neural motif topology, modular regional structure, and global cross-brain-region connectivity of the human brain are the product of natural evolution and can serve as a perfect reference for designing brain-inspired SNN architectures. In this paper, we propose a Multi-Scale Evolutionary Neural Architecture Search (MSE-NAS) for SNNs, simultaneously considering micro-, meso- and macro-scale brain topologies as the evolutionary search space. MSE-NAS evolves individual neuron operations, the self-organized integration of multiple circuit motifs, and global connectivity across motifs through a brain-inspired indirect evaluation function based on Representational Dissimilarity Matrices (RDMs). This training-free fitness function greatly reduces computational consumption and search time, and its task-independent property enables the searched SNNs to exhibit excellent transferability and scalability. Extensive experiments demonstrate that the proposed algorithm achieves state-of-the-art (SOTA) performance with shorter simulation steps on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS and DVS128-Gesture). A thorough analysis also illustrates the significant performance improvement and consistent bio-interpretability derived from topological evolution at different scales and the RDM fitness function.
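
    To illustrate the training-free fitness idea, the sketch below computes a Representational Dissimilarity Matrix (RDM) from one layer's responses to a batch of stimuli and scores a candidate network by correlating its RDM with a reference RDM. The correlation-distance metric and upper-triangle comparison used here are common choices but are assumptions; the paper's exact evaluation function may differ.

    import numpy as np

    def rdm(activations):
        """activations: (num_stimuli, num_features) -> (num_stimuli, num_stimuli) RDM."""
        a = activations - activations.mean(axis=1, keepdims=True)
        a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-9)
        return 1.0 - a @ a.T                    # 1 - Pearson correlation per stimulus pair

    def rdm_fitness(candidate_acts, reference_acts):
        """Correlate the upper triangles of two RDMs as a scalar, training-free score."""
        rdm_a, rdm_b = rdm(candidate_acts), rdm(reference_acts)
        iu = np.triu_indices_from(rdm_a, k=1)   # each stimulus pair counted once
        return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]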

    Automotive Object Detection via Learning Sparse Events by Temporal Dynamics of Spiking Neurons

    Event-based sensors, with their high temporal resolution (1 μs) and dynamic range (120 dB), have the potential to be deployed on high-speed platforms such as vehicles and drones. However, the highly sparse and fluctuating nature of events poses challenges for conventional object detection techniques based on Artificial Neural Networks (ANNs). In contrast, Spiking Neural Networks (SNNs) are well-suited for representing event-based data due to their inherent temporal dynamics. In particular, we demonstrate that the membrane potential dynamics can modulate network activity upon fluctuating events and strengthen features of sparse input. In addition, the spike-triggered adaptive threshold can stabilize training, which further improves network performance. Based on this, we develop an efficient spiking feature pyramid network for event-based object detection. Our proposed SNN outperforms previous SNNs and sophisticated ANNs with attention mechanisms, achieving a mean average precision (mAP50) of 47.7% on the Gen1 benchmark dataset. This result significantly surpasses the previous best SNN by 9.7% and demonstrates the potential of SNNs for event-based vision. Our model has a concise architecture while maintaining high accuracy and much lower computation cost as a result of sparse computation. Our code will be publicly available.
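
    The membrane-potential and adaptive-threshold mechanisms described above can be illustrated with a minimal leaky integrate-and-fire neuron in which every spike raises the firing threshold, which then decays back toward its baseline. The time constants, reset rule, and parameter values below are assumptions for illustration, not the paper's exact neuron model.

    import numpy as np

    def lif_adaptive_threshold(inputs, tau_mem=0.9, tau_adapt=0.95, v_th0=1.0, beta=0.3):
        """inputs: (T,) input current per time step -> (T,) binary spike train."""
        v, theta = 0.0, v_th0
        spikes = np.zeros(len(inputs), dtype=np.float32)
        for t, i_t in enumerate(inputs):
            v = tau_mem * v + i_t                                # leaky membrane integration
            theta = tau_adapt * theta + (1 - tau_adapt) * v_th0  # threshold decays to baseline
            if v >= theta:
                spikes[t] = 1.0
                v = 0.0                                          # hard reset after a spike
                theta += beta                                    # spike-triggered threshold increase
        return spikes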