421 research outputs found
Steep, Spatially Graded Recruitment of Feedback Inhibition by Sparse Dentate Granule Cell Activity
The dentate gyrus of the hippocampus is thought to subserve important physiological functions, such as 'pattern separation'. In chronic temporal lobe epilepsy, the dentate gyrus constitutes a strong inhibitory gate for the propagation of seizure activity into the hippocampus proper. Both examples are thought to depend critically on a steep recruitment of feedback inhibition by active dentate granule cells. Here, I used two complementary experimental approaches to quantitatively investigate the recruitment of feedback inhibition in the dentate gyrus. I show that the activity of approximately 4% of granule cells suffices to recruit maximal feedback inhibition within the local circuit. Furthermore, the inhibition elicited by a local population of granule cells is distributed non-uniformly over the extent of the granule cell layer. Locally and remotely activated inhibition differ in several key aspects, namely their amplitude, recruitment, latency and kinetic properties. Finally, I show that net feedback inhibition facilitates during repetitive stimulation. Taken together, these data provide the first quantitative functional description of a canonical feedback inhibitory microcircuit motif. They establish that sparse granule cell activity, within the range observed in vivo, steeply recruits spatially and temporally graded feedback inhibition.
Surface analysis and visualization from multi-light image collections
Multi-Light Image Collections (MLICs) are stacks of photos of a scene acquired from a fixed viewpoint under varying illumination, providing large amounts of visual and geometric information. Over the last decades, a wide variety of methods have been devised to extract information from MLICs, and their use has been demonstrated in different application domains to support daily activities. In this thesis, we present methods that leverage MLICs for surface analysis and visualization. First, we provide background information: the acquisition setup, light calibration, and application areas where MLICs have been successfully used to support routine analysis work. Next, we discuss the use of MLICs for surface visualization and analysis and the tools available to support such analysis. Here, we cover methods that support the direct exploration of the captured MLIC, methods that generate relightable models from an MLIC, non-photorealistic visualization methods that rely on MLICs, methods that estimate normal maps from an MLIC, and visualization tools used for MLIC analysis. In chapter 3, we propose novel benchmark datasets (RealRTI, SynthRTI and SynthPS) that can be used to evaluate algorithms relying on MLICs, and we discuss available benchmarks for the validation of photometric algorithms that can also be used to validate other MLIC-based algorithms. In chapter 4, we evaluate the performance of different photometric stereo algorithms on SynthPS for cultural heritage applications. RealRTI and SynthRTI have been used to evaluate the performance of the NeuralRTI method. Then, in chapter 5, we present this neural network-based RTI method, NeuralRTI, a framework for pixel-based encoding and relighting of RTI data.
Using a simple autoencoder architecture, we show that it is possible to obtain a highly compressed representation that better preserves the original information and provides increased quality of virtual images relighted from novel directions, particularly in the case of challenging glossy materials. Finally, in chapter 6, we present a method for the detection of cracks on the surface of paintings from multi-light image acquisitions, which can also be applied to single images, and conclude our presentation.
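The pixel-wise encode/relight pipeline described above can be sketched as follows. This is a minimal illustration with made-up layer sizes and an untrained network; the actual NeuralRTI architecture and its training procedure are described in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Forward pass of a tiny fully connected net with ELU activations."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.where(x > 0, x, np.expm1(x))  # ELU nonlinearity
    return x

def make_weights(sizes):
    """Random (untrained) weight matrices for the given layer sizes."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

n_lights, k = 50, 9                            # MLIC samples per pixel; code size
encoder = make_weights([n_lights * 3, 64, k])  # RGB under 50 lights -> k coefficients
decoder = make_weights([k + 2, 64, 3])         # coefficients + light direction -> RGB

pixel = rng.random(n_lights * 3)               # one pixel observed across the stack
code = mlp(pixel, encoder)                     # highly compressed per-pixel code
novel_light = np.array([0.3, -0.2])            # an unseen light direction (x, y)
rgb = mlp(np.concatenate([code, novel_light]), decoder)  # relit pixel value
```

The key design point is that the decoder takes the light direction as an extra input, so a single small per-pixel code can be relit from arbitrary novel directions.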
Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing
Neuromorphic vision sensing (NVS)\ devices represent visual information as
sequences of asynchronous discrete events (a.k.a., "spikes") in response to
changes in scene reflectance. Unlike conventional active pixel sensing (APS),
NVS allows for significantly higher event sampling rates at substantially
increased energy efficiency and robustness to illumination changes. However,
feature representation for NVS is far behind its APS-based counterparts,
resulting in lower performance in high-level computer vision tasks. To fully
utilize its sparse and asynchronous nature, we propose a compact graph
representation for NVS, which allows for end-to-end learning with graph
convolution neural networks. We couple this with a novel end-to-end feature
learning framework that accommodates both appearance-based and motion-based
tasks. The core of our framework comprises a spatial feature learning module,
which utilizes residual-graph convolutional neural networks (RG-CNN), for
end-to-end learning of appearance-based features directly from graphs. We
extend this with our proposed Graph2Grid block and temporal feature learning
module for efficiently modelling temporal dependencies over multiple graphs and
a long temporal extent. We show how our framework can be configured for object
classification, action recognition and action similarity labeling. Importantly,
our approach preserves the spatial and temporal coherence of spike events,
while requiring less computation and memory. The experimental validation shows
that our proposed framework outperforms all recent methods on standard
datasets. Finally, to address the absence of large real-world NVS datasets for
complex recognition tasks, we introduce, evaluate and make available the
American Sign Language letters dataset (ASL-DVS), as well as human action datasets
(UCF101-DVS, HMDB51-DVS and ASLAN-DVS). Comment: 16 pages, 5 figures. This work is a journal extension of our ICCV'19
paper arXiv:1908.0664
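A compact graph built from sparse events, as described above, can be illustrated as follows. The synthetic event stream, neighborhood radius, and time scaling are assumptions for illustration only, not the paper's exact construction or parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic event stream: columns are (x, y, t, polarity) -- stand-ins
# for the asynchronous output of a real NVS device.
events = np.column_stack([
    rng.integers(0, 128, 200),   # x coordinate
    rng.integers(0, 128, 200),   # y coordinate
    np.sort(rng.random(200)),    # timestamps in [0, 1) s
    rng.choice([-1, 1], 200),    # polarity of the brightness change
]).astype(float)

def events_to_graph(ev, radius=8.0, t_scale=100.0):
    """Connect events whose spatio-temporal distance is below `radius`.
    Returns node features (polarity) and an edge list of pairs (i, j), i < j,
    suitable as input to a graph convolutional network."""
    coords = ev[:, :3] * np.array([1.0, 1.0, t_scale])  # weight time vs. space
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    ii, jj = np.where(np.triu(dist < radius, k=1))      # undirected, no self-loops
    return ev[:, 3], np.column_stack([ii, jj])

feats, edges = events_to_graph(events)
```

Because only events (not dense frames) become nodes, the graph stays small and preserves the sparse, asynchronous character of the sensor output.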
Self-Organization of Spiking Neural Networks for Visual Object Recognition
On one hand, the visual system has the ability to differentiate between very similar
objects. On the other hand, we can also recognize the same object in images that vary
drastically, due to different viewing angle, distance, or illumination. The ability to
recognize the same object under different viewing conditions is called invariant object
recognition. Such object recognition capabilities are not immediately available after
birth, but are acquired through learning by experience in the visual world.
In many viewing situations different views of the same object are seen in a temporal
sequence, e.g. when we are moving an object in our hands while watching it.
This creates temporal correlations between successive retinal projections that can be
used to associate different views of the same object. Theorists have therefore
proposed a synaptic plasticity rule with a built-in memory trace (trace rule).
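Such a trace rule is typically written as a Hebbian update driven by a low-pass-filtered (traced) postsynaptic activity, so that views seen close together in time strengthen the same weights. A minimal sketch, with an arbitrary learning rate and trace constant:

```python
import numpy as np

def trace_rule_step(w, x, y_trace, y, eta=0.05, delta=0.2):
    """One update of a Hebbian rule with a built-in memory trace.
    y_trace is a running average of postsynaptic activity, which links
    temporally adjacent inputs (e.g. successive views of one object)."""
    y_trace = (1 - delta) * y_trace + delta * y   # low-pass filter of activity
    w = w + eta * y_trace * x                     # Hebbian update driven by the trace
    return w / np.linalg.norm(w), y_trace         # normalize to keep weights bounded

rng = np.random.default_rng(2)
w = rng.random(10)
w /= np.linalg.norm(w)
y_trace = 0.0
# A short temporal sequence of inputs, e.g. successive object views.
for x in rng.random((5, 10)):
    y = float(w @ x)                              # postsynaptic response
    w, y_trace = trace_rule_step(w, x, y_trace, y)
```

The dissertation's first hypothesis replaces exactly this synaptic trace with persistent firing of recurrently connected neuron groups.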
In this dissertation I present spiking neural network models that offer possible
explanations for learning of invariant object representations. These models are based
on the following hypotheses:
1. Instead of a synaptic trace rule, persistent firing of recurrently connected groups
of neurons can serve as a memory trace for invariance learning.
2. Short-range excitatory lateral connections enable learning of self-organizing
topographic maps that represent temporal as well as spatial correlations.
3. When trained with sequences of object views, such a network can learn
representations that enable invariant object recognition by clustering different views
of the same object within a local neighborhood.
4. Learning of representations for very similar stimuli can be enabled by adaptive
inhibitory feedback connections.
The study presented in chapter 3.1 details an implementation of a spiking neural
network to test the first three hypotheses. This network was tested with stimulus
sets that were designed in two feature dimensions to separate the impact of
temporal and spatial correlations on learned topographic maps. The emerging
topographic maps showed patterns that were dependent on the temporal order of
object views during training. Our results show that pooling over local
neighborhoods of the topographic map enables invariant recognition.
Chapter 3.2 focuses on the fourth hypothesis. There we examine how the adaptive
feedback inhibition (AFI) can improve the ability of a network to discriminate between
very similar patterns. The results show that with AFI learning is faster, and the
network learns selective representations for stimuli with higher levels of overlap
than without AFI.
Results of chapter 3.1 suggest a functional role for topographic object
representations that are known to exist in the inferotemporal cortex, and suggest a
mechanism for the development of such representations. The AFI model implements
one aspect of predictive coding: subtraction of a prediction from the actual input of
a system. The successful implementation in a biologically plausible network of
spiking neurons shows that predictive coding can play a role in cortical circuits.
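The predictive-coding subtraction implemented by the AFI model can be illustrated with a toy example. Here the feedback prediction is simply the shared mean of two overlapping stimuli; the actual model learns this feedback with adaptive inhibitory connections in a spiking network.

```python
import numpy as np

def afi_response(x, prediction, alpha=1.0):
    """Subtract an inhibitory feedback prediction from the feedforward input.
    What remains is the residual that distinguishes similar stimuli -- the
    predictive-coding aspect implemented by adaptive feedback inhibition."""
    return x - alpha * prediction

def cos(u, v):
    """Cosine similarity, used here to measure stimulus overlap."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(3)
shared = rng.random(20)
a = shared + 0.05 * rng.random(20)   # two highly overlapping stimuli
b = shared + 0.05 * rng.random(20)

prediction = (a + b) / 2             # feedback carries the learned shared structure
ra = afi_response(a, prediction)
rb = afi_response(b, prediction)
# The raw stimuli overlap heavily, while the residuals are far more separable,
# which is why learning selective representations becomes easier with AFI.
```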
Object detection and recognition with event driven cameras
This thesis presents the study, analysis and implementation of algorithms
to perform object detection and recognition using an event-based camera.
This sensor represents a novel paradigm which opens a wide range
of possibilities for future developments of computer vision. In particular,
it allows the production of a fast, compressed, illumination-invariant
output, which can be exploited for robotic tasks, where fast dynamics
and significant illumination changes are frequent. The experiments
are carried out on the neuromorphic version of the iCub humanoid
platform. The robot is equipped with a novel dual camera setup
mounted directly in the robot's eyes, used to generate data with a
moving camera. The motion causes the presence of background clutter
in the event stream.
In such a scenario the detection problem has been addressed with an
attention mechanism, specifically designed to respond to the presence of
objects, while discarding clutter. The proposed implementation takes
advantage of the nature of the data to simplify the original
proto-object saliency model which inspired this work.
Subsequently, the recognition task was first tackled with a feasibility
study to demonstrate that the event stream carries sufficient information
to classify objects, and then with the implementation of a spiking
neural network. The feasibility study provides the proof of concept
that events are informative enough in the context of object
classification, whereas the spiking implementation improves the results by
employing an architecture specifically designed to process event data.
The spiking network was trained with a three-factor local learning rule
which overcomes the weight transport, update locking and non-locality
problems.
The presented results prove that both detection and classification can
be carried out in the target application using the event data.
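A generic three-factor local rule of this kind combines a presynaptic eligibility trace, postsynaptic activity, and a global modulatory signal. The sketch below uses made-up dynamics and a random stand-in modulator to show the structure of such an update, not the thesis' specific rule.

```python
import numpy as np

def three_factor_update(w, pre_trace, post, modulator, eta=0.01):
    """Local three-factor rule: presynaptic eligibility trace x postsynaptic
    activity x a global error/reward signal. Every weight update uses only
    locally available quantities plus one broadcast scalar, avoiding the
    weight transport and update-locking problems of backpropagation."""
    return w + eta * modulator * np.outer(post, pre_trace)

rng = np.random.default_rng(4)
w = rng.normal(0, 0.1, (3, 8))        # 8 inputs -> 3 output neurons
pre_trace = np.zeros(8)
for t in range(20):
    spikes = (rng.random(8) < 0.2).astype(float)   # presynaptic spike pattern
    pre_trace = 0.9 * pre_trace + spikes           # decaying eligibility trace
    post = (w @ pre_trace > 0.5).astype(float)     # postsynaptic firing (threshold)
    modulator = rng.choice([-1.0, 1.0])            # stand-in for the third factor
    w = three_factor_update(w, pre_trace, post, modulator)
```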
The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work).
This paper explores the theoretical consequences of a simple assumption: the computational goal of the feedforward path in the ventral stream -- from V1 and V2 through V4 to IT -- is to discount image transformations, after learning them during development.
From Pixels to Spikes: Efficient Multimodal Learning in the Presence of Domain Shift
Computer vision aims to provide computers with a conceptual understanding of images or video by learning a high-level representation. This representation is typically derived from the pixel domain (i.e., RGB channels) for tasks such as image classification or action recognition. In this thesis, we explore how RGB inputs can either be pre-processed or supplemented with other compressed visual modalities, in order to improve the accuracy-complexity tradeoff for various computer vision tasks. Beginning with RGB-domain data only, we propose a multi-level, Voronoi-based spatial partitioning of images, whose cells are individually processed by a convolutional neural network (CNN), to improve the scale invariance of the embedding. We combine this with a novel and efficient approach for optimal bit allocation within the quantized cell representations. We evaluate this proposal on the content-based image retrieval task, which consists of finding images in a dataset similar to a given query. We then move to the more challenging domain of action recognition, where a video sequence is classified according to its constituent action. In this case, we demonstrate how the RGB modality can be supplemented with a flow modality, comprising motion vectors extracted directly from the video codec. The motion vectors (MVs) are used both as input to a CNN and as an activity sensor for providing selective macroblock (MB) decoding of RGB frames instead of full-frame decoding. We independently train two CNNs on RGB and MV correspondences and then fuse their scores during inference, demonstrating faster end-to-end processing and classification accuracy competitive with recent work. In order to explore the use of more efficient sensing modalities, we replace the MV stream with a neuromorphic vision sensing (NVS) stream for action recognition.
NVS hardware mimics the biological retina and operates with substantially lower power and at significantly higher sampling rates than conventional active pixel sensing (APS) cameras. Due to the lack of training data in this domain, we generate emulated NVS frames directly from consecutive RGB frames and use these to train a teacher-student framework that additionally leverages the abundance of optical flow training data. In the final part of this thesis, we introduce a novel unsupervised domain adaptation method for further minimizing the domain shift between emulated (source) and real (target) NVS data domains.
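Event emulation from consecutive RGB frames is commonly done by thresholding log-intensity changes, mimicking the contrast-sensitive behavior of the sensor. A minimal sketch; the contrast threshold and luminance computation here are assumed values for illustration, not the thesis' exact emulation scheme.

```python
import numpy as np

def emulate_events(frame_prev, frame_next, threshold=0.15, eps=1e-3):
    """Emulated NVS frame from two consecutive RGB frames: an event fires
    wherever the log-intensity change exceeds a contrast threshold, with
    polarity +1 for brightening and -1 for darkening pixels."""
    lum_p = np.log(frame_prev.mean(axis=-1) + eps)   # log luminance, previous frame
    lum_n = np.log(frame_next.mean(axis=-1) + eps)   # log luminance, next frame
    diff = lum_n - lum_p
    events = np.zeros_like(diff, dtype=np.int8)
    events[diff > threshold] = 1                     # ON events
    events[diff < -threshold] = -1                   # OFF events
    return events

rng = np.random.default_rng(5)
f0 = rng.random((64, 64, 3))
f1 = f0.copy()
f1[16:32, 16:32] *= 2.0                              # a brightening patch
ev = emulate_events(f0, f1)                          # events only where intensity changed
```

Because log-intensity differences respond to relative rather than absolute brightness changes, the emulated frames inherit the illumination robustness of real NVS output.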