174 research outputs found
Biased Competition in Visual Processing Hierarchies: A Learning Approach Using Multiple Cues
In this contribution, we present a large-scale hierarchical system for object detection fusing bottom-up (signal-driven) processing results with top-down (model or task-driven) attentional modulation. Specifically, we focus on the question of how the autonomous learning of invariant models can be embedded into a performing system and how such models can be used to define object-specific attentional modulation signals. Our system implements bi-directional data flow in a processing hierarchy. The bottom-up data flow proceeds from a preprocessing level to the hypothesis level, where object hypotheses created by exhaustive object detection algorithms are represented in a roughly retinotopic way. A competitive selection mechanism is used to determine the most confident hypotheses, which are used on the system level to train multimodal models that link object identity to invariant hypothesis properties. The top-down data flow originates at the system level, where the trained multimodal models are used to obtain space- and feature-based attentional modulation signals, providing biases for the competitive selection process at the hypothesis level. This results in object-specific hypothesis facilitation/suppression in certain image regions, which we show to be applicable to different object detection mechanisms. In order to demonstrate the benefits of this approach, we apply the system to the detection of cars in a variety of challenging traffic videos. Evaluating our approach on a publicly available dataset containing approximately 3,500 annotated video images from more than 1 h of driving, we can show strong increases in performance and generalization when compared to object detection in isolation. Furthermore, we compare our results to a late hypothesis rejection approach, showing that early coupling of top-down and bottom-up information is a favorable approach, especially when processing resources are constrained.
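The early-coupling idea in this abstract can be sketched as follows: top-down spatial and feature-based bias maps multiply the bottom-up confidence map *before* competitive selection, rather than filtering accepted hypotheses afterwards (late rejection). All names and the multiplicative combination rule below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_hypotheses(confidence, spatial_bias, feature_bias, top_k=3):
    """Bias a retinotopic confidence map, then pick the winning hypotheses."""
    # early coupling: modulate confidences before the competition
    biased = confidence * spatial_bias * feature_bias
    winners = np.argsort(biased.ravel())[::-1][:top_k]  # competitive selection
    return np.unravel_index(winners, biased.shape), biased

rng = np.random.default_rng(0)
conf = rng.random((8, 8))                 # bottom-up hypothesis confidences
spatial = np.ones((8, 8))
spatial[:, 4:] = 2.0                      # attend to the right half of the image
feat = np.ones((8, 8))                    # no feature-based bias in this toy case
(locs_r, locs_c), biased = select_hypotheses(conf, spatial, feat)
```

With the doubled spatial bias, the winning hypotheses typically fall in the attended image half, illustrating hypothesis facilitation in certain regions.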
Coverage, Continuity and Visual Cortical Architecture
The primary visual cortex of many mammals contains a continuous
representation of visual space, with a roughly repetitive aperiodic map of
orientation preferences superimposed. It was recently found that orientation
preference maps (OPMs) obey statistical laws which are apparently invariant
among species widely separated in eutherian evolution. Here, we examine whether
one of the most prominent models for the optimization of cortical maps, the
elastic net (EN) model, can reproduce this common design. The EN model
generates representations which optimally trade off stimulus space coverage and
map continuity. While this model has been used in numerous studies, no
analytical results about the precise layout of the predicted OPMs have been
obtained so far. We present a mathematical approach to analytically calculate
the cortical representations predicted by the EN model for the joint mapping of
stimulus position and orientation. We find that in all previously studied
regimes, predicted OPM layouts are perfectly periodic. An unbiased search
through the EN parameter space identifies a novel regime of aperiodic OPMs with
pinwheel densities lower than found in experiments. In an extreme limit,
aperiodic OPMs quantitatively resembling experimental observations emerge.
Stabilization of these layouts results from strong nonlocal interactions rather
than from a coverage-continuity-compromise. Our results demonstrate that
optimization models for stimulus representations dominated by nonlocal
suppressive interactions are in principle capable of correctly predicting the
common OPM design. They call into question whether visual cortical feature
representations can be explained by a coverage-continuity compromise.
Comment: 100 pages, including an Appendix, 21 + 7 figures
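The coverage-continuity trade-off at the heart of the EN model can be made concrete with a minimal numerical sketch of the standard Durbin/Mitchison-style energy for a joint map of visual-field position and orientation. The parameter names (`sigma`, `eta`), grid sizes, and 4-dimensional feature vectors here are illustrative assumptions, not the paper's exact analytical setup.

```python
import numpy as np

def en_energy(cortex, stimuli, sigma=0.2, eta=1.0):
    """Elastic-net energy: coverage term plus continuity term.

    cortex:  (H, W, 4) feature vectors (x, y, q1, q2) on a cortical grid
    stimuli: (N, 4) stimulus samples in the joint position/orientation space
    """
    H, W, D = cortex.shape
    y = cortex.reshape(-1, D)
    # coverage: soft-min distance from each stimulus to the cortical sheet
    d2 = ((stimuli[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # (N, H*W)
    coverage = -sigma * np.log(np.exp(-d2 / (2 * sigma**2)).sum(1)).sum()
    # continuity: squared feature differences between cortical neighbours
    dx = ((cortex[1:, :, :] - cortex[:-1, :, :]) ** 2).sum()
    dy = ((cortex[:, 1:, :] - cortex[:, :-1, :]) ** 2).sum()
    continuity = 0.5 * eta * (dx + dy)
    return coverage + continuity

rng = np.random.default_rng(1)
cortex = rng.normal(scale=0.1, size=(6, 6, 4))
stimuli = rng.uniform(-1, 1, size=(20, 4))
E = en_energy(cortex, stimuli)
```

Minimizing this energy over `cortex` trades off representing every stimulus well (coverage) against keeping neighbouring cortical sites similar (continuity), which is the compromise the abstract's analysis interrogates.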
Neural organisation of innate behaviour in zebrafish larvae
Animals' inner worlds are a hazy imitation of reality, shaped by evolution. Of the infinitude of stimuli that can arise in their natural environment, only a few will bear significance for an animal's survival and reproductive success. Thus, neural circuits have evolved to extract only these relevant stimuli from the background and connect them to downstream effectors. Sometimes, competing representations of the outside world arise in the brain, and these must be resolved to ensure adaptive behaviour. Through the study of an animal's behaviour, we can learn about its inner world: which stimuli it cares about; the desires these stimuli engender within it; and how its movements enact and extinguish those desires, allowing new stimuli to emerge that reorchestrate the inner world and refresh the cycle. Here, I present three studies that investigate the emergence of this world in the neural circuits of zebrafish larvae.
In the first study, I mapped the behavioural sequences of zebrafish larvae as they pursued and consumed prey. Manipulating their vision with genetic mutants, virtual reality, and lesion studies revealed the dynamic features of stimuli that drive switches in the behaviour. I showed that, by chaining kinematically varied swim types into regular sequences, larvae bring prey to a binocular zone in the near visual field. Here, the fused representation of the stimulus across hemispheres releases stereotyped strike manoeuvres, tuned to the distance to the prey.
In the second study, I helped investigate how visual circuits build representations of prey and predator stimuli. Measuring the responses of neurons to visual stimuli revealed how feature selectivity arises from the integration of upstream inputs. Features are unevenly represented across space, matching predicted changes in prey percepts as animals progress through their hunting sequences. When neurons tuned to specific features were ablated, I showed that the detection of prey was altered, no longer eliciting the usual hunting responses from animals.
In the third study, I contributed to the discovery of a circuit in the brain that coordinates behavioural responses to competing stimuli. When confronted with multiple threats, animals either ignore one and escape from the other, or average their locations and escape in an intermediate direction. I showed that these two strategies are mediated by distinct swim types. Inhibiting specific neurons in the brain reduced directional escapes, but not intermediate ones, revealing a circuit that contributes to a bottom-up attention mechanism.
Together, these three studies reveal the organisation of behaviour within neural circuits of the larval zebrafish brain. Finally, I consider the broader networks in the brain that might implement and modulate responses to salient visual stimuli, and how these circuits could serve as a substrate for behavioural evolution.
Modelling individual variations in brain structure and function using multimodal MRI
Every brain is different. Understanding this variability is crucial for investigating
the neural substrate underlying individuals' unique behaviour and developing
personalised diagnosis and treatments. This thesis presents novel computational
approaches to study individual variability in brain structure and function using
magnetic resonance imaging (MRI) data. It comprises three main chapters, each
addressing a specific challenge in the field.
In Chapter 3, the thesis proposes a novel Image Quality Transfer (IQT) technique,
HQ-augmentation, to accurately localise a Deep Brain Stimulation (DBS) target
in low-quality clinical-like data. Leveraging high-quality diffusion MRI datasets
from the Human Connectome Project (HCP), the HQ-augmentation approach is
robust to corruptions in data quality while preserving the individual anatomical
variability of the DBS target. It outperforms existing alternatives and generalises
to unseen low-quality diffusion MRI datasets with different acquisition protocols,
such as the UK Biobank (UKB) dataset.
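The robustness claim in Chapter 3 suggests an augmentation scheme in which high-quality (HCP-like) training volumes are degraded with synthetic corruptions so that a model trained on them generalises to low-quality clinical-like scans. The specific corruption operations and parameters below are illustrative assumptions, not the thesis's actual pipeline.

```python
import numpy as np

def degrade(volume, factor=2, noise_sd=0.05, rng=None):
    """Simulate a low-quality acquisition from a high-quality volume."""
    rng = rng or np.random.default_rng()
    # coarser effective resolution: downsample, then upsample back
    low = volume[::factor, ::factor, ::factor]
    low = np.repeat(np.repeat(np.repeat(low, factor, 0), factor, 1), factor, 2)
    # acquisition noise on top of the blurred volume
    return low + rng.normal(scale=noise_sd, size=low.shape)

rng = np.random.default_rng(4)
hq = rng.random((16, 16, 16))     # stand-in for an HCP-quality volume
lq = degrade(hq, rng=rng)         # clinical-like training input
```

Training a localisation model on `(lq, target)` pairs while evaluating on genuinely low-quality data is the general pattern such quality-transfer augmentation follows.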
In Chapter 4, the thesis presents a framework for enhancing prediction accuracy
of individual task-fMRI activation profiles using the variability of resting-state
fMRI. Assuming resting-state functional modes underlie task-evoked activity, this
chapter demonstrates that the shape and intensity of individualised task activations
can be separately modelled. It introduces the concept of "residualisation" and shows
that training on residuals leads to better individualised predictions.
The framework's prediction accuracy, validated on HCP and UKB data, is on
par with task-fMRI test-retest reliability, suggesting potential for supplementing
traditional task localisers.
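The residualisation idea can be sketched as follows: rather than predicting an individual's task activation map directly, subtract the group-average map and fit a model only to the residuals, so the model has to explain nothing but individual deviations. The linear least-squares model and synthetic data below are illustrative assumptions, not the thesis's actual framework.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_vox, n_feat = 40, 200, 10
rest = rng.normal(size=(n_subj, n_vox, n_feat))   # resting-state features
group_map = rng.normal(size=n_vox)                # group-average task activation
w_true = rng.normal(size=n_feat)                  # synthetic ground-truth weights
task = group_map + rest @ w_true                  # individual task maps

# residualise the targets: remove what the group map already explains
resid = task - group_map

# fit the residuals from resting-state features, pooled across subjects
X = rest.reshape(-1, n_feat)
y = resid.ravel()
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# prediction adds the group map back onto the modelled residual
pred = group_map + rest @ w
```

Because the group-average component is handled separately, the fitted model spends all of its capacity on between-subject variability, which is the intuition behind training on residuals.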
In Chapter 5, the thesis presents a novel framework for individualised retinotopic
mapping using resting-state fMRI, covering the primary visual cortex (V1) through
visual area 4 (V4). The proposed approach reproduces task-elicited retinotopy and
captures individual differences in retinotopic organisation. The framework delineates
borders of early visual areas more accurately than group-average parcellation and is
effective with both high-field 7T and more common 3T resting-state fMRI data, providing a valuable alternative to resource-intensive retinotopy task-fMRI experiments.
Overall, this thesis demonstrates the potential of advanced MRI analysis
techniques to study individual variability in brain structure and function, paving
the way for improved clinical applications tailored to individual patients and a
better understanding of neural mechanisms underlying unique human behaviour.
A topological solution to object segmentation and tracking
The world is composed of objects, the ground, and the sky. Visual perception
of objects requires solving two fundamental challenges: segmenting visual input
into discrete units, and tracking identities of these units despite appearance
changes due to object deformation, changing perspective, and dynamic occlusion.
Current computer vision methods for segmentation and tracking that approach
human performance all require learning, raising the question: can objects be
segmented and tracked without learning? Here, we show that the mathematical
structure of light rays reflected from environment surfaces yields a natural
representation of persistent surfaces, and this surface representation provides
a solution to both the segmentation and tracking problems. We describe how to
generate this surface representation from continuous visual input, and
demonstrate that our approach can segment and invariantly track objects in
cluttered synthetic video despite severe appearance changes, without requiring
learning.
Comment: 21 pages, 6 main figures, 3 supplemental figures, and supplementary material containing mathematical proof
Learning, Moving, And Predicting With Global Motion Representations
In order to effectively respond to and influence the world they inhabit, animals and other intelligent agents must understand and predict the state of the world and its dynamics. An agent that can characterize how the world moves is better equipped to engage it. Current methods of motion computation rely on local representations of motion (such as optical flow) or simple, rigid global representations (such as camera motion). These methods are useful, but they are difficult to estimate reliably and limited in their applicability to real-world settings, where agents frequently must reason about complex, highly nonrigid motion over long time horizons. In this dissertation, I present methods developed with the goal of building more flexible and powerful notions of motion needed by agents facing the challenges of a dynamic, nonrigid world. This work is organized around a view of motion as a global phenomenon that is not adequately addressed by local or low-level descriptions, but that is best understood when analyzed at the level of whole images and scenes. I develop methods to: (i) robustly estimate camera motion from noisy optical flow estimates by exploiting the global, statistical relationship between the optical flow field and camera motion under projective geometry; (ii) learn representations of visual motion directly from unlabeled image sequences using learning rules derived from a formulation of image transformation in terms of its group properties; (iii) predict future frames of a video by learning a joint representation of the instantaneous state of the visual world and its motion, using a view of motion as transformations of world state. I situate this work in the broader context of ongoing computational and biological investigations into the problem of estimating motion for intelligent perception and action
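Point (i) of this dissertation, estimating camera motion from noisy optical flow by exploiting its global statistical relationship to the flow field, can be sketched with the standard instantaneous-motion equations for a purely rotational camera, solved by linear least squares. The translation component, depth dependence, and any robust weighting used in the actual work are omitted; this is an illustrative assumption, not the dissertation's method.

```python
import numpy as np

def rotation_from_flow(x, y, u, v):
    """Least-squares camera rotation (wx, wy, wz) from a flow field.

    x, y: normalised image coordinates; u, v: optical flow components.
    Rotational flow model: u = x*y*wx - (1 + x**2)*wy + y*wz
                           v = (1 + y**2)*wx - x*y*wy  - x*wz
    """
    A = np.block([[(x * y)[:, None], -(1 + x**2)[:, None],  y[:, None]],
                  [(1 + y**2)[:, None], -(x * y)[:, None], -x[:, None]]])
    b = np.concatenate([u, v])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# synthetic flow from a known rotation, plus noise
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 100)
y = rng.uniform(-1, 1, 100)
wx, wy, wz = 0.02, -0.01, 0.03
u = x * y * wx - (1 + x**2) * wy + y * wz + rng.normal(scale=1e-4, size=100)
v = (1 + y**2) * wx - x * y * wy - x * wz + rng.normal(scale=1e-4, size=100)
w_est = rotation_from_flow(x, y, u, v)
```

Because every flow vector constrains the same three rotation parameters, the estimate pools information globally across the image, which is why such global fits remain stable even when individual flow vectors are noisy.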
- …