Adaptive visual sampling
Various visual tasks may be analysed in the context of sampling from the visual field. In visual
psychophysics, human visual sampling strategies have often been shown at a high level to
be driven by various information and resource related factors such as the limited capacity of
the human cognitive system, the quality of information gathered, its relevance in context and
the associated efficiency of recovering it. At a lower level, we interpret many computer vision
tasks to be rooted in similar notions of contextually-relevant, dynamic sampling strategies
which are geared towards the filtering of pixel samples to perform reliable object association. In
the context of object tracking, the reliability of such endeavours is fundamentally rooted in the
continuing relevance of object models used for such filtering, a requirement complicated by real-world
conditions such as dynamic lighting that inconveniently and frequently cause their rapid
obsolescence. In the context of recognition, performance can be hindered by the lack of learned
context-dependent strategies that satisfactorily filter out samples that are irrelevant or blunt the
potency of models used for discrimination. In this thesis we interpret the problems of visual
tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this
vein, present three frameworks that build on previous methods to provide a more flexible and
effective approach.
Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object
models for real-time robust tracking under changing lighting conditions. We employ colour
features in experiments to demonstrate its effectiveness. The framework consists of five parts:
(a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multicolour
objects; (b) a constructive algorithm that uses cross-validation for automatically determining
the number of components for a Gaussian mixture given a sample set of object colours; (c) a
sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation
enabling models of the object and the environment to be employed together in filtering samples by
discrimination; and (e) a selectively-adaptive mechanism to enable colour models to cope with
changing conditions and permit more robust tracking.
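As a concrete illustration of parts (a)-(c) above, the following minimal sketch fits Gaussian mixtures of increasing size to a sample set of object colours and keeps the size with the best cross-validated log-likelihood. It is a sketch under stated assumptions: the scikit-learn calls are real, but the scoring criterion and all parameter choices are illustrative stand-ins for the thesis's constructive algorithm.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def select_colour_model(pixels, max_components=8, folds=5, seed=0):
    # pixels: (n, 3) array of object colour samples (e.g. RGB values as floats).
    # Returns the GMM whose component count maximises held-out log-likelihood.
    best_model, best_score = None, -np.inf
    kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
    for k in range(1, max_components + 1):
        fold_scores = []
        for train_idx, val_idx in kf.split(pixels):
            gmm = GaussianMixture(n_components=k, random_state=seed)
            gmm.fit(pixels[train_idx])
            fold_scores.append(gmm.score(pixels[val_idx]))  # mean log-likelihood
        if np.mean(fold_scores) > best_score:
            best_score = np.mean(fold_scores)
            best_model = GaussianMixture(n_components=k, random_state=seed).fit(pixels)
    return best_model

A tracker could then score candidate pixels with best_model.score_samples and, in the spirit of part (d), compare object and environment likelihoods in a Bayesian ratio to filter samples by discrimination.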
Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal
with very difficult conditions such as small target objects in cluttered environments undergoing
severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic
feature selection during tracking by reducing redundancy in features selected at each stage as
well as more naturally balancing short-term and long-term evidence, the latter to facilitate model
rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility
under slower, long-term changes such as varying lighting conditions. This framework consists of
two parts: (a) Attribute-based Feature Ranking (AFR), which combines two attribute measures,
discriminability and independence from other features; and (b) Multiple Selectively-adaptive Feature
Models (MSFM), which maintains a dynamic feature reference of target object
appearance. We call this framework Adaptive Multi-feature Association (AMA).

Finally, we present an adaptive spatial and featural sampling strategy that extends established
Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional
approach such as limited spatial support, restricted sample sets and ad hoc joint and disjoint statistical
distributions that may fail to capture important structure. Our framework enables more
compact, descriptive LBP type models to be constructed which may be employed in conjunction
with many existing LBP techniques to improve their performance without modification. The
framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local
Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram
Intersection Minimisation (BHIM) which is shown to be more powerful than established
methods used for binary feature selection such as Conditional Mutual Information Maximisation
(CMIM) and AdaBoost.
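The abstract leaves BHIM's exact scoring unspecified. One plausible reading, sketched below purely as an assumption, ranks binary features by the intersection of their two class-conditional histograms and keeps the least-overlapping ones:

import numpy as np

def select_binary_features(X, y, n_select):
    # X: (n_samples, n_features) integer array of {0, 1} feature responses.
    # y: (n_samples,) array of {0, 1} class labels; both classes must be present.
    scores = []
    for j in range(X.shape[1]):
        h_pos = np.bincount(X[y == 1, j], minlength=2).astype(float)
        h_neg = np.bincount(X[y == 0, j], minlength=2).astype(float)
        h_pos /= h_pos.sum()
        h_neg /= h_neg.sum()
        # Histogram intersection: lower overlap = more discriminative feature.
        scores.append(np.minimum(h_pos, h_neg).sum())
    return np.argsort(scores)[:n_select]

Baselines such as CMIM instead select features greedily by conditional mutual information; the thesis's comparison is between such criteria and the intersection-minimising one named in BHIM.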
YoloCurvSeg: You Only Label One Noisy Skeleton for Vessel-style Curvilinear Structure Segmentation
Weakly-supervised learning (WSL) has been proposed to alleviate the conflict
between data annotation cost and model performance through employing
sparsely-grained (i.e., point-, box-, scribble-wise) supervision and has shown
promising performance, particularly in the image segmentation field. However,
it is still a very challenging problem due to the limited supervision,
especially when only a small number of labeled samples are available.
Additionally, almost all existing WSL segmentation methods are designed for
star-convex structures which are very different from curvilinear structures
such as vessels and nerves. In this paper, we propose a novel sparsely
annotated segmentation framework for curvilinear structures, named YoloCurvSeg,
based on image synthesis. A background generator delivers image backgrounds
that closely match real distributions through inpainting dilated skeletons. The
extracted backgrounds are then combined with randomly emulated curves generated
by a Space Colonization Algorithm-based foreground generator and through a
multilayer patch-wise contrastive learning synthesizer. In this way, a
synthetic dataset with both images and curve segmentation labels is obtained,
at the cost of only one or a few noisy skeleton annotations. Finally, a
segmenter is trained with the generated dataset and possibly an unlabeled
dataset. The proposed YoloCurvSeg is evaluated on four publicly available
datasets (OCTA500, CORN, DRIVE and CHASEDB1) and the results show that
YoloCurvSeg outperforms state-of-the-art WSL segmentation methods by large
margins. With only one noisy skeleton annotation (respectively 0.14%, 0.03%,
1.40%, and 0.65% of the full annotation), YoloCurvSeg achieves more than 97% of
the fully-supervised performance on each dataset. Code and datasets will be
released at https://github.com/llmir/YoloCurvSeg.

Comment: 11 pages, 10 figures, submitted to IEEE Transactions on Medical Imaging (TMI).
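As a concrete illustration of the background-generator step alone, the sketch below dilates the single noisy skeleton and inpaints the covered pixels with classical OpenCV routines. The OpenCV calls are real, but YoloCurvSeg's actual generator is a learned model, and the kernel size and inpainting method here are assumptions:

import cv2

def make_background(image_gray, skeleton_mask, dilation_px=7):
    # image_gray: uint8 HxW image; skeleton_mask: uint8 HxW, 255 on the annotated curve.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (dilation_px, dilation_px))
    dilated = cv2.dilate(skeleton_mask, kernel)  # widen the thin, noisy skeleton
    # Fill the dilated region from its surroundings, leaving a curve-free background.
    background = cv2.inpaint(image_gray, dilated, 5, cv2.INPAINT_TELEA)
    return background, dilated

The extracted background would then be composited with Space Colonization curves and passed through the contrastive synthesizer, as the abstract describes.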
The Timing of Vision – How Neural Processing Links to Different Temporal Dynamics
In this review, we describe our recent attempts to model the neural correlates of visual perception with biologically inspired networks of spiking neurons, emphasizing the dynamical aspects. Experimental evidence suggests distinct processing modes depending on the type of task the visual system is engaged in. A first mode, crucial for object recognition, deals with rapidly extracting the glimpse of a visual scene in the first 100 ms after its presentation. The promptness of this process points to mainly feedforward processing, which relies on latency coding, and may be shaped by spike timing-dependent plasticity (STDP). Our simulations confirm the plausibility and efficiency of such a scheme. A second mode can be engaged whenever one needs to perform finer perceptual discrimination through evidence accumulation on the order of 400 ms and above. Here, our simulations, together with theoretical considerations, show how predominantly local recurrent connections and long neural time-constants enable the integration and build-up of firing rates on this timescale. In particular, we review how a non-linear model with attractor states induced by strong recurrent connectivity provides straightforward explanations for several recent experimental observations. A third mode, involving additional top-down attentional signals, is relevant for more complex visual scene processing. In the model, as in the brain, these top-down attentional signals shape visual processing by biasing the competition between different pools of neurons. The winning pools may not only have a higher firing rate, but also more synchronous oscillatory activity. This fourth mode, oscillatory activity, leads to faster reaction times and enhanced information transfers in the model. This has indeed been observed experimentally. Moreover, oscillatory activity can format spike times and encode information in the spike phases with respect to the oscillatory cycle. This phenomenon is referred to as “phase-of-firing coding,” and experimental evidence for it is accumulating in the visual system. Simulations show that this code can again be efficiently decoded by STDP. Future work should focus on continuous natural vision, bio-inspired hardware vision systems, and novel experimental paradigms to further distinguish current modeling approaches
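To make the STDP mechanism invoked above concrete, a generic pair-based STDP window can be written as below; the parameters are common textbook values, not those of the reviewed models:

import numpy as np

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    # dt_ms = t_post - t_pre. Pre-before-post (dt >= 0) potentiates the synapse;
    # post-before-pre depresses it, rewarding synapses driven by early spikes.
    if dt_ms >= 0:
        return a_plus * np.exp(-dt_ms / tau_plus)
    return -a_minus * np.exp(dt_ms / tau_minus)

Because potentiation favours inputs that fire just before the postsynaptic spike, repeated exposure concentrates weight on the earliest-firing afferents, which is why STDP can decode latency and phase-of-firing codes.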
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks; and (ii) it lends further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
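For illustration only, the spoke-shift manipulation can be computed directly; coordinates are in degrees of visual angle with fixation at the origin, and the per-rectangle choice of inward versus outward shift is an assumption:

import numpy as np

def shift_along_spokes(centres, shift_deg=1.0, seed=0):
    # centres: (n, 2) rectangle centres, none located at fixation (the origin).
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(centres, axis=1, keepdims=True)
    unit = centres / r  # unit vector along each imaginary spoke from fixation
    sign = rng.choice([-1.0, 1.0], size=(len(centres), 1))
    return centres + sign * shift_deg * unit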
Neurophysiological Influence of Musical Training on Speech Perception
Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing, and the extent of this influence, remain a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL.
Intrinsic activity in the fly brain gates visual information during behavioral choices
The small insect brain is often described as an input/output system that executes reflex-like behaviors. It can also initiate neural activity and behaviors intrinsically, seen as spontaneous behaviors, different arousal states and sleep. However, less is known about how intrinsic activity in neural circuits affects sensory information processing in the insect brain and variability in behavior. Here, by simultaneously monitoring Drosophila's behavioral choices and brain activity in a flight simulator system, we identify intrinsic activity that is associated with the act of selecting between visual stimuli. We recorded neural output (multiunit action potentials and local field potentials) in the left and right optic lobes of a tethered flying Drosophila, while its attempts to follow visual motion (yaw torque) were measured by a torque meter. We show that when facing competing motion stimuli on its left and right, Drosophila typically generate large torque responses that flip from side to side. The delayed onset (0.1-1 s) and spontaneous switch-like dynamics of these responses, and the fact that the flies sometimes oppose the stimuli by flying straight, make this behavior different from the classic steering reflexes. Drosophila, thus, seem to choose one stimulus at a time and attempt to rotate toward its direction. With this behavior, the neural output of the optic lobes alternates; being augmented on the side chosen for body rotation and suppressed on the opposite side, even though the visual input to the fly eyes stays the same. Thus, the flow of information from the fly eyes is gated intrinsically. Such modulation can be noise-induced or intentional; with one possibility being that the fly brain highlights chosen information while ignoring the irrelevant, similar to what we know to occur in higher animals
Effects of Aging and Spectral Shaping on the Sub-cortical (Brainstem) Differentiation of Contrastive Stop Consonants
Purpose: The objectives of this dissertation are to (1) evaluate the influence of aging on the sub-cortical (brainstem) differentiation of voiced stop consonants (i.e., /b-d-g/); (2) determine whether potential aging deficits at the brainstem level influence behavioral identification of the /b-d-g/ stimuli; (3) investigate whether spectral shaping diminishes any aging impairments at the brainstem level; and (4) if so, determine whether minimizing these deficits improves the behavioral identification of the speech stimuli.
Subjects: Behavioral and electrophysiological responses were collected from 11 older adults (> 50 years old) with near-normal to normal hearing and were compared to those of 16 normal-hearing younger adults (control group).
Stimuli and Methods: Speech-evoked auditory brainstem responses (speech-ABRs) were recorded for three 100-ms /b-d-g/ consonant-vowel exemplars in unshaped and shaped conditions, for a total of six stimuli. Frequency-dependent spectral shaping enhanced the second-formant (F2) transition relative to the rest of the stimulus: it reduced gain for low frequencies and increased gain for mid and high frequencies, the frequency region of the F2 transition in the /b-d-g/ syllables. Behavioral identification of 15-step unshaped and shaped /b-d-g/ perceptual continua was assessed by generating psychometric functions to quantify stimulus perception. Speech-ABR peak amplitudes and latencies, and stop-consonant differentiation scores, were measured for the six stimuli (three unshaped and three shaped).
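A minimal sketch of such frequency-dependent shaping is given below; the band edge and gains are illustrative assumptions, not the dissertation's actual filter:

import numpy as np

def shape_spectrum(signal, fs, lo_cut_hz=800.0, lo_gain_db=-6.0, hi_gain_db=6.0):
    # Attenuate low frequencies and boost the mid/high band containing the
    # F2 transition, via a simple FFT-domain gain curve.
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    gain = np.where(freqs < lo_cut_hz,
                    10 ** (lo_gain_db / 20.0),
                    10 ** (hi_gain_db / 20.0))
    return np.fft.irfft(spec * gain, n=len(signal))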
Summary of Findings: Older adults exhibited more robust categorical perception but subtle sub-cortical deficits when compared to younger adults. Individual data showed fewer of the expected latency patterns for the /b-d-g/ speech-ABRs in older adults than in younger adults, especially for major peaks. Spectral shaping improved the stop-consonant differentiation score for major peaks in older adults, moving their responses in the direction of the younger adults’ responses.
Conclusion: Sub-cortical impairments, at least those measured in this study, do not seem to influence the behavioral differentiation of stop consonants in older adults. On the other hand, cue enhancement by spectral shaping seems to overcome some of the deficits noted at the electrophysiological level. However, possibly due to a ceiling effect, no behavioral-level improvements to the originally robust perception of older adults were found.
Significance: Aging seems to reduce the sub-cortical responsiveness to dynamic spectral cues without distorting the spectral coding, as evidenced by the “reparable” age-related changes seen at the electrophysiological level. Cue enhancement appears to increase the neural responsiveness of aged but intact neurons, yielding a better sub-cortical differentiation of stop consonants.