4,350 research outputs found

    Adaptive visual sampling

    Get PDF
    PhDVarious visual tasks may be analysed in the context of sampling from the visual field. In visual psychophysics, human visual sampling strategies have often been shown at a high-level to be driven by various information and resource related factors such as the limited capacity of the human cognitive system, the quality of information gathered, its relevance in context and the associated efficiency of recovering it. At a lower-level, we interpret many computer vision tasks to be rooted in similar notions of contextually-relevant, dynamic sampling strategies which are geared towards the filtering of pixel samples to perform reliable object association. In the context of object tracking, the reliability of such endeavours is fundamentally rooted in the continuing relevance of object models used for such filtering, a requirement complicated by realworld conditions such as dynamic lighting that inconveniently and frequently cause their rapid obsolescence. In the context of recognition, performance can be hindered by the lack of learned context-dependent strategies that satisfactorily filter out samples that are irrelevant or blunt the potency of models used for discrimination. In this thesis we interpret the problems of visual tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this vein, present three frameworks that build on previous methods to provide a more flexible and effective approach. Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object models for real-time robust tracking under changing lighting conditions. We employ colour features in experiments to demonstrate its effectiveness. The framework consists of five parts: (a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multicolour objects; (b) a constructive algorithm that uses cross-validation for automatically determining the number of components for a Gaussian mixture given a sample set of object colours; (c) a sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation enabling models of object and the environment to be employed together in filtering samples by discrimination; and (e) a selectively-adaptive mechanism to enable colour models to cope with changing conditions and permit more robust tracking. Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal with very difficult conditions such as small target objects in cluttered environments undergoing severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic feature selection during tracking by reducing redundancy in features selected at each stage as well as more naturally balancing short-term and long-term evidence, the latter to facilitate model rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility under slower, long-term changes such as varying lighting conditions. This framework consists of two parts: (a) Attribute-based Feature Ranking (AFR) which combines two attribute measures; discriminability and independence to other features; and (b) Multiple Selectively-adaptive Feature Models (MSFM) which involves maintaining a dynamic feature reference of target object appearance. We call this framework Adaptive Multi-feature Association (AMA). Finally, we present an adaptive spatial and featural sampling strategy that extends established Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional approach such as limited spatial support, restricted sample sets and ad hoc joint and disjoint statistical distributions that may fail to capture important structure. Our framework enables more compact, descriptive LBP type models to be constructed which may be employed in conjunction with many existing LBP techniques to improve their performance without modification. The framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram Intersection Minimisation (BHIM) which is shown to be more powerful than established methods used for binary feature selection such as Conditional Mutual Information Maximisation (CMIM) and AdaBoost

    YoloCurvSeg: You Only Label One Noisy Skeleton for Vessel-style Curvilinear Structure Segmentation

    Full text link
    Weakly-supervised learning (WSL) has been proposed to alleviate the conflict between data annotation cost and model performance through employing sparsely-grained (i.e., point-, box-, scribble-wise) supervision and has shown promising performance, particularly in the image segmentation field. However, it is still a very challenging problem due to the limited supervision, especially when only a small number of labeled samples are available. Additionally, almost all existing WSL segmentation methods are designed for star-convex structures which are very different from curvilinear structures such as vessels and nerves. In this paper, we propose a novel sparsely annotated segmentation framework for curvilinear structures, named YoloCurvSeg, based on image synthesis. A background generator delivers image backgrounds that closely match real distributions through inpainting dilated skeletons. The extracted backgrounds are then combined with randomly emulated curves generated by a Space Colonization Algorithm-based foreground generator and through a multilayer patch-wise contrastive learning synthesizer. In this way, a synthetic dataset with both images and curve segmentation labels is obtained, at the cost of only one or a few noisy skeleton annotations. Finally, a segmenter is trained with the generated dataset and possibly an unlabeled dataset. The proposed YoloCurvSeg is evaluated on four publicly available datasets (OCTA500, CORN, DRIVE and CHASEDB1) and the results show that YoloCurvSeg outperforms state-of-the-art WSL segmentation methods by large margins. With only one noisy skeleton annotation (respectively 0.14%, 0.03%, 1.40%, and 0.65% of the full annotation), YoloCurvSeg achieves more than 97% of the fully-supervised performance on each dataset. Code and datasets will be released at https://github.com/llmir/YoloCurvSeg.Comment: 11 pages, 10 figures, submitted to IEEE Transactions on Medical Imaging (TMI

    The Timing of Vision – How Neural Processing Links to Different Temporal Dynamics

    Get PDF
    In this review, we describe our recent attempts to model the neural correlates of visual perception with biologically inspired networks of spiking neurons, emphasizing the dynamical aspects. Experimental evidence suggests distinct processing modes depending on the type of task the visual system is engaged in. A first mode, crucial for object recognition, deals with rapidly extracting the glimpse of a visual scene in the first 100 ms after its presentation. The promptness of this process points to mainly feedforward processing, which relies on latency coding, and may be shaped by spike timing-dependent plasticity (STDP). Our simulations confirm the plausibility and efficiency of such a scheme. A second mode can be engaged whenever one needs to perform finer perceptual discrimination through evidence accumulation on the order of 400 ms and above. Here, our simulations, together with theoretical considerations, show how predominantly local recurrent connections and long neural time-constants enable the integration and build-up of firing rates on this timescale. In particular, we review how a non-linear model with attractor states induced by strong recurrent connectivity provides straightforward explanations for several recent experimental observations. A third mode, involving additional top-down attentional signals, is relevant for more complex visual scene processing. In the model, as in the brain, these top-down attentional signals shape visual processing by biasing the competition between different pools of neurons. The winning pools may not only have a higher firing rate, but also more synchronous oscillatory activity. This fourth mode, oscillatory activity, leads to faster reaction times and enhanced information transfers in the model. This has indeed been observed experimentally. Moreover, oscillatory activity can format spike times and encode information in the spike phases with respect to the oscillatory cycle. This phenomenon is referred to as “phase-of-firing coding,” and experimental evidence for it is accumulating in the visual system. Simulations show that this code can again be efficiently decoded by STDP. Future work should focus on continuous natural vision, bio-inspired hardware vision systems, and novel experimental paradigms to further distinguish current modeling approaches

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task

    Neurophysiological Influence of Musical Training on Speech Perception

    Get PDF
    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL

    Intrinsic activity in the fly brain gates visual information during behavioral choices

    Get PDF
    The small insect brain is often described as an input/output system that executes reflex-like behaviors. It can also initiate neural activity and behaviors intrinsically, seen as spontaneous behaviors, different arousal states and sleep. However, less is known about how intrinsic activity in neural circuits affects sensory information processing in the insect brain and variability in behavior. Here, by simultaneously monitoring Drosophila's behavioral choices and brain activity in a flight simulator system, we identify intrinsic activity that is associated with the act of selecting between visual stimuli. We recorded neural output (multiunit action potentials and local field potentials) in the left and right optic lobes of a tethered flying Drosophila, while its attempts to follow visual motion (yaw torque) were measured by a torque meter. We show that when facing competing motion stimuli on its left and right, Drosophila typically generate large torque responses that flip from side to side. The delayed onset (0.1-1 s) and spontaneous switch-like dynamics of these responses, and the fact that the flies sometimes oppose the stimuli by flying straight, make this behavior different from the classic steering reflexes. Drosophila, thus, seem to choose one stimulus at a time and attempt to rotate toward its direction. With this behavior, the neural output of the optic lobes alternates; being augmented on the side chosen for body rotation and suppressed on the opposite side, even though the visual input to the fly eyes stays the same. Thus, the flow of information from the fly eyes is gated intrinsically. Such modulation can be noise-induced or intentional; with one possibility being that the fly brain highlights chosen information while ignoring the irrelevant, similar to what we know to occur in higher animals

    Effects of Aging and Spectral Shaping on the Sub-cortical (Brainstem) Differentiation of Contrastive Stop Consonants

    Get PDF
    Purpose: The objectives of this dissertation are to: (1) evaluate the influence of aging on the sub-cortical (brainstem) differentiation of voiced stop consonants (i.e. /b-d-g/); (2) determine whether potential aging deficits at the brainstem level influence behavioral identification of the /b-d-g/ stimuli, (3) investigate whether spectral shaping diminishes any aging impairments at the brainstem level; and (4) if so, whether minimizing these deficits improves the behavioral identification of the speech stimuli. Subjects: Behavioral and electrophysiological responses were collected from 11 older adults (\u3e 50 years old) with near-normal to normal hearing and were compared to those of 16 normal-hearing younger adults (control group). Stimuli and Methods: Speech- evoked auditory brainstem responses (Speech-ABRs) were recorded for three 100-ms long /b-d-g/ consonant-vowel exemplars in unshaped and shaped conditions, for a total of six stimuli. Frequency-dependent spectral-shaping enhanced the second formant (F2) transition relative to the rest of the stimulus, such that it reduced gain for low frequencies; and increased gain for mid and high frequencies, the frequency region of the F2 transition in the /b-d-g/ syllables. Behavioral identification of 15-step perceptual unshaped and shaped /b-d-g/ continua was assessed by generating psychometric functions in order to quantify stimuli perception. Speech ABR peak amplitudes and latencies and stop consonant differentiation scores were measured for 6 stimuli (3 unshaped stimuli and 3 shaped stimuli). Summary of Findings: Older adults exhibited more robust categorical perception, and subtle sub-cortical deficits when compared to younger adults. Individual data showed fewer expected latency patterns for the /b-d-g/ speech-ABRs in older adults as opposed to younger adults, especially for major peaks. Spectral shaping improved the stop consonant differentiation score for major peaks in older adults, such that it moved older adults in the direction of the younger adults’ responses. Conclusion: Sub-cortical impairments at least those measured in this study do not seem to influence the behavioral differentiation of stop consonants in older adults. On the other hand, cue enhancement by spectral shaping seems to overcome some of the deficits noted at the electrophysiological level. However, due to a possible ceiling effect, improvements to the originally robust perception of older adults, at the behavioral level were not found. Significance: Aging seems to reduce the sub-cortical responsiveness to dynamic spectral cues without distorting the spectral coding as evident by the “reparable” age-related changes seen at the electrophysiological level. Cue enhancement appears to increase the neural responsiveness of aged but intact neurons, yielding a better sub-cortical differentiation of stop consonants
    corecore