A robust FLIR target detection employing an auto-convergent pulse coupled neural network
© 2019 Informa UK Limited, trading as Taylor & Francis Group. Automatic target detection (ATD) of a small target, along with its true shape, from highly cluttered forward-looking infrared (FLIR) imagery is a crucial task. FLIR imagery is low contrast in nature, which makes it difficult to discriminate the target from its immediate background. Here, a pulse-coupled neural network (PCNN) is extended with auto-convergent criteria to provide an efficient ATD tool. The proposed auto-convergent PCNN (AC-PCNN) segments the target from its background adaptively, identifying the target region even when the target is camouflaged or embedded in heavy visual clutter. Region-of-interest selection followed by template matching is then applied to capture the accurate shape of the target in a real scenario. The outcomes of the proposed method are validated through well-known statistical measures and show superior performance over other conventional methods.
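The PCNN family of models underlying this approach can be illustrated with a minimal, simplified pulse-coupled network. This is a generic sketch only, not the authors' AC-PCNN with its auto-convergent criteria; the parameters `beta`, `alpha_e`, `v_e` and the iteration count are illustrative assumptions:

```python
import numpy as np

def neighbor_sum(Y):
    """Sum of the 8-neighbour pulses of each pixel (zero-padded borders)."""
    H, W = Y.shape
    P = np.pad(Y, 1)
    return sum(P[1 + di:1 + di + H, 1 + dj:1 + dj + W]
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0))

def pcnn_segment(S, beta=0.2, alpha_e=0.3, v_e=20.0, iters=10):
    """Simplified PCNN: each neuron's internal activity is its stimulus
    modulated by linking input from neighbouring pulses; a decaying
    dynamic threshold E makes bright regions fire first."""
    S = S.astype(float) / S.max()
    F = S                          # feeding input: the stimulus itself
    Y = np.zeros_like(S)           # pulse output
    E = np.ones_like(S)            # dynamic threshold
    fired = np.zeros_like(S, dtype=bool)
    for _ in range(iters):
        L = neighbor_sum(Y)                 # linking input from last pulses
        U = F * (1.0 + beta * L)            # internal activity
        Y = (U > E).astype(float)           # pulse where activity beats threshold
        E = np.exp(-alpha_e) * E + v_e * Y  # threshold decay, refractory jump
        fired |= Y.astype(bool)
    return fired                   # pixels that pulsed at least once
```

With a suitable iteration count, the pixels that have pulsed form a segmentation of the bright target against its dimmer surround; the AC-PCNN replaces the fixed iteration count with an automatic convergence criterion.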
Bottom-up visual attention model for still image: a preliminary study
The philosophy of human visual attention is scientifically explained in the fields of cognitive psychology and neuroscience, and then computationally modeled in computer science and engineering. Visual attention models have been applied in computer vision systems such as object detection, object recognition, image segmentation, image and video compression, action recognition, and visual tracking. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study spans the biological perspective of visual attention, including the visual pathway and theories of visual attention, through to the computational bottom-up models that generate saliency maps. The study compares models at each stage and observes whether each stage is inspired by the biological architecture, concepts, or behavior of human visual attention. From the study, the use of low-level features, center-surround mechanisms, sparse representation, and higher-level guidance with intrinsic cues dominate the bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.
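The center-surround mechanism mentioned above, a staple of Itti-style bottom-up saliency models, can be sketched as the difference between a fine and a coarse Gaussian scale of the image; the sigma values here are illustrative, not taken from any particular model in the survey:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround_saliency(img, center_sigma=1.0, surround_sigma=8.0):
    """Center-surround contrast as the absolute difference between a fine
    (center) and a coarse (surround) Gaussian scale, normalised to [0, 1]."""
    img = img.astype(float)
    center = gaussian_filter(img, center_sigma)
    surround = gaussian_filter(img, surround_sigma)
    sal = np.abs(center - surround)
    lo, hi = sal.min(), sal.max()
    return (sal - lo) / (hi - lo + 1e-12)
```

Full saliency models repeat this across several feature channels (intensity, colour opponency, orientation) and scales, then combine the normalised maps; this single-channel version shows only the core operation.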
fMRI correlates of object-based attentional facilitation versus suppression of irrelevant stimuli, dependent on global grouping and endogenous cueing.
BACKGROUND: Theories of object-based attention often make two assumptions: that attentional resources are facilitatory, and that they spread automatically within grouped objects. Consistent with this, ignored visual stimuli can be easier to process, or more distracting, when perceptually grouped with an attended target stimulus. But in past studies, the ignored stimuli often shared potentially relevant features or locations with the target. In this fMRI study, we measured the effects of attention and grouping on Blood Oxygenation Level Dependent (BOLD) responses in the human brain to entirely task-irrelevant events. Two checkerboards were displayed, one in each hemifield, while participants responded to check-size changes in one pre-cued hemifield, which varied between blocks. Grouping (or segmentation) between hemifields was manipulated between blocks, using common (versus distinct) motion cues. Task-irrelevant transient events were introduced by randomly changing the colour of either checkerboard, attended or ignored, at unpredictable intervals. The above assumptions predict heightened BOLD signals for irrelevant events in attended versus ignored hemifields for ungrouped contexts, but less such attentional modulation under grouping, due to automatic spreading of facilitation across hemifields. We found the opposite pattern in primary visual cortex. For ungrouped stimuli, BOLD signals associated with task-irrelevant changes were lower, not higher, in the attended versus ignored hemifield; furthermore, attentional modulation was not reduced but actually inverted under grouping, with higher signals for events in the attended versus ignored hemifield.
Cortico-cortical feedback engages active dendrites in visual cortex
Sensory processing in the neocortex requires both feedforward and feedback information flow between cortical areas [1]. In feedback processing, higher-level representations provide contextual information to lower levels, and facilitate perceptual functions such as contour integration and figure–ground segmentation [2,3]. However, we have limited understanding of the circuit and cellular mechanisms that mediate feedback influence. Here we use long-range all-optical connectivity mapping in mice to show that feedback influence from the lateromedial higher visual area (LM) to the primary visual cortex (V1) is spatially organized. When the source and target of feedback represent the same area of visual space, feedback is relatively suppressive. By contrast, when the source is offset from the target in visual space, feedback is relatively facilitating. Two-photon calcium imaging data show that this facilitating feedback is nonlinearly integrated in the apical tuft dendrites of V1 pyramidal neurons: retinotopically offset (surround) visual stimuli drive local dendritic calcium signals indicative of regenerative events, and two-photon optogenetic activation of LM neurons projecting to identified feedback-recipient spines in V1 can drive similar branch-specific local calcium signals. Our results show how neocortical feedback connectivity and nonlinear dendritic integration can together form a substrate to support both predictive and cooperative contextual interactions.
Multigranularity Representations for Human Interactions: Pose, Motion and Intention
Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations.
We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories.
Detections and motion estimates provide contradictory information in the case of false-alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, which corrects motion leakage between correctly detected objects while being robust to false alarms or spatially inaccurate detections.
We first present a motion segmentation framework that exploits the long-range motion of point trajectories and the large spatial support of image regions.
We show that the resulting video segments adapt to targets under partial occlusions and deformations.
Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes.
Third, we study human motion and pose estimation.
We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations.
We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation.
We show empirically that such a multigranularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.
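The core idea of grouping point trajectories by motion affinity, with confident detections supplying repulsions that block motion leakage, can be caricatured in a few lines. This is a toy stand-in for the graph steering algorithm described above, not its implementation: the Gaussian affinity, the union-find grouping, and all thresholds are illustrative assumptions.

```python
import numpy as np

def group_trajectories(vel, det_id, thresh=0.5):
    """Greedy grouping of trajectories by motion affinity.
    vel: (N, T, 2) per-frame velocities per trajectory.
    det_id: detection index claiming each trajectory, or -1 if none.
    Affinity between trajectories claimed by different confident
    detections is suppressed (a crude 'repulsion')."""
    n = len(vel)
    # pairwise motion distance: max velocity difference over time
    dist = np.linalg.norm(vel[:, None] - vel[None, :], axis=-1).max(-1)
    aff = np.exp(-dist ** 2)                 # Gaussian motion affinity
    for i in range(n):
        for j in range(n):
            if det_id[i] >= 0 and det_id[j] >= 0 and det_id[i] != det_id[j]:
                aff[i, j] = 0.0              # repulsion from detections
    # union-find: connect pairs whose affinity exceeds the threshold
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if aff[i, j] > thresh:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])
```

The repulsion step is what keeps two correctly detected objects with momentarily similar motion from being merged; the thesis's graph steering solves the joint detection/co-clustering problem properly rather than greedily.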
Evaluation of the effectiveness of simple nuclei-segmentation methods on Caenorhabditis elegans embryogenesis images
BACKGROUND: For the analysis of spatio-temporal dynamics, various automated processing methods have been developed for nuclei segmentation. These methods tend to be complex for segmentation of images with crowded nuclei, preventing their simple reapplication to other problems. Thus, it is useful to evaluate the ability of simple methods to segment images with various degrees of nuclear crowding. RESULTS: Here, we selected six simple methods from the watershed-based and local-maxima-detection-based methods that are frequently used for nuclei segmentation, and evaluated their segmentation accuracy for each developmental stage of Caenorhabditis elegans embryogenesis. We included a 4D noise filter, in addition to 2D and 3D noise filters, as a pre-processing step to evaluate the potential of simple methods as widely as possible. By applying the methods to image data between the 50- and 500-cell developmental stages at 50-cell intervals, the error rate for nuclei detection could be reduced to ≤ 2.1% at every stage until the 350-cell stage. The fraction of total errors throughout the stages could be reduced to ≤ 2.4%. The error rates improved at most of the stages, and the total errors improved, when a 4D noise filter was used. The methods with the fewest errors were two watershed-based methods with 4D noise filters. For all the other methods, the error rate and the fraction of errors could be reduced to ≤ 4.2% and ≤ 4.1%, respectively. The minimum error rate for each stage between the 400- and 500-cell stages ranged from 6.0% to 8.4%. However, the similarity between the computational and manual segmentations, measured by volume overlap and Hausdorff distance, was poor. The methods were also applied to Drosophila and zebrafish embryos and found to be effective. CONCLUSIONS: The simple segmentation methods were found to be useful for detecting nuclei up to the 350-cell stage, but not very useful after the 400-cell stage. Incorporating a 4D noise filter into the simple methods could improve their performance. Error types and the temporal biases of errors depended on the method used. Combining multiple simple methods could also give good segmentations.
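The general shape of the simple watershed pipelines evaluated here can be sketched with SciPy alone: Gaussian noise filtering, thresholding, a distance transform, local-maxima seeds, and seeded watershed. This 2D sketch is only an illustration of the method family; the threshold, filter sigma, and maxima window are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy import ndimage as ndi

def segment_nuclei(img, thresh, sigma=2.0, maxima_window=7):
    """Seeded watershed sketch: smooth, threshold, distance-transform,
    take local maxima of the distance map as seeds, then flood."""
    smooth = ndi.gaussian_filter(img.astype(float), sigma)   # noise filter
    mask = smooth > thresh                                   # foreground
    dist = ndi.distance_transform_edt(mask)
    # seeds: local maxima of the distance map inside the mask
    maxima = (dist == ndi.maximum_filter(dist, size=maxima_window)) & mask
    markers, n = ndi.label(maxima)
    # watershed floods the inverted distance map from the seeds
    elevation = (dist.max() - dist).astype(np.uint16)
    labels = ndi.watershed_ift(elevation, markers)
    labels[~mask] = 0                                        # keep foreground
    return labels, n
```

The paper's 4D variant applies the noise filter across x, y, z, and time before segmenting each volume; the same distance-transform-and-flood core carries over unchanged.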
Adaptive visual sampling
PhD thesis. Various visual tasks may be analysed in the context of sampling from the visual field. In visual psychophysics, human visual sampling strategies have often been shown, at a high level, to be driven by various information- and resource-related factors such as the limited capacity of the human cognitive system, the quality of information gathered, its relevance in context, and the associated efficiency of recovering it. At a lower level, we interpret many computer vision tasks to be rooted in similar notions of contextually relevant, dynamic sampling strategies geared towards the filtering of pixel samples to perform reliable object association. In the context of object tracking, the reliability of such endeavours is fundamentally rooted in the continuing relevance of the object models used for such filtering, a requirement complicated by real-world conditions such as dynamic lighting that frequently cause their rapid obsolescence. In the context of recognition, performance can be hindered by the lack of learned context-dependent strategies that satisfactorily filter out samples that are irrelevant or blunt the potency of the models used for discrimination. In this thesis we interpret the problems of visual tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this vein, present three frameworks that build on previous methods to provide a more flexible and effective approach.
Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object models for real-time robust tracking under changing lighting conditions. We employ colour features in experiments to demonstrate its effectiveness. The framework consists of five parts: (a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multicolour objects; (b) a constructive algorithm that uses cross-validation to automatically determine the number of components for a Gaussian mixture given a sample set of object colours; (c) a sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation enabling models of the object and the environment to be employed together in filtering samples by discrimination; and (e) a selectively adaptive mechanism to enable colour models to cope with changing conditions and permit more robust tracking.
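Parts (a) and (b), a Gaussian mixture colour model with the number of components chosen by validation, can be sketched with scikit-learn. The single 50/50 held-out split and the `max_k` cap here are illustrative stand-ins for the thesis's constructive cross-validation algorithm:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm(colours, max_k=5, seed=0):
    """Fit GMMs with 1..max_k components to a sample of object colours
    and keep the one with the best held-out mean log-likelihood."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(colours))
    split = len(colours) // 2
    train, val = colours[idx[:split]], colours[idx[split:]]
    best, best_ll = None, -np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(train)
        ll = gmm.score(val)            # mean held-out log-likelihood
        if ll > best_ll:
            best, best_ll = gmm, ll
    return best
```

Scoring on held-out colours rather than the training set is what prevents the selection from always favouring the largest mixture: extra components that merely fit training noise do not improve validation likelihood.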
Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal with very difficult conditions such as small target objects in cluttered environments undergoing severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic feature selection during tracking by reducing redundancy in the features selected at each stage, as well as by more naturally balancing short-term and long-term evidence, the latter to facilitate model rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility under slower, long-term changes such as varying lighting conditions. This framework consists of two parts: (a) Attribute-based Feature Ranking (AFR), which combines two attribute measures: discriminability and independence from other features; and (b) Multiple Selectively-adaptive Feature Models (MSFM), which involves maintaining a dynamic feature reference of target object appearance. We call this framework Adaptive Multi-feature Association (AMA).

Finally, we present an adaptive spatial and featural sampling strategy that extends established
Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional approach, such as limited spatial support, restricted sample sets, and ad hoc joint and disjoint statistical distributions that may fail to capture important structure. Our framework enables more compact, descriptive LBP-type models to be constructed, which may be employed in conjunction with many existing LBP techniques to improve their performance without modification. The framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram Intersection Minimisation (BHIM), which is shown to be more powerful than established methods used for binary feature selection such as Conditional Mutual Information Maximisation (CMIM) and AdaBoost.
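For reference, the classical single-scale LBP operator that MSLBF generalises can be computed in a few lines of NumPy. This shows only the traditional operator whose limitations the thesis addresses; the multiscale selection (MSLBF) and the BHIM feature selection are beyond this sketch.

```python
import numpy as np

def lbp_codes(img):
    """Classical 8-neighbour LBP codes for the interior pixels of img:
    each neighbour contributes one bit, set when it is >= the centre."""
    c = img[1:-1, 1:-1]
    H, W = c.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((H, W), dtype=np.uint8)
    for bit, (di, dj) in enumerate(offsets):
        nb = img[1 + di:1 + di + H, 1 + dj:1 + dj + W]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over the image."""
    codes = lbp_codes(img)
    return np.bincount(codes.ravel(), minlength=256) / codes.size
```

The fixed 3x3 neighbourhood and the full joint 256-bin histogram are exactly the "limited spatial support" and "ad hoc joint distribution" that the thesis's selected multiscale binary features are designed to replace.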