A computational visual saliency model for images.
Human eyes receive an enormous amount of information from the visual world, and it is highly difficult for the human brain to process all of it simultaneously. Hence the human visual system selectively processes the incoming information by attending only to the relevant regions of interest in a scene. Visual saliency characterises those parts of a scene that appear to stand out from their neighbouring regions and attract the human gaze. Modelling saliency-based visual attention has been an active research area in recent years. Saliency models are of vital importance in many computer vision tasks such as image and video compression, object segmentation, target tracking, remote sensing and robotics. Many of these applications deal with high-resolution images and real-time videos, and it is a challenge to process this amount of information with limited computational resources. Employing saliency models in these applications limits the processing of irrelevant information and thereby improves their efficiency and performance. Therefore, a saliency model with good prediction accuracy and low computation time is highly desirable. This thesis presents a low-computation wavelet-based visual saliency model designed to predict the regions of human eye fixations in images. The proposed model uses two channels of information, luminance (Y) and chrominance (Cr), in the YCbCr colour space for saliency computation. These two channels are decomposed to their lowest resolution using the two-dimensional Discrete Wavelet Transform (DWT) to extract local contrast features at multiple scales. The extracted local contrast features are integrated across levels using a two-dimensional entropy-based feature combination scheme to derive a combined map. The combined map is normalised and enhanced using a natural-logarithm transformation to derive the final saliency map. The performance of the model has been evaluated qualitatively and quantitatively on two large benchmark image datasets. The experimental results show that the proposed model achieves better prediction accuracy, both qualitatively and quantitatively, with a significant reduction in computation time compared to existing benchmark models: it achieves nearly 25% computational savings compared to the benchmark model with the lowest computation time.
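The described pipeline lends itself to a compact implementation. Below is a minimal sketch, assuming PyWavelets and OpenCV are available; the wavelet choice ('db1'), the decomposition depth, the simplified one-dimensional entropy weighting, and all function names are illustrative assumptions rather than the thesis's exact implementation.

```python
import cv2
import numpy as np
import pywt

def channel_saliency(channel, wavelet="db1", levels=5):
    """Multi-scale local contrast from 2-D DWT detail coefficients."""
    coeffs = pywt.wavedec2(channel.astype(np.float64), wavelet, level=levels)
    h, w = channel.shape
    combined = np.zeros((h, w))
    for detail in coeffs[1:]:  # skip the coarsest approximation band
        # Local contrast at this scale: magnitude of the three detail sub-bands.
        contrast = sum(np.abs(d) for d in detail)
        contrast = cv2.resize(contrast, (w, h))
        # Entropy-based weight (simplified to a 1-D histogram here):
        # flatter, less informative maps contribute less to the combined map.
        hist, _ = np.histogram(contrast, bins=64)
        p = hist[hist > 0] / hist.sum()
        entropy = -np.sum(p * np.log2(p))
        combined += contrast / max(entropy, 1e-6)
    return combined

def saliency_map(bgr_image):
    """Combine Y and Cr contrast, then log-enhance and normalise to [0, 1]."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    y, cr = ycrcb[:, :, 0], ycrcb[:, :, 1]
    s = np.log1p(channel_saliency(y) + channel_saliency(cr))
    return (s - s.min()) / (s.max() - s.min() + 1e-12)
```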
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003, Vision Research 43, 149-164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This suggests two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) further weight is given to the argument that objects may be stored in, and retrieved from, a pre-attentional store during this task.
A vision system for mobile maritime surveillance platforms
Mobile surveillance systems play an important role in minimising security and safety threats in high-risk or hazardous environments. Providing a mobile marine surveillance platform with situational awareness of its environment is important for mission success. An essential part of situational awareness is the ability to detect and subsequently track potential target objects. Typically, the exact type of target object is unknown, hence detection is addressed as the problem of finding parts of an image that stand out in relation to their surrounding regions or are atypical to the domain. In contrast to existing saliency methods, this thesis proposes a domain-specific visual attention approach for detecting potential regions of interest in maritime imagery. For this, low-level features that are indicative of maritime targets are identified. These features are then evaluated with respect to their local, regional, and global significance. Together with a domain-specific background segmentation technique, the features are combined in a Bayesian classifier to direct visual attention to potential target objects.
The maritime environment introduces challenges to the camera system: gusts, wind, swell, or waves can cause the platform to move drastically and unpredictably. Pan-tilt-zoom cameras, often utilised for surveillance tasks, can adjust their orientation to provide a stable view of the target. However, in rough maritime environments this requires high-speed and precise inputs. In contrast, omnidirectional cameras provide a full spherical view, which allows the acquisition and tracking of multiple targets at the same time, although each target occupies only a small fraction of the overall view. This thesis proposes a novel, target-centric approach to image stabilisation. A virtual camera is extracted from the omnidirectional view for each target and is adjusted based on the measurements of an inertial measurement unit and an image feature tracker. The combination of these two techniques in a probabilistic framework allows for stabilisation of rotational and translational ego-motion. Furthermore, it has the specific advantage of being robust to loosely calibrated and synchronised hardware, since the fusion of tracking and stabilisation means that tracking uncertainty can be used to compensate for errors in calibration and synchronisation. This eliminates the need for tedious calibration phases and the adverse effects of assembly slippage over time.
Finally, this thesis combines the visual attention and omnidirectional stabilisation frameworks and proposes a multi-view tracking system capable of detecting potential target objects in the maritime domain. Although the visual attention framework performed well on benchmark datasets, evaluation on real-world maritime imagery produced a high number of false positives. An investigation reveals that benchmark datasets are unconsciously influenced by human shot selection, which greatly simplifies the problem of visual attention. Despite the false positives, the tracking approach itself remains robust even when a high number of false positives are tracked.
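As a rough illustration of the fusion idea, the sketch below blends an IMU-propagated orientation with a feature tracker's bearing measurement in a one-dimensional Kalman filter, so that higher tracking uncertainty pulls the estimate less. The yaw-only state and all names are illustrative assumptions, not the thesis's actual probabilistic framework.

```python
import numpy as np

class VirtualCameraStabiliser:
    """Keeps a virtual camera pointed at a target despite platform ego-motion."""

    def __init__(self, yaw=0.0, var=1.0, gyro_var=0.01):
        self.yaw = yaw            # virtual-camera yaw towards the target (rad)
        self.var = var            # variance of the yaw estimate
        self.gyro_var = gyro_var  # process noise from gyro integration

    def predict(self, gyro_rate, dt):
        """Propagate with the IMU: counter-rotate the virtual view."""
        self.yaw += gyro_rate * dt
        self.var += self.gyro_var * dt

    def correct(self, tracked_yaw, tracker_var):
        """Fuse the feature tracker's bearing; uncertain tracks pull less."""
        k = self.var / (self.var + tracker_var)  # Kalman gain
        self.yaw += k * (tracked_yaw - self.yaw)
        self.var *= (1.0 - k)
        return self.yaw
```

The key property the abstract points to appears in `correct`: calibration or synchronisation error simply shows up as a larger `tracker_var`, so the fused estimate degrades gracefully instead of requiring a precise calibration phase.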
Aggregation signature for small object tracking
Small object tracking is an increasingly important task that has nonetheless been largely unexplored in computer vision. The great challenges stem from two facts: 1) small objects show extremely vague and variable appearances, and 2) they tend to be lost more easily than normal-sized ones due to lens shake. In this paper, we propose a novel aggregation signature suitable for small object tracking, especially aimed at the challenge of sudden and large drift. We make three-fold contributions in this work. First, technically, we propose a new descriptor, named the aggregation signature, based on saliency, able to represent highly distinctive features for small objects. Second, theoretically, we prove that the proposed signature matches the foreground object more accurately with high probability. Third, experimentally, the aggregation signature achieves high performance on multiple datasets, outperforming the state-of-the-art methods by large margins. Moreover, we contribute two newly collected benchmark datasets, i.e., small90 and small112, for visually small object tracking. The datasets will be available at https://github.com/bczhangbczhang/.
Comment: IEEE Transactions on Image Processing, 201
Correlation Filters for Unmanned Aerial Vehicle-Based Aerial Tracking: A Review and Experimental Evaluation
Aerial tracking, which has demonstrated broad applicability and impressive performance, is one of the most active applications in the remote sensing field. In particular, unmanned aerial vehicle (UAV)-based remote sensing systems, equipped with a visual tracking approach, have been widely used in aviation, navigation, agriculture, transportation, and public security, among other areas. The UAV-based aerial tracking platform has thus gradually developed from the research stage to practical application, becoming one of the main aerial remote sensing technologies of the future. However, due to onerous real-world situations, e.g., harsh external challenges, vibration of the UAV's mechanical structure (especially under strong wind conditions), maneuvering flight in complex environments, and limited onboard computation resources, accuracy, robustness, and high efficiency are all crucial for onboard tracking methods. Recently, discriminative correlation filter (DCF)-based trackers have stood out for their high computational efficiency and appealing robustness on a single CPU, and have flourished in the UAV visual tracking community. In this work, the basic framework of DCF-based trackers is first generalized; building on it, 23 state-of-the-art DCF-based trackers are systematically summarized according to their innovations for solving various issues. In addition, exhaustive and quantitative experiments have been conducted on various prevailing UAV tracking benchmarks, i.e., UAV123, UAV123@10fps, UAV20L, UAVDT, DTB70, and VisDrone2019-SOT, which contain 371,903 frames in total. The experiments show the performance, verify the feasibility, and demonstrate the current challenges of DCF-based trackers for onboard UAV tracking.
Comment: 28 pages, 10 figures, submitted to GRS
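For readers unfamiliar with the framework the review generalizes, a minimal single-channel MOSSE-style correlation filter, the ancestor of modern DCF trackers, can be sketched as follows. Preprocessing (cosine windowing, log transform) and multi-channel features are omitted, and the parameter values are illustrative, not those of any surveyed tracker.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired response: a Gaussian peak centred on the target patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

class DCFTracker:
    def __init__(self, lam=1e-2, lr=0.125):
        self.lam, self.lr = lam, lr  # regularisation and online learning rate

    def init(self, patch):
        """Learn the filter H* = A/B in the Fourier domain from the first patch."""
        self.G = np.fft.fft2(gaussian_response(patch.shape))
        F = np.fft.fft2(patch.astype(np.float64))
        self.A = self.G * np.conj(F)         # filter numerator
        self.B = F * np.conj(F) + self.lam   # filter denominator

    def track(self, patch):
        """Correlate, locate the peak, then update the filter online."""
        F = np.fft.fft2(patch.astype(np.float64))
        response = np.real(np.fft.ifft2((self.A / self.B) * F))
        dy, dx = np.unravel_index(np.argmax(response), response.shape)
        # Running average keeps the filter adapted to appearance changes.
        self.A = (1 - self.lr) * self.A + self.lr * self.G * np.conj(F)
        self.B = (1 - self.lr) * self.B + self.lr * (F * np.conj(F) + self.lam)
        h, w = patch.shape
        return dy - h // 2, dx - w // 2  # target displacement from patch centre
```

Because learning and detection are both element-wise operations between FFTs, trackers of this family run at very high frame rates on a single CPU, which is precisely the efficiency that makes them attractive for onboard UAV platforms.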
Cortical Dynamics of Contextually-Cued Attentive Visual Learning and Search: Spatial and Object Evidence Accumulation
How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to guide search more efficiently in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant global cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams, starting from early visual areas through the medial temporal lobe to prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goal-modulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex. The model thereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time and drive efficient visual search of familiar scenes.
CELEST, an NSF Science of Learning Center (SBE-0354378); SyNAPSE program of the Defense Advanced Research Projects Agency (HR0011-09-3-0001, HR0011-09-C-0011)
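Purely as a loose illustration of this cueing dynamic, and emphatically not the ARTSCENE Search equations, the refinement loop can be caricatured as a belief over grid locations, seeded by scene gist and sharpened by each fixation; every name and the update rule below are hypothetical.

```python
import numpy as np

def contextually_cued_search(prior, object_match, fixations=10, gain=0.5):
    """prior: gist-based location hypothesis (2-D grid summing to 1).
    object_match(cell) -> [0, 1]: how target-like the object at a cell looks."""
    belief = prior.copy()
    for _ in range(fixations):
        cell = np.unravel_index(np.argmax(belief), belief.shape)  # next saccade
        evidence = object_match(cell)
        if evidence > 0.9:  # enough evidence has accumulated: target found
            return cell
        # Down-weight the fixated cell in proportion to the mismatch and
        # renormalise, so the location hypothesis is incrementally refined.
        belief[cell] *= gain + (1 - gain) * evidence
        belief /= belief.sum()
    return None  # search terminated without reaching threshold
```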
RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception system to locate the most attractive object(s) in a scene, has been widely applied to various computer vision tasks. With the advent of depth sensors, depth maps with rich spatial information, which can be beneficial in boosting the performance of SOD, can now easily be captured. Although various RGB-D based SOD models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this topic remains lacking. In this paper, we provide a comprehensive survey of RGB-D based SOD models from various perspectives, and review the related benchmark datasets in detail. Further, considering that light fields can also provide depth maps, we review SOD models and popular benchmark datasets from this domain as well. Moreover, to investigate the SOD ability of existing models, we carry out a comprehensive evaluation, as well as an attribute-based evaluation, of several representative RGB-D based SOD models. Finally, we discuss several challenges and open directions of RGB-D based SOD for future research. All collected models, benchmark datasets, source code links, datasets constructed for attribute-based evaluation, and codes for evaluation will be made publicly available at https://github.com/taozh2017/RGBDSODsurvey
Comment: 24 pages, 12 figures. Has been accepted by Computational Visual Media
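For context on such evaluations, the sketch below shows two metrics commonly used in SOD benchmarking: mean absolute error and adaptive-threshold F-measure with the conventional beta^2 = 0.3 weighting. The survey's exact protocol may differ, and the function names are illustrative.

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error between a [0, 1] saliency map and binary ground truth."""
    return np.mean(np.abs(saliency - gt))

def f_measure(saliency, gt, beta2=0.3):
    """F-measure at an adaptive threshold (twice the map's mean value)."""
    thresh = min(2 * saliency.mean(), 1.0)
    pred = saliency >= thresh
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    return ((1 + beta2) * precision * recall) / max(beta2 * precision + recall, 1e-12)
```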