14,304 research outputs found
Vision-based Real-Time Aerial Object Localization and Tracking for UAV Sensing System
The paper focuses on the problem of vision-based obstacle detection and
tracking for unmanned aerial vehicle navigation. A real-time object
localization and tracking strategy from monocular image sequences is developed
by effectively integrating the object detection and tracking into a dynamic
Kalman model. At the detection stage, the object of interest is automatically
detected and localized from a saliency map computed via the image background
connectivity cue at each frame; at the tracking stage, a Kalman filter is
employed to provide a coarse prediction of the object state, which is further
refined via a local detector incorporating the saliency map and the temporal
information between two consecutive frames. Compared to existing methods, the
proposed approach does not require any manual initialization for tracking, runs
much faster than the state-of-the-art trackers of its kind, and achieves
competitive tracking performance on a large number of image sequences.
Extensive experiments demonstrate the effectiveness and superior performance of
the proposed approach. Comment: 8 pages, 7 figures
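The predict-then-refine loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the state model, so a constant-velocity model per image axis is assumed, the noise values are arbitrary, and the saliency-based local detector is replaced by object centers supplied by the caller. The names `AxisKalman` and `track` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (hypothetical helper)."""
    pos: float
    vel: float = 0.0
    p00: float = 100.0  # position variance (assumed initial value)
    p01: float = 0.0    # position/velocity covariance
    p11: float = 100.0  # velocity variance (assumed initial value)
    q: float = 1.0      # process noise (assumed)
    r: float = 4.0      # measurement noise (assumed)

    def predict(self, dt: float = 1.0) -> float:
        """Coarse prediction: propagate state and covariance by one frame."""
        self.pos += self.vel * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z: float) -> float:
        """Refinement: correct the prediction with a measured position z."""
        s = self.p00 + self.r          # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s  # Kalman gain for H = [1, 0]
        resid = z - self.pos
        self.pos += k0 * resid
        self.vel += k1 * resid
        # P = (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos


def track(centers):
    """Track (x, y) centers, one per frame, from a stand-in detector.

    The first center initializes the filters, so no manual initialization
    is needed. Returns one (predicted, refined) pair per later frame.
    """
    it = iter(centers)
    x0, y0 = next(it)
    fx, fy = AxisKalman(x0), AxisKalman(y0)
    out = []
    for zx, zy in it:
        predicted = (fx.predict(), fy.predict())
        refined = (fx.update(zx), fy.update(zy))
        out.append((predicted, refined))
    return out
```

In a real pipeline the measured center for each frame would come from a detector searching near the predicted position; here the measurements are simply passed in so that the sketch stays self-contained.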
Applying psychological science to the CCTV review process: a review of cognitive and ergonomic literature
As CCTV cameras are used more and more often to increase security in communities, police are spending a larger proportion of their resources, including time, in processing CCTV images when investigating crimes that have occurred (Levesley & Martin, 2005; Nichols, 2001). As with all tasks, there are ways to approach this task that will facilitate performance and other approaches that will degrade performance, either by increasing errors or by unnecessarily prolonging the process. A clearer understanding of psychological factors influencing the effectiveness of footage review will facilitate future training in best practice with respect to the review of CCTV footage. The goal of this report is to provide such understanding by reviewing research on footage review, research on related tasks that require similar skills, and experimental laboratory research about the cognitive skills underpinning the task. The report is organised to address five challenges to effectiveness of CCTV review: the effects of the degraded nature of CCTV footage, distractions and interruptions, the length of the task, inappropriate mindset, and variability in people’s abilities and experience. Recommendations for optimising CCTV footage review include (1) doing a cognitive task analysis to increase understanding of the ways in which performance might be limited, (2) exploiting technology advances to maximise the perceptual quality of the footage, (3) training people to improve the flexibility of their mindset as they perceive and interpret the images seen, (4) monitoring performance either on an ongoing basis, by using psychophysiological measures of alertness, or periodically, by testing screeners’ ability to find evidence in footage developed for such testing, and (5) evaluating the relevance of possible selection tests to screen effective from ineffective screeners.
Are all the frames equally important?
In this work, we address the problem of measuring and predicting temporal
video saliency - a metric which defines the importance of a video frame for
human attention. Unlike the conventional spatial saliency which defines the
location of the salient regions within a frame (as it is done for still
images), temporal saliency considers importance of a frame as a whole and may
not exist apart from context. The proposed interface is an interactive
cursor-based algorithm for collecting experimental data about temporal
saliency. We collect the first human responses and perform their analysis. As a
result, we show that qualitatively, the produced scores clearly reflect
semantic changes in a frame, while quantitatively being highly
correlated across all observers. Apart from that, we show that the
proposed tool can simultaneously collect fixations similar to the ones produced
by an eye-tracker in a more affordable way. Further, this approach may be used for
the creation of the first temporal saliency datasets, which will allow training
computational predictive algorithms. The proposed interface does not rely on
any special equipment, which allows it to be run remotely and to reach a wide
audience. Comment: CHI'20 Late Breaking Work
Saliency-aware Stereoscopic Video Retargeting
Stereo video retargeting aims to resize an image to a desired aspect ratio.
The quality of retargeted videos can be significantly impacted by the stereo
video's spatial, temporal, and disparity coherence, all of which can be impacted
by the retargeting process. Due to the lack of a publicly accessible annotated
dataset, there is little research on deep learning-based methods for stereo
video retargeting. This paper proposes an unsupervised deep learning-based
stereo video retargeting network. Our model first detects the salient objects
and shifts and warps all objects such that it minimizes the distortion of the
salient parts of the stereo frames. We use 1D convolution for shifting the
salient objects and design a stereo video Transformer to assist the retargeting
process. To train the network, we use the parallax attention mechanism to fuse
the left and right views and feed the retargeted frames to a reconstruction
module that reverses the retargeted frames to the input frames. Therefore, the
network is trained in an unsupervised manner. Extensive qualitative and
quantitative experiments and ablation studies on KITTI stereo 2012 and 2015
datasets demonstrate the efficiency of the proposed method over the existing
state-of-the-art methods. The code is available at
https://github.com/z65451/SVR/. Comment: 8 pages excluding references. CVPRW conference
Visual cognition during real social interaction
Copyright © 2012 The Authors. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The article was made available through the Brunel University Open Access Publishing Fund. Laboratory studies of social visual cognition often simulate the critical aspects of joint attention by having participants interact with a computer-generated avatar. Recently, there has been a movement toward examining these processes during authentic social interaction. In this review, we will focus on attention to faces, attentional misdirection, and a phenomenon we have termed social inhibition of return (Social IOR), three lines of work that have revealed aspects of social cognition that were hitherto unknown. We attribute these discoveries to the use of paradigms that allow for more realistic social interactions to take place. We also point to an area that has begun to attract a considerable amount of interest, that of Theory of Mind (ToM) and automatic perspective taking, and suggest that this too might benefit from adopting a similar approach.