14,180 research outputs found
On the factors causing processing difficulty of multiple-scene displays
Multiplex viewing of static or dynamic scenes is an increasing feature of screen media. Most existing multiplex experiments have examined detection across increasing scene numbers, but currently no systematic evaluation of the factors that might produce difficulty in processing multiplexes exists. Across five experiments we provide such an evaluation. Experiment 1 characterises difficulty in change detection when the number of scenes is increased. Experiment 2 reveals that the increased difficulty across multiple-scene displays is caused by the total amount of visual information accounts for differences in change detection times, regardless of whether this information is presented across multiple scenes, or contained in one scene. Experiment 3 shows that whether quadrants of a display were drawn from the same, or different scenes did not affect change detection performance. Experiment 4 demonstrates that knowing which scene the change will occur in means participants can perform at monoplex level. Finally, Experiment 5 finds that changes of central interest in multiplexed scenes are detected far easier than marginal interest changes to such an extent that a centrally interesting object removal in nine screens is detected more rapidly than a marginally interesting object removal in four screens. Processing multiple-screen displays therefore seems dependent on the amount of information, and the importance of that information to the task, rather than simply the number of scenes in the display. We discuss the theoretical and applied implications of these findings
Fine-To-Coarse Global Registration of RGB-D Scans
RGB-D scanning of indoor environments is important for many applications,
including real estate, interior design, and virtual reality. However, it is
still challenging to register RGB-D images from a hand-held camera over a long
video sequence into a globally consistent 3D model. Current methods often can
lose tracking or drift and thus fail to reconstruct salient structures in large
environments (e.g., parallel walls in different rooms). To address this
problem, we propose a "fine-to-coarse" global registration algorithm that
leverages robust registrations at finer scales to seed detection and
enforcement of new correspondence and structural constraints at coarser scales.
To test global registration algorithms, we provide a benchmark with 10,401
manually-clicked point correspondences in 25 scenes from the SUN3D dataset.
During experiments with this benchmark, we find that our fine-to-coarse
algorithm registers long RGB-D sequences better than previous methods
SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection
Vision-based vehicle detection approaches achieve incredible success in
recent years with the development of deep convolutional neural network (CNN).
However, existing CNN based algorithms suffer from the problem that the
convolutional features are scale-sensitive in object detection task but it is
common that traffic images and videos contain vehicles with a large variance of
scales. In this paper, we delve into the source of scale sensitivity, and
reveal two key issues: 1) existing RoI pooling destroys the structure of small
scale objects, 2) the large intra-class distance for a large variance of scales
exceeds the representation capability of a single network. Based on these
findings, we present a scale-insensitive convolutional neural network (SINet)
for fast detecting vehicles with a large variance of scales. First, we present
a context-aware RoI pooling to maintain the contextual information and original
structure of small scale objects. Second, we present a multi-branch decision
network to minimize the intra-class distance of features. These lightweight
techniques bring zero extra time complexity but prominent detection accuracy
improvement. The proposed techniques can be equipped with any deep network
architectures and keep them trained end-to-end. Our SINet achieves
state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on
the KITTI benchmark and a new highway dataset, which contains a large variance
of scales and extremely small objects.Comment: Accepted by IEEE Transactions on Intelligent Transportation Systems
(T-ITS
Recommended from our members
Detection and classification of acoustic scenes and events: an IEEE AASP challenge
A neural network approach to audio-assisted movie dialogue detection
A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function defines that an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons are tested. Experiments are carried out to validate the feasibility of the aforementioned approach by using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances is employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%
Image Captioning and Classification of Dangerous Situations
Current robot platforms are being employed to collaborate with humans in a
wide range of domestic and industrial tasks. These environments require
autonomous systems that are able to classify and communicate anomalous
situations such as fires, injured persons, car accidents; or generally, any
potentially dangerous situation for humans. In this paper we introduce an
anomaly detection dataset for the purpose of robot applications as well as the
design and implementation of a deep learning architecture that classifies and
describes dangerous situations using only a single image as input. We report a
classification accuracy of 97 % and METEOR score of 16.2. We will make the
dataset publicly available after this paper is accepted
- …