Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with
low-level features using either global methods or local methods. In global
methods, the entire image is used as a unit. Local methods divide images either
into blocks, where fixed-size sub-image blocks are adopted as sub-units, or
into regions, where segmented regions are used as sub-units. In contrast to
typical automatic image annotation methods that use either global or local
features exclusively, several recent methods incorporate both kinds of
information, on the premise that combining the two levels of features is
beneficial in annotating images. In this paper, we provide a survey of
automatic image annotation techniques according to one aspect, feature
extraction, and, in order to complement existing surveys in the literature, we
focus on the emerging hybrid methods that combine both global and local
features for image representation.
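The hybrid idea described in this abstract can be sketched as follows. This is a minimal illustration, not the method of any surveyed paper: the choice of intensity histograms as the low-level feature, the function names `histogram` and `hybrid_features`, and the block size are all assumptions made for the example.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalised intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // max_value, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def hybrid_features(image, block_size=2, bins=4):
    """Concatenate one global histogram with per-block local histograms."""
    height, width = len(image), len(image[0])
    # Global descriptor: the entire image treated as a single unit.
    features = histogram([p for row in image for p in row], bins)
    # Local descriptors: fixed-size, non-overlapping sub-image blocks.
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [image[y][x]
                     for y in range(top, min(top + block_size, height))
                     for x in range(left, min(left + block_size, width))]
            features.extend(histogram(block, bins))
    return features


# Hypothetical 4x4 grayscale image with four uniform quadrants.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [64, 64, 128, 128],
    [64, 64, 128, 128],
]
vector = hybrid_features(image)
print(len(vector))  # 4 global bins + 4 blocks * 4 bins
```

The global part of the vector cannot tell where each intensity occurs, while the block histograms can, which is the complementarity the hybrid methods rely on.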
An introduction to time-resolved decoding analysis for M/EEG
The human brain is constantly processing and integrating information in order
to make decisions and interact with the world, for tasks from recognizing a
familiar face to playing a game of tennis. These complex cognitive processes
require communication between large populations of neurons. The non-invasive
neuroimaging methods of electroencephalography (EEG) and magnetoencephalography
(MEG) provide population measures of neural activity with millisecond precision
that allow us to study the temporal dynamics of cognitive processes. However,
multi-sensor M/EEG data is inherently high dimensional, making it difficult to
parse important signal from noise. Multivariate pattern analysis (MVPA) or
"decoding" methods offer vast potential for understanding high-dimensional
M/EEG neural data. MVPA can be used to distinguish between different conditions
and map the time courses of various neural processes, from basic sensory
processing to high-level cognitive processes. In this chapter, we discuss the
practical aspects of performing decoding analyses on M/EEG data as well as the
limitations of the method, and then we discuss some applications for
understanding representational dynamics in the human brain.
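A time-resolved decoding analysis of the kind this abstract describes can be sketched as below: a classifier is trained and tested on the sensor pattern at each time point separately, giving an accuracy time course. The nearest-centroid classifier, the leave-one-trial-out scheme, the function names and the toy data are all assumptions for illustration; the chapter itself may recommend other classifiers and cross-validation schemes.

```python
def nearest_centroid_predict(train_x, train_y, test_x):
    """Predict the label whose class-mean pattern is closest to test_x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(test_x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def time_resolved_decoding(data, labels):
    """Leave-one-trial-out accuracy at each time point.

    data[trial][sensor][time] holds one value per trial, sensor and sample.
    """
    n_trials = len(data)
    n_times = len(data[0][0])
    accuracies = []
    for t in range(n_times):
        # Sensor pattern of every trial at this single time point.
        patterns = [[sensor[t] for sensor in trial] for trial in data]
        correct = 0
        for held_out in range(n_trials):
            train_x = [p for i, p in enumerate(patterns) if i != held_out]
            train_y = [y for i, y in enumerate(labels) if i != held_out]
            guess = nearest_centroid_predict(train_x, train_y, patterns[held_out])
            correct += guess == labels[held_out]
        accuracies.append(correct / n_trials)
    return accuracies


# Hypothetical data: 4 trials, 2 sensors, 3 time samples.
# The conditions only differ from the second sample onwards.
labels = ["face", "face", "house", "house"]
data = [
    [[0.1, 1.0, 1.2], [0.0, -1.0, -0.9]],
    [[-0.1, 0.8, 1.0], [0.1, -1.2, -1.1]],
    [[0.1, -1.0, -0.8], [-0.1, 0.9, 1.0]],
    [[-0.1, -0.9, -1.1], [0.0, 1.1, 1.2]],
]
accuracies = time_resolved_decoding(data, labels)
print(accuracies)
```

In this toy example the first sample carries no condition information, so accuracy there is not above chance, while the later samples are decoded perfectly.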
Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition
Human action recognition remains an important yet challenging task. This work
proposes a novel action recognition system. It uses a novel Multiple View
Region Adaptive Multi-resolution in time Depth Motion Map (MV-RAMDMM)
formulation combined with appearance information. Multiple stream 3D
Convolutional Neural Networks (CNNs) are trained on the different views and
time resolutions of the region adaptive Depth Motion Maps. Multiple views are
synthesised to enhance the view invariance. The region adaptive weights, based
on localised motion, accentuate and differentiate parts of actions possessing
faster motion. Dedicated 3D CNN streams for multi-time resolution appearance
information (RGB) are also included. These help to identify and differentiate
between small object interactions. A pre-trained 3D-CNN is used here with
fine-tuning for each stream along with multi-class Support Vector Machines
(SVMs). Average score fusion is used on the output. The developed approach is
capable of recognising both human action and human-object interaction. Three
public domain datasets, MSR 3D Action, Northwestern UCLA multi-view actions
and MSR 3D daily activity, are used to evaluate the proposed solution.
The experimental results demonstrate the robustness of this approach compared
with state-of-the-art algorithms.
Comment: 14 pages, 6 figures, 13 tables. Submitte
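The average score fusion step mentioned in this abstract can be illustrated with a short sketch. The stream names, the score values and the function name `average_score_fusion` are hypothetical; the abstract does not specify how the per-stream scores are scaled before averaging.

```python
def average_score_fusion(stream_scores):
    """Average per-class scores over streams; return (class index, fused scores)."""
    n_streams = len(stream_scores)
    fused = [sum(scores) / n_streams for scores in zip(*stream_scores)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


# Hypothetical per-class scores from three streams for one video clip.
stream_scores = [
    [0.6, 0.3, 0.1],  # depth motion map stream, one synthesised view
    [0.2, 0.5, 0.3],  # depth motion map stream, another view
    [0.5, 0.4, 0.1],  # RGB appearance stream
]
predicted, fused = average_score_fusion(stream_scores)
print(predicted, fused)
```

Here the second stream on its own would pick class 1, but the fused scores favour class 0, which shows how averaging lets the streams outvote a single dissenting one.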
Convolutional Feature Masking for Joint Object and Stuff Segmentation
The topic of semantic segmentation has witnessed considerable progress due to
the powerful features learned by convolutional neural networks (CNNs). The
current leading approaches for semantic segmentation exploit shape information
by extracting CNN features from masked image regions. This strategy introduces
artificial boundaries on the images and may impact the quality of the extracted
features. Besides, operating on the raw image domain requires computing
thousands of networks on a single image, which is time-consuming. In this
paper, we propose to exploit shape information via masking convolutional
features. The proposal segments (e.g., super-pixels) are treated as masks on
the convolutional feature maps. The CNN features of segments are directly
masked out from these maps and used to train classifiers for recognition. We
further propose a joint method to handle objects and "stuff" (e.g., grass, sky,
water) in the same framework. State-of-the-art results are demonstrated on
benchmarks of PASCAL VOC and new PASCAL-CONTEXT, with a compelling
computational speed.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
201
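The core idea of masking convolutional features rather than image pixels can be sketched as below. This is a simplified illustration under stated assumptions: the feature map is a plain nested list, the segment mask is assumed to be already resized to the feature-map resolution, and average pooling over the masked area is a stand-in chosen for brevity, not the pooling used in the paper.

```python
def mask_features(feature_map, mask):
    """Zero every feature-map location that lies outside the segment mask.

    feature_map[channel][row][col]; mask[row][col] is 1 inside the segment.
    """
    return [[[value * m for value, m in zip(f_row, m_row)]
             for f_row, m_row in zip(channel, mask)]
            for channel in feature_map]


def masked_average_pool(feature_map, mask):
    """Average each channel over the masked locations only (assumed pooling)."""
    area = sum(sum(row) for row in mask)
    if area == 0:
        return [0.0] * len(feature_map)
    masked = mask_features(feature_map, mask)
    return [sum(sum(row) for row in channel) / area for channel in masked]


# Hypothetical 2-channel, 2x2 feature map computed once for the whole image.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
# Hypothetical segment covering the left column.
mask = [[1, 0], [1, 0]]
descriptor = masked_average_pool(feature_map, mask)
print(descriptor)  # [2.0, 0.0]
```

Because the feature map is computed once and reused for every segment, each additional proposal costs only a mask multiplication and a pooling step rather than another pass through the network.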
A novel application of deep learning with image cropping: a smart city use case for flood monitoring
© 2020, The Author(s). Event monitoring is an essential application of Smart City platforms. Real-time monitoring of gully and drainage blockage is an important part of flood monitoring applications. Building viable IoT sensors for detecting blockage is a complex task due to the limitations of deploying such sensors in situ. Image classification with deep learning is a potential alternative solution. However, there are no image datasets of gullies and drainages. We were faced with such challenges as part of developing a flood monitoring application in a European Union-funded project. To address these issues, we propose a novel image classification approach based on deep learning with an IoT-enabled camera to monitor gullies and drainages. This approach utilises deep learning to develop an effective image classification model to classify blockage images into different class labels based on the severity. In order to handle the complexity of video-based images, and the resulting poor classification accuracy of the model, we have carried out experiments with the removal of image edges by applying image cropping. The process of cropping in our proposed experimentation aims to concentrate only on the regions of interest within images, hence leaving out some proportion of image edges. An image dataset from crowd-sourced publicly accessible images has been curated to train and test the proposed model. For validation, model accuracies were compared for models with and without image cropping. The cropping-based image classification showed improvement in the classification accuracy. This paper outlines the lessons from our experimentation that have a wider impact on many similar use cases involving IoT-based cameras as part of smart city event monitoring platforms.
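The edge-removal step described in this abstract can be sketched as a simple symmetric crop applied before classification. The function name `crop_edges` and the 25% crop fraction are assumptions for illustration; the abstract does not state how much of each edge was removed.

```python
def crop_edges(image, fraction=0.25):
    """Drop `fraction` of the rows and columns from each edge of the image."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    height, width = len(image), len(image[0])
    dy, dx = int(height * fraction), int(width * fraction)
    return [row[dx:width - dx] for row in image[dy:height - dy]]


# Hypothetical 4x4 image whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = crop_edges(image)
print(cropped)  # [[5, 6], [9, 10]]
```

Only the central region is passed on to the classifier, which matches the stated aim of concentrating on the regions of interest and leaving out a proportion of the image edges.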