Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery
Can we improve detection in the thermal domain by borrowing features from
rich domains like visual RGB? In this paper, we propose a pseudo-multimodal
object detector trained on natural image domain data to help improve the
performance of object detection in thermal images. We assume access to a
large-scale dataset in the visual RGB domain and a relatively smaller dataset (in
terms of instances) in the thermal domain, as is common today. We propose the
use of well-known image-to-image translation frameworks to generate pseudo-RGB
equivalents of a given thermal image and then use a multi-modal architecture
for object detection in the thermal image. We show that our framework
outperforms existing benchmarks without the explicit need for paired training
examples from the two domains. We also show that our framework can learn
with less data from the thermal domain. Our code
and pre-trained models are made available at
https://github.com/tdchaitanya/MMTOD
Comment: Accepted at Perception Beyond Visible Spectrum Workshop, CVPR 201
Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method
Automated human action recognition is one of the most attractive and
practical research fields in computer vision, in spite of its high
computational costs. In such systems, human actions are labelled based on
the appearance and motion patterns in the video sequences; however,
conventional methodologies and classic neural networks cannot use temporal
information to predict actions in the upcoming frames of a video
sequence. In addition, the computational cost of the preprocessing stage
is high. In this paper, we address the challenges of the preprocessing phase
through automated selection of representative frames from the input sequences.
Furthermore, we extract only the key features of the representative frame rather
than the full feature set. We propose a hybrid technique using background
subtraction and HOG, followed by application of a deep neural network and
skeletal modelling method. A combination of a CNN and an LSTM recurrent
network is used for feature selection and for retaining previous
information, and finally, a Softmax-KNN classifier is used for labelling human
activities. We name our model the Feature Reduction & Deep Learning based action
recognition method, or FR-DL for short. To evaluate the proposed method, we use
the UCF dataset, which is widely used in action recognition research, for
benchmarking. The dataset includes 101 complicated activities in
the wild. Experimental results show a significant improvement in
accuracy and speed in comparison with six state-of-the-art articles.
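The abstract above does not say how representative frames are chosen, so the following is only a minimal sketch of one plausible reading: score each frame by how many pixels differ from a median background and keep the highest-scoring frames. The function name `select_representative_frames`, the median background model, and the parameters `k` and `threshold` are all assumptions for illustration, not the FR-DL method itself.

```python
import numpy as np


def select_representative_frames(frames, k=3, threshold=0.1):
    """Pick the k frames with the most foreground pixels.

    frames: array of shape (T, H, W), grayscale values in [0, 1].
    Assumption: the background is approximated by the per-pixel median
    over time; the paper's actual background model is not specified here.
    """
    frames = np.asarray(frames, dtype=float)
    background = np.median(frames, axis=0)
    # Simple background subtraction: a pixel is foreground when it
    # deviates from the background by more than the threshold.
    foreground = np.abs(frames - background) > threshold
    # Foreground pixel count per frame, used as a motion score.
    motion = foreground.reshape(len(frames), -1).sum(axis=1)
    # Indices of the k highest-scoring frames, returned in temporal order.
    top = np.argsort(-motion, kind="stable")[:k]
    return np.sort(top), motion
```

In a pipeline of the kind the abstract describes, the selected frames would then be passed to the feature extractor, so later stages process a few frames rather than the whole sequence.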
Illumination-Based Data Augmentation for Robust Background Subtraction
A core challenge in background subtraction (BGS) is handling videos with sudden illumination changes in consecutive frames. In this paper, we tackle the problem from a data point of view using data augmentation. Our method performs data augmentation that not only creates endless data on the fly, but also features semantic transformations of illumination that enhance the generalisation of the model. It successfully simulates flashes and shadows by applying the Euclidean distance transform over a randomly generated binary mask. Such data allows us to effectively train an illumination-invariant deep learning model for BGS. Experimental results demonstrate that the synthetic data contributes to the models' ability to perform BGS even when significant illumination changes take place.
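The flash-and-shadow simulation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rectangular mask shape, the `strength` parameter, and all function names are hypothetical. The distance transform is written as a brute-force NumPy routine that is only practical for small images; `scipy.ndimage.distance_transform_edt` computes the same quantity efficiently.

```python
import numpy as np


def random_rectangle_mask(shape, rng):
    """Random binary mask: True inside one random rectangle (an assumed shape)."""
    h, w = shape
    y0 = rng.integers(0, h // 2)
    x0 = rng.integers(0, w // 2)
    y1 = rng.integers(y0 + 2, h)  # exclusive upper bound, so the last row stays background
    x1 = rng.integers(x0 + 2, w)
    mask = np.zeros(shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


def distance_to_background(mask):
    """Euclidean distance from every pixel to the nearest False pixel.

    Brute force over all pixel pairs; intended for small images only.
    """
    ys, xs = np.indices(mask.shape)
    pixels = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    background = pixels[~mask.ravel()]
    diffs = pixels[:, None, :] - background[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return dists.reshape(mask.shape)


def augment_illumination(image, rng, strength=0.5):
    """Brighten (flash) or darken (shadow) a random region with a smooth falloff.

    image: 2-D grayscale array with values in [0, 1].
    Returns the augmented image and the mask that was used.
    """
    mask = random_rectangle_mask(image.shape, rng)
    falloff = distance_to_background(mask)
    falloff = falloff / falloff.max()  # 0 outside the mask, 1 at its deepest point
    sign = rng.choice([-1.0, 1.0])     # -1 simulates a shadow, +1 a flash
    augmented = np.clip(image + sign * strength * falloff, 0.0, 1.0)
    return augmented, mask
```

Because the distance transform is zero outside the mask and grows towards its interior, the brightness change fades smoothly at the region's edges instead of producing a hard boundary. Drawing a new random mask on every call is one way to produce fresh augmented samples on the fly.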