Search CORE

963 research outputs found

Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

Author: Akolekar Ninad
Balasubramanian Vineeth N
Devaguptapu Chaitanya
Sharma Manuj M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/07/2020
Field of study

Can we improve detection in the thermal domain by borrowing features from rich domains like visual RGB? In this paper, we propose a pseudo-multimodal object detector trained on natural image domain data to help improve the performance of object detection in thermal images. We assume access to a large-scale dataset in the visual RGB domain and relatively smaller dataset (in terms of instances) in the thermal domain, as is common today. We propose the use of well-known image-to-image translation frameworks to generate pseudo-RGB equivalents of a given thermal image and then use a multi-modal architecture for object detection in the thermal image. We show that our framework outperforms existing benchmarks without the explicit need for paired training examples from the two domains. We also show that our framework has the ability to learn with less data from thermal domain when using our approach. Our code and pre-trained models are made available at https://github.com/tdchaitanya/MMTODComment: Accepted at Perception Beyond Visible Spectrum Workshop, CVPR 201

arXiv.org e-Print Archive

Crossref

Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method

Author: Rezaei Mahdi
Serpush Fatemeh
Publication venue
Publication date: 01/01/2020
Field of study

Automated human action recognition is one of the most attractive and practical research fields in computer vision, in spite of its high computational costs. In such systems, the human action labelling is based on the appearance and patterns of the motions in the video sequences; however, the conventional methodologies and classic neural networks cannot use temporal information for action recognition prediction in the upcoming frames in a video sequence. On the other hand, the computational cost of the preprocessing stage is high. In this paper, we address challenges of the preprocessing phase, by an automated selection of representative frames among the input sequences. Furthermore, we extract the key features of the representative frame rather than the entire features. We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method. The combination of a CNN and the LSTM recursive network is considered for feature selection and maintaining the previous information, and finally, a Softmax-KNN classifier is used for labelling human activities. We name our model as Feature Reduction & Deep Learning based action recognition method, or FR-DL in short. To evaluate the proposed method, we use the UCF dataset for the benchmarking which is widely-used among researchers in action recognition research. The dataset includes 101 complicated activities in the wild. Experimental results show a significant improvement in terms of accuracy and speed in comparison with six state-of-the-art articles

arXiv.org e-Print Archive

Crossref

Occupancy Analysis of the Outdoor Football Fields

Author: Huda Noor Ul
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2023
Field of study

VBN

Illumination-Based Data Augmentation for Robust Background Subtraction

Author: Ho Edmond
Sakkos Dimitrios
Shum Hubert
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2019
Field of study

A core challenge in background subtraction (BGS) is handling videos with sudden illumination changes in consecutive frames. In this paper, we tackle the problem from a data point-of-view using data augmentation. Our method performs data augmentation that not only creates endless data on the fly, but also features semantic transformations of illumination which enhance the generalisation of the model. It successfully simulates flashes and shadows by applying the Euclidean distance transform over a binary mask generated randomly. Such data allows us to effectively train an illumination-invariant deep learning model for BGS. Experimental results demonstrate the contribution of the synthetics in the ability of the models to perform BGS even when significant illumination changes take place

arXiv.org e-Print Archive

Northumbria Research Link

Crossref