Multiple-model fully convolutional neural networks for single object tracking on thermal infrared video

Abstract

The availability of affordable thermal infrared (TIR) cameras has encouraged their use in various research fields, especially in cases that require images to be captured in dark surroundings. One of the low-level tasks required by most TIR-based research is tracking an object throughout a video sequence. The main challenge posed by TIR cameras is the lack of texture with which to differentiate two nearby objects of the same class. In the VOT-TIR 2016 challenge, the best fully convolutional neural network (FCNN)-based tracker managed only third place. The discriminative ability of the FCNN tracker is not fully utilized because of the homogeneous appearance pattern of the tracked object. This paper aims to improve the ability of an FCNN-based tracker to predict object location through a comprehensive sampling approach and a better scoring scheme. To this end, a multiple-model FCNN is proposed, in which a small set of fully connected layers is updated on top of pre-trained convolutional neural networks. The possible object locations are generated by a two-stage sampling that combines stochastically distributed samples with clustered foreground contour information. The best sample is selected according to a combined score of appearance similarity, predicted location, and model reliability. The small set of appearance models is updated using positive and negative training samples accumulated over two periods of time: the recent interval and the parent node interval. To further improve training accuracy, the samples are generated according to a set of adaptive variances that depend on the trustworthiness of the tracker output. The results show an improvement over TCNN, an FCNN-based tracker that won the VOT 2016 challenge, with the expected average overlap increasing from 0.248 to 0.257. The performance enhancement is attributed to better robustness, with a 20% reduction in tracking failure rate compared to TCNN.
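The sampling and scoring ideas summarized above can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the function names `sample_candidates` and `select_best`, the rule that widens the sampling spread as confidence falls, and the way appearance, location, and reliability are fused into one score are all hypothetical and should not be read as the authors' method. The contour-based second sampling stage is omitted.

```python
import numpy as np


def sample_candidates(prev_box, confidence, n=64,
                      base_sigma=(0.1, 0.1, 0.05), rng=None):
    """Draw n candidate boxes (cx, cy, w, h) around the previous box.

    Assumption: the spread is scaled between 1x (confidence = 1) and
    2x (confidence = 0). The paper's adaptive-variance rule may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    cx, cy, w, h = prev_box
    scale = 1.0 + (1.0 - confidence)  # hypothetical adaptation rule
    sx, sy, ss = (s * scale for s in base_sigma)
    dx = rng.normal(0.0, sx * w, n)           # horizontal shift
    dy = rng.normal(0.0, sy * h, n)           # vertical shift
    ds = np.exp(rng.normal(0.0, ss, n))       # multiplicative size change
    return np.stack([cx + dx, cy + dy, w * ds, h * ds], axis=1)


def select_best(appearance, location_prior, reliability):
    """Pick the candidate with the highest combined score.

    appearance:     (M, N) score of each of M models for N candidates
    location_prior: (N,)   plausibility of each candidate position
    reliability:    (M,)   weight of each appearance model

    Assumption: models are fused by a reliability-weighted average and
    then multiplied by the location prior.
    """
    appearance = np.asarray(appearance, dtype=float)
    location_prior = np.asarray(location_prior, dtype=float)
    reliability = np.asarray(reliability, dtype=float)
    fused = (reliability[:, None] * appearance).sum(axis=0) / reliability.sum()
    total = fused * location_prior
    return int(np.argmax(total)), total


# Usage: two models scoring three candidates with a flat location prior.
best_index, scores = select_best(
    appearance=[[0.2, 0.9, 0.4], [0.3, 0.8, 0.5]],
    location_prior=[1.0, 1.0, 1.0],
    reliability=[0.5, 0.5],
)
```

In this sketch a less trustworthy tracker output widens the search region, which mirrors the abstract's idea of adaptive variances without claiming to reproduce it.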
