Optimisation of a Siamese Neural Network for Real-Time Energy Efficient Object Tracking
In this paper the research on optimisation of visual object tracking using a
Siamese neural network for embedded vision systems is presented. It was assumed
that the solution shall operate in real-time, preferably for a high resolution
video stream, with the lowest possible energy consumption. To meet these
requirements, techniques such as the reduction of computational precision and
pruning were considered. Brevitas, a tool dedicated to the optimisation and
quantisation of neural networks for FPGA implementation, was used. A number of
training scenarios were tested with varying levels of optimisations - from
integer uniform quantisation with 16 bits to ternary and binary networks. Next,
the influence of these optimisations on the tracking performance was evaluated.
It was possible to reduce the size of the convolutional filters by up to 10 times
relative to the original network. The obtained results indicate that using
quantisation can significantly reduce the memory and computational complexity
of the proposed network while still enabling precise tracking, thus allowing it
to be used in embedded vision systems. Moreover, quantisation of weights positively
affects the network training by decreasing overfitting.
Comment: 12 pages, accepted for ICCVG 202
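As a rough illustration of the quantisation levels named in the abstract above (integer uniform, ternary, binary), the following Python sketch quantises a flat list of weights. It does not use Brevitas and is not the authors' training pipeline; the function names are hypothetical, and the ternary threshold of 0.7 times the mean absolute weight is an assumed heuristic, not a value taken from the paper.

```python
def quantize_uniform(weights, bits):
    """Symmetric uniform quantisation to signed integer codes (assumed scheme)."""
    if bits < 2:
        raise ValueError("use binarize() for 1-bit weights")
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def ternarize(weights, threshold_ratio=0.7):
    """Map weights to {-1, 0, +1}; the threshold ratio is an assumption."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def binarize(weights):
    """Map weights to {-1, +1} by sign (zero is treated as +1)."""
    return [1 if w >= 0 else -1 for w in weights]
```

Fewer bits per weight mean a smaller memory footprint at the cost of a coarser approximation; in this sketch the rounding error of the uniform scheme is at most half of one quantisation step (`scale / 2`).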
Explicit Visual Prompts for Visual Object Tracking
How to effectively exploit spatio-temporal information is crucial to capture
target appearance changes in visual tracking. However, most deep learning-based
trackers mainly focus on designing a complicated appearance model or template
updating strategy, while lacking the exploitation of context between
consecutive frames and thus entailing the when-and-how-to-update
dilemma. To address these issues, we propose a novel explicit visual prompts
framework for visual tracking, dubbed EVPTrack. Specifically, we
utilize spatio-temporal tokens to propagate information between consecutive
frames without focusing on updating templates. As a result, we can not only
alleviate the challenge of when-to-update, but also avoid the
hyper-parameters associated with updating strategies. Then, we utilize the
spatio-temporal tokens to generate explicit visual prompts that facilitate
inference in the current frame. The prompts are fed into a transformer encoder
together with the image tokens without additional processing. Consequently, the
efficiency of our model is improved by avoiding how-to-update. In
addition, we consider multi-scale information as explicit visual prompts,
providing multi-scale template features to enhance EVPTrack's ability to
handle target scale changes. Extensive experimental results on six benchmarks
(i.e., LaSOT, LaSOT_ext, GOT-10k, UAV123, TrackingNet, and TNL2K)
validate that our EVPTrack can achieve competitive performance at a real-time
speed by effectively exploiting both spatio-temporal and multi-scale
information. Code and models are available at
https://github.com/GXNU-ZhongLab/EVPTrack
SiamLST: Learning Spatial and Channel-wise Transform for Visual Tracking
Siamese network based trackers regard visual tracking as a similarity matching task between the target template and search region patches, and have achieved a good balance between accuracy and speed in recent years. However, existing trackers do not effectively exploit spatial and inter-channel cues, which leads to redundancy in the pre-trained model parameters. In this paper, we design a novel visual tracker based on a Learnable Spatial and Channel-wise Transform in a Siamese network (SiamLST). The SiamLST tracker includes a powerful feature extraction backbone and an efficient cross-correlation method. The proposed algorithm takes full advantage of the CNN and the learnable sparse transform module to represent the template and search patches, effectively exploiting spatial and channel-wise correlations to deal with complicated scenarios such as motion blur, in-plane rotation and partial occlusion. Experimental results on multiple tracking benchmarks, including OTB2015, VOT2016, GOT-10k and VOT2018, demonstrate that the proposed SiamLST achieves excellent tracking performance.
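The similarity matching described in the abstract above can be sketched as a plain sliding-window cross-correlation between a template and a search region. This is a generic, single-channel illustration in pure Python, not SiamLST's cross-correlation method (which the abstract does not specify); a real Siamese tracker would correlate multi-channel features from a shared backbone, and the function names here are hypothetical.

```python
def cross_correlate(search, template):
    """Slide the template over the search map and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Inner product between the template and the window at (y, x).
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def locate_peak(response):
    """Return (row, col) of the highest response, i.e. the best match."""
    best = max(
        (value, y, x)
        for y, row in enumerate(response)
        for x, value in enumerate(row)
    )
    return best[1], best[2]


if __name__ == "__main__":
    template = [[1, 2],
                [3, 4]]
    search = [[0, 0, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 3, 4],
              [0, 0, 0, 0]]
    print(locate_peak(cross_correlate(search, template)))
```

The peak of the response map marks the window of the search region most similar to the template, which is the location a tracker of this kind would report for the target.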