    Optimisation of a Siamese Neural Network for Real-Time Energy Efficient Object Tracking

    This paper presents research on optimising visual object tracking with a Siamese neural network for embedded vision systems. The solution was required to operate in real time, preferably on a high-resolution video stream, with the lowest possible energy consumption. To meet these requirements, techniques such as reduced computational precision and pruning were considered. Brevitas, a tool dedicated to the optimisation and quantisation of neural networks for FPGA implementation, was used. A number of training scenarios were tested with varying levels of optimisation, from uniform 16-bit integer quantisation down to ternary and binary networks. The influence of these optimisations on tracking performance was then evaluated. It was possible to reduce the size of the convolutional filters up to 10 times relative to the original network. The obtained results indicate that quantisation can significantly reduce the memory and computational complexity of the proposed network while still enabling precise tracking, thus allowing its use in embedded vision systems. Moreover, quantisation of weights positively affects network training by decreasing overfitting.
    Comment: 12 pages, accepted for ICCVG 202
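    As a rough illustration of the quantisation technique the abstract describes, the sketch below builds a small quantised convolutional feature extractor with Brevitas' quantised layers. The layer shapes and the 4-bit width are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of Brevitas weight/activation quantisation applied to a
# Siamese-style feature extractor. Layer shapes and bit widths are
# illustrative assumptions, not the network from the paper.
import torch
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantReLU

class QuantFeatureExtractor(nn.Module):
    def __init__(self, bit_width: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            # Lowering bit_width (e.g. 16 -> 4 -> 2 -> 1) trades tracking
            # accuracy for memory and compute savings.
            QuantConv2d(3, 32, kernel_size=3, weight_bit_width=bit_width),
            QuantReLU(bit_width=bit_width),
            QuantConv2d(32, 64, kernel_size=3, weight_bit_width=bit_width),
            QuantReLU(bit_width=bit_width),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)

# Both branches of a Siamese tracker share the same quantised weights.
extractor = QuantFeatureExtractor(bit_width=4)
template_feat = extractor(torch.randn(1, 3, 127, 127))  # template crop
search_feat = extractor(torch.randn(1, 3, 255, 255))    # search region
```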

    Explicit Visual Prompts for Visual Object Tracking

    How to effectively exploit spatio-temporal information is crucial to capturing target appearance changes in visual tracking. However, most deep learning-based trackers mainly focus on designing a complicated appearance model or template-updating strategy, while neglecting the context between consecutive frames, and thus face the when-and-how-to-update dilemma. To address these issues, we propose a novel explicit visual prompts framework for visual tracking, dubbed EVPTrack. Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates. As a result, we not only alleviate the challenge of when-to-update, but also avoid the hyper-parameters associated with updating strategies. Then, we utilize the spatio-temporal tokens to generate explicit visual prompts that facilitate inference in the current frame. The prompts are fed into a transformer encoder together with the image tokens without additional processing. Consequently, the efficiency of our model is improved by avoiding how-to-update. In addition, we consider multi-scale information as explicit visual prompts, providing multi-scale template features to enhance EVPTrack's ability to handle target scale changes. Extensive experimental results on six benchmarks (i.e., LaSOT, LaSOT_ext, GOT-10k, UAV123, TrackingNet, and TNL2K) validate that EVPTrack can achieve competitive performance at real-time speed by effectively exploiting both spatio-temporal and multi-scale information. Code and models are available at https://github.com/GXNU-ZhongLab/EVPTrack
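    A hypothetical sketch of the prompt-as-token idea described above: learned spatio-temporal tokens are concatenated with the current frame's image tokens and passed through a standard transformer encoder with no extra processing. All dimensions and names here are assumptions, not the EVPTrack implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch: spatio-temporal prompt tokens fed into a transformer
# encoder alongside image patch tokens. Shapes are illustrative assumptions.
import torch
import torch.nn as nn

embed_dim, num_prompts, num_patches = 256, 4, 196

# Prompt tokens carrying information propagated from previous frames.
prompt_tokens = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True),
    num_layers=4,
)

image_tokens = torch.randn(1, num_patches, embed_dim)  # current-frame patches
# Prompts are simply concatenated with image tokens; no update heuristics
# or extra hyper-parameters are involved.
tokens = torch.cat([prompt_tokens, image_tokens], dim=1)
out = encoder(tokens)  # [1, num_prompts + num_patches, embed_dim]
```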

    SiamLST: Learning Spatial and Channel-wise Transform for Visual Tracking

    Siamese network based trackers regard visual tracking as a similarity-matching task between the target template and search-region patches, and have achieved a good balance between accuracy and speed in recent years. However, existing trackers do not effectively exploit spatial and inter-channel cues, which leads to redundancy in the pre-trained model parameters. In this paper, we design a novel visual tracker based on a Learnable Spatial and Channel-wise Transform in a Siamese network (SiamLST). The SiamLST tracker includes a powerful feature-extraction backbone and an efficient cross-correlation method. The proposed algorithm takes full advantage of the CNN and the learnable sparse transform module to represent the template and search patches, effectively exploiting spatial and channel-wise correlations to deal with complicated scenarios such as motion blur, in-plane rotation, and partial occlusion. Experimental results on multiple tracking benchmarks, including OTB2015, VOT2016, GOT-10k, and VOT2018, demonstrate that the proposed SiamLST achieves excellent tracking performance.
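    For context, the sketch below shows the generic Siamese cross-correlation step (SiamFC style), in which template features act as a convolution kernel slid over the search-region features. It is the standard matching operation, not SiamLST's learnable spatial and channel-wise transform; all shapes are illustrative assumptions.

```python
# Generic Siamese cross-correlation: the template feature map is used as a
# convolution kernel over the search-region feature map. This is the
# standard operation, not the SiamLST module itself.
import torch
import torch.nn.functional as F

def cross_correlation(template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
    """template: [B, C, Ht, Wt], search: [B, C, Hs, Ws] -> [B, 1, Ho, Wo]."""
    b, c, h, w = search.shape
    # Fold batch into channels so each sample is matched only against its
    # own template (grouped-convolution trick).
    search = search.view(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, template.size(2), template.size(3))
    response = F.conv2d(search, kernel, groups=b * c)
    # Sum per-channel responses into one similarity map per sample.
    return response.view(b, c, response.size(2), response.size(3)).sum(1, keepdim=True)

response = cross_correlation(torch.randn(2, 64, 6, 6), torch.randn(2, 64, 22, 22))
print(response.shape)  # torch.Size([2, 1, 17, 17])
```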