Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking
We propose a new Group Feature Selection method for Discriminative
Correlation Filters (GFS-DCF) based visual object tracking. The key innovation
of the proposed method is to perform group feature selection across both
channel and spatial dimensions, thereby pinpointing the structural relevance of
multi-channel features to the filtering system. In contrast to the widely used
spatial regularisation or feature selection methods, to the best of our
knowledge, this is the first time that channel selection has been advocated for
DCF-based tracking. We demonstrate that our GFS-DCF method is able to
significantly improve the performance of a DCF tracker equipped with deep
neural network features. In addition, our GFS-DCF enables joint feature
selection and filter learning, achieving enhanced discrimination and
interpretability of the learned filters.
To further improve the performance, we adaptively integrate historical
information by constraining filters to be smooth across temporal frames, using
an efficient low-rank approximation. By design, specific
temporal-spatial-channel configurations are dynamically learned in the tracking
process, highlighting the relevant features, and alleviating the performance
degrading impact of less discriminative representations and reducing
information redundancy. The experimental results obtained on OTB2013, OTB2015,
VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its
superiority over the state-of-the-art trackers. The code is publicly available
at https://github.com/XU-TIANYANG/GFS-DCF
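The channel-wise selection idea can be illustrated with a group-shrinkage operator: treat each feature channel as one group and shrink or discard channels whose energy falls below a threshold. The sketch below is plain NumPy; the function name and the thresholding rule are ours for illustration, and the actual GFS-DCF objective couples this selection with filter learning rather than applying it as a standalone step.

```python
import numpy as np

def group_soft_threshold(features, lam):
    """Group soft-thresholding across the channel dimension.

    Each channel (a spatial map) is treated as one group, shrunk by
    its L2 norm, and zeroed out entirely when that norm falls below
    the threshold `lam`. Illustrative sketch only.
    """
    # features: (channels, height, width)
    norms = np.sqrt((features ** 2).sum(axis=(1, 2), keepdims=True))
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return features * scale

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 5, 5))
selected = group_soft_threshold(feats, lam=4.0)
# channels surviving the group threshold
kept = int((np.abs(selected).sum(axis=(1, 2)) > 0).sum())
```

Because the shrinkage acts on whole channels, the surviving channels keep their spatial structure intact, which is the point of grouping over the channel dimension rather than penalizing individual coefficients.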
A Robust Structured Tracker Using Local Deep Features
Deep features extracted from convolutional neural networks have been recently utilized in visual tracking to obtain a generic and semantic representation of target candidates. In this paper, we propose a robust structured tracker using local deep features (STLDF). This tracker exploits the deep features of local patches inside target candidates and sparsely represents them by a set of templates in the particle filter framework. The proposed STLDF utilizes a new optimization model, which employs a group-sparsity regularization term to exploit local and spatial information of the target candidates and capture the spatial layout structure among them. To solve the optimization model, we propose an efficient and fast numerical algorithm that consists of two subproblems with closed-form solutions. Evaluations in terms of success and precision on benchmarks of challenging image sequences (e.g., OTB50 and OTB100) demonstrate the superior performance of STLDF against several state-of-the-art trackers.
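The patch-wise representation idea can be conveyed in a few lines: each local patch of a candidate is represented by the corresponding patches of the templates, and the candidate is scored by its reconstruction residual. The toy version below uses plain least squares in place of the group-sparsity-regularized model, so it is illustrative of the scoring mechanism only; the function name is ours.

```python
import numpy as np

def reconstruction_score(candidate_patches, template_patches):
    """Score a target candidate by how well its local patches are
    reconstructed from the templates' patches at the same locations.

    Simplified sketch (plain least squares, no group-sparsity term):
    the score is the negative total squared residual, so candidates
    that the templates explain well score higher.
    """
    # candidate_patches: (n_patches, dim)
    # template_patches:  (n_templates, n_patches, dim)
    total_err = 0.0
    for i, y in enumerate(candidate_patches):
        D = template_patches[:, i, :].T          # dictionary for patch i: (dim, n_templates)
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        total_err += np.linalg.norm(D @ coef - y) ** 2
    return -total_err
```

In a particle-filter tracker of this kind, each particle's candidate region would be scored this way and the highest-scoring particle (or a weighted combination) taken as the tracking result for the frame.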
Fast Robust Subspace Tracking via PCA in Sparse Data-Dependent Noise
This work studies the robust subspace tracking (ST) problem. Robust ST can be
simply understood as a (slow) time-varying subspace extension of robust PCA. It
assumes that the true data lies in a low-dimensional subspace that is either
fixed or changes slowly with time. The goal is to track the changing subspaces
over time in the presence of additive sparse outliers and to do this quickly
(with a short delay). We introduce a "fast" mini-batch robust ST solution that
is provably correct under mild assumptions. Here "fast" means two things: (i)
the subspace changes can be detected and the subspaces can be tracked with
near-optimal delay, and (ii) the time complexity of doing this is the same as
that of simple (non-robust) PCA. Our main result assumes piecewise constant
subspaces (needed for identifiability), but we also provide a corollary for the
case when there is a small amount of change at each time.
A second contribution is a novel non-asymptotic guarantee for PCA in linearly
data-dependent noise. An important setting where this is useful is for linearly
data dependent noise that is sparse with support that changes enough over time.
The analysis of the subspace update step of our proposed robust ST solution
uses this result. (To appear in the IEEE Journal on Selected Areas in Information Theory.)
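A toy version of one mini-batch step conveys the idea: project the batch onto the current subspace estimate, treat entries with large projection residuals as sparse outliers, and refresh the subspace by SVD of the cleaned batch. This is an illustrative sketch under our own simplifications (hard thresholding, a fixed rank), not the provably correct algorithm of the paper.

```python
import numpy as np

def robust_subspace_update(U, batch, outlier_thresh, rank):
    """One simplified mini-batch step of robust subspace tracking.

    U:     (n, rank) orthonormal basis of the current subspace estimate
    batch: (n, m) mini-batch of data columns with sparse outliers
    """
    proj = U @ (U.T @ batch)                  # projection onto current subspace
    residual = batch - proj
    # entries with large residuals are presumed sparse outliers;
    # replace them by their projected (inlier) values
    cleaned = np.where(np.abs(residual) > outlier_thresh, proj, batch)
    # refresh the subspace from the cleaned mini-batch
    Unew, _, _ = np.linalg.svd(cleaned, full_matrices=False)
    return Unew[:, :rank]
```

The appeal of this mini-batch structure is the complexity claim in the abstract: each update costs roughly one (truncated) SVD, i.e. the same order as simple non-robust PCA on the batch.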
Deep Learning and Optimization in Visual Target Tracking
Visual tracking is the process of estimating states of a moving object in a dynamic frame sequence. It has been considered as one of the most paramount and challenging topics in computer vision. Although numerous tracking methods have been introduced, developing a robust algorithm that can handle different challenges remains an open problem. In this dissertation, we introduce four different trackers and evaluate their performance in terms of tracking accuracy on challenging frame sequences. Each of these trackers aims to address the drawbacks of its peers. The first developed method is called a structured multi-task multi-view tracking (SMTMVT) method, which exploits the sparse appearance model in the particle filter framework to track targets under different challenges. Specifically, we extract features of the target candidates from different views and sparsely represent them by a linear combination of templates of different views. Unlike the conventional sparse trackers, SMTMVT not only jointly considers the relationship between different tasks and different views but also retains the structures among different views in a robust multi-task multi-view formulation. The second developed method is called a structured group local sparse tracker (SGLST), which exploits local patches inside target candidates in the particle filter framework. Unlike the conventional local sparse trackers, the proposed optimization model in SGLST not only exploits local and spatial information of the target candidates but also captures the spatial layout structure among them by employing a group-sparsity regularization term. To solve the optimization model, we propose an efficient numerical algorithm consisting of two subproblems with closed-form solutions. The third developed tracker is called a robust structured tracker using local deep features (STLDF).
This tracker exploits the deep features of local patches inside target candidates and sparsely represents them by a set of templates in the particle filter framework. The proposed STLDF utilizes a new optimization model, which employs a group-sparsity regularization term to exploit local and spatial information of the target candidates and capture the spatial layout structure among them. To solve the optimization model, we adopt the alternating direction method of multipliers (ADMM) to design a fast and parallel numerical algorithm by splitting the augmented Lagrangian of the optimization model into two subproblems with closed-form solutions: a quadratic problem and a Euclidean projection onto probability-simplex constraints. The fourth developed tracker is called an appearance variation adaptation (AVA) tracker, which aligns the feature distributions of target regions over time by learning an adaptation mask in an adversarial network. The proposed adversarial network consists of a generator and a discriminator network that compete with each other over optimizing a discriminator loss in a mini-max optimization problem. Specifically, the discriminator network aims to distinguish recent target regions from earlier ones by minimizing the discriminator loss, while the generator network aims to produce an adaptation mask to maximize the discriminator loss. We incorporate a gradient reversal layer in the adversarial network to solve the aforementioned mini-max optimization in an end-to-end manner. We compare the performance of the proposed four trackers with the most recent state-of-the-art trackers through extensive experiments on publicly available frame sequences, including OTB50, OTB100, VOT2016, and VOT2018 tracking benchmarks.
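Of the two closed-form subproblems mentioned for STLDF, the Euclidean projection onto the probability simplex has a well-known sort-based solution. A minimal sketch in plain NumPy (variable names are ours):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w >= 0, sum(w) = 1}, via the standard sort-based
    algorithm; O(n log n) in the length of v.
    """
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    # largest index rho (0-based) with u[rho] * (rho+1) > css[rho] - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)       # shift that enforces sum = 1
    return np.maximum(v - theta, 0.0)
```

A handy sanity check: a vector already on the simplex is a fixed point of the projection, while an arbitrary vector is shifted and clipped so its entries are nonnegative and sum to one.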