154 research outputs found
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
With efficient appearance learning models, Discriminative Correlation Filter
(DCF) has been proven to be very successful in recent video object tracking
benchmarks and competitions. However, the existing DCF paradigm suffers from
two major issues, i.e., spatial boundary effect and temporal filter
degradation. To mitigate these challenges, we propose a new DCF-based tracking
method. The key innovations of the proposed method include adaptive spatial
feature selection and temporal consistency constraints, with which the new
tracker enables joint spatial-temporal filter learning in a lower dimensional
discriminative manifold. More specifically, we apply structured spatial
sparsity constraints to multi-channel filters. Consequently, the process of
learning spatial filters can be approximated by the lasso regularisation. To
encourage temporal consistency, the filter model is restricted to lie around
its historical value and updated locally to preserve the global structure in
the manifold. Last, a unified optimisation framework is proposed to jointly
select temporal consistency preserving spatial features and learn
discriminative filters with the augmented Lagrangian method. Qualitative and
quantitative evaluations have been conducted on a number of well-known
benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and
VOT2018. The experimental results demonstrate the superiority of the proposed
method over the state-of-the-art approaches.
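
As a rough illustration of how structured spatial sparsity and temporal consistency can be combined, the sketch below applies a group soft-thresholding step in which all channels at one spatial position form a group and the estimate is pulled towards the previous frame's filter. This is not the paper's solver: the function name, the flattened list layout and the parameters `lam` and `mu` are assumptions made for this example only.

```python
import math


def select_spatial_features(filters, prev_filters, lam, mu):
    """Hypothetical helper: one proximal step for
        0.5*||f - g||^2 + 0.5*mu*||f - f_prev||^2 + lam * sum_p ||f[:, p]||_2
    where filters[c][p] is the value of channel c at spatial position p.
    """
    n_channels = len(filters)
    n_positions = len(filters[0])
    out = [[0.0] * n_positions for _ in range(n_channels)]
    for p in range(n_positions):
        # Temporal consistency: blend the current estimate with the old filter.
        v = [
            (filters[c][p] + mu * prev_filters[c][p]) / (1.0 + mu)
            for c in range(n_channels)
        ]
        norm = math.sqrt(sum(x * x for x in v))
        # Group lasso shrinkage: positions with weak response across all
        # channels are set exactly to zero (the position is deselected).
        if norm > 0.0:
            scale = max(0.0, 1.0 - (lam / (1.0 + mu)) / norm)
        else:
            scale = 0.0
        for c in range(n_channels):
            out[c][p] = scale * v[c]
    return out


current = [[3.0, 0.1], [4.0, 0.1]]   # 2 channels, 2 spatial positions
previous = [[3.0, 0.1], [4.0, 0.1]]  # filter from the previous frame
selected = select_spatial_features(current, previous, lam=1.0, mu=1.0)
```

In this toy input the first position has a strong response across channels and is only shrunk, while the second position is weak and is zeroed out entirely, which is the feature selection effect of the group penalty.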
LabelPrompt: Effective Prompt-based Learning for Relation Classification
Recently, prompt-based learning has become a very popular solution in many
Natural Language Processing (NLP) tasks by inserting a template into model
input, which converts the task into a cloze-style one to smooth out
differences between the Pre-trained Language Model (PLM) and the current task.
But in the case of relation classification, it is difficult to map the masked
output to the relation labels because of their abundant semantic information,
e.g. ``org:founded_by''. Therefore, a pre-trained model still needs enough
labelled data to fit the relations. To mitigate this challenge, in this paper,
we present a novel prompt-based learning method, namely LabelPrompt, for the
relation classification task. It is an extraordinarily intuitive approach,
driven by a simple motivation: ``GIVE MODEL CHOICES!''. First, we define some
additional tokens to represent the relation labels, regard these tokens as the
verbalizer with semantic initialisation, and construct them with a prompt
template method. Then we revisit the inconsistency between the predicted
relation and the given entities, and design an entity-aware module based on
contrastive learning to mitigate the problem. Finally, we apply an attention query strategy
to self-attention layers to resolve two types of tokens, prompt tokens and
sequence tokens. The proposed strategy effectively improves the adaptation
capability of prompt-based learning in the relation classification task when
only a small amount of labelled data is available. Extensive experimental results
obtained on several benchmarking datasets demonstrate the superiority of the
proposed LabelPrompt method, particularly in the few-shot scenario.
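
The idea of giving the model explicit choices can be sketched as follows: each relation label gets its own additional token, the label's words are kept for a semantic initialisation of that token, and the input is wrapped in a cloze-style template. The template wording, the `[REL0]`-style token names and both function names are hypothetical and are not taken from the paper.

```python
def make_label_tokens(relation_labels):
    """Map each relation label to a new token plus the plain words that
    could be used to initialise its embedding (assumed scheme)."""
    mapping = {}
    for i, label in enumerate(relation_labels):
        words = label.replace(":", " ").replace("_", " ").split()
        mapping[label] = {"token": f"[REL{i}]", "init_words": words}
    return mapping


def build_prompt(sentence, head, tail, mask_token="[MASK]"):
    """Wrap the input in a cloze-style template (hypothetical wording).
    The model would fill the mask with one of the label tokens."""
    return f"{sentence} The relation between {head} and {tail} is {mask_token}."


labels = make_label_tokens(["org:founded_by", "per:employee_of"])
prompt = build_prompt("Acme was founded by Jane Doe.", "Acme", "Jane Doe")
```

Restricting the masked prediction to the label tokens turns the open vocabulary prediction into a choice among the known relations.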
Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking
We propose a new Group Feature Selection method for Discriminative
Correlation Filters (GFS-DCF) based visual object tracking. The key innovation
of the proposed method is to perform group feature selection across both
channel and spatial dimensions, so as to pinpoint the structural relevance of
multi-channel features to the filtering system. In contrast to the widely used
spatial regularisation or feature selection methods, to the best of our
knowledge, this is the first time that channel selection has been advocated for
DCF-based tracking. We demonstrate that our GFS-DCF method is able to
significantly improve the performance of a DCF tracker equipped with deep
neural network features. In addition, our GFS-DCF enables joint feature
selection and filter learning, achieving enhanced discrimination and
interpretability of the learned filters.
To further improve the performance, we adaptively integrate historical
information by constraining filters to be smooth across temporal frames, using
an efficient low-rank approximation. By design, specific
temporal-spatial-channel configurations are dynamically learned in the tracking
process, highlighting the relevant features, alleviating the
performance-degrading impact of less discriminative representations, and reducing
information redundancy. The experimental results obtained on OTB2013, OTB2015,
VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its
superiority over the state-of-the-art trackers. The code is publicly available
at https://github.com/XU-TIANYANG/GFS-DCF
An Accelerated Correlation Filter Tracker
Recent visual object tracking methods have witnessed a continuous improvement
in the state-of-the-art with the development of efficient discriminative
correlation filters (DCF) and robust deep neural network features. Despite the
outstanding performance achieved by the above combination, existing advanced
trackers suffer from the burden of high computational complexity of the deep
feature extraction and online model learning. We propose an accelerated ADMM
optimisation method obtained by adding a momentum to the optimisation sequence
iterates, and by relaxing the impact of the error between DCF parameters and
their norm. The proposed optimisation method is applied to an innovative
formulation of the DCF design, which seeks the most discriminative spatially
regularised feature channels. A further speed up is achieved by an adaptive
initialisation of the filter optimisation process. The significantly faster
convergence of the DCF is demonstrated by establishing the optimisation
process equivalence with a continuous dynamical system for which the
convergence properties can readily be derived. The experimental results
obtained on several well-known benchmarking datasets demonstrate the efficiency
and robustness of the proposed ACFT method, with a tracking accuracy comparable
to the state-of-the-art trackers.
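
To show where a momentum term enters an ADMM iteration, the sketch below solves a toy scalar lasso problem and extrapolates the split and dual variables after each update. It does not reproduce the paper's formulation or its relaxation step; the Nesterov-style schedule, the function names and the toy objective are all assumptions. On such a small, well-conditioned problem the extrapolation is not expected to bring any speed-up, so the example only illustrates the mechanics.

```python
import math


def soft_threshold(v, kappa):
    """Proximal operator of kappa * |.| for a scalar."""
    return math.copysign(max(abs(v) - kappa, 0.0), v)


def momentum_admm(a, lam, rho=1.0, iters=200):
    """Toy problem: minimise 0.5*(x - a)**2 + lam*|z| subject to x == z."""
    z = u = 0.0           # split variable and scaled dual variable
    z_hat, u_hat = z, u   # extrapolated (momentum) copies
    t = 1.0               # momentum schedule (assumed Nesterov-style)
    x = 0.0
    for _ in range(iters):
        # Standard ADMM updates, evaluated at the extrapolated points.
        x = (a + rho * (z_hat - u_hat)) / (1.0 + rho)
        z_new = soft_threshold(x + u_hat, lam / rho)
        u_new = u_hat + x - z_new
        # Momentum: extrapolate along the direction of the last change.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        z_hat = z_new + beta * (z_new - z)
        u_hat = u_new + beta * (u_new - u)
        z, u, t = z_new, u_new, t_new
    return x, z


x_opt, z_opt = momentum_admm(a=3.0, lam=1.0)
```

For `a=3.0` and `lam=1.0` the closed-form minimiser is the soft-thresholded value 2.0, which gives a simple check that the iteration has converged.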
LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images
Deep learning based fusion methods have been achieving promising performance
in image fusion tasks. This is attributed to the network architecture that
plays a very important role in the fusion process. However, in general, it is
hard to specify a good fusion architecture, and consequently, the design of
fusion networks is still a black art, rather than science. To address this
problem, we formulate the fusion task mathematically, and establish a
connection between its optimal solution and the network architecture that can
implement it. This approach leads to the novel method proposed in the paper for
constructing a lightweight fusion network. It avoids time-consuming
empirical network design by a trial-and-test strategy. In particular, we adopt a
learnable representation approach to the fusion task, in which the construction
of the fusion network architecture is guided by the optimisation algorithm
producing the learnable model. The low-rank representation (LRR) objective is
the foundation of our learnable model. The matrix multiplications, which are at
the heart of the solution, are transformed into convolutional operations, and
the iterative process of optimisation is replaced by a special feed-forward
network. Based on this novel network architecture, an end-to-end lightweight
fusion network is constructed to fuse infrared and visible light images. Its
successful training is facilitated by a detail-to-semantic information loss
function proposed to preserve the image details and to enhance the salient
features of the source images. Our experiments show that the proposed fusion
network exhibits better fusion performance than the state-of-the-art fusion
methods on public datasets. Interestingly, our network requires fewer
training parameters than other existing methods.
Comment: 14 pages, 15 figures, 8 tables