5 research outputs found
Adaptive Framework for Robust Visual Tracking
Visual tracking is a challenging problem for numerous reasons, such as small object size, pose-angle variations, occlusion, and camera motion. Object tracking has many real-world applications, such as surveillance systems, tracking moving organs in medical imaging, and robotics. Traditional tracking methods lack a recovery mechanism that can be used when the tracked objects drift away from the ground truth. In this paper, we propose a novel framework for tracking moving objects based on a composite framework and a reporter mechanism. The composite framework tracks moving objects using different trackers and produces pairs of forward/backward tracklets. A robustness score is then calculated for each tracker from its forward/backward tracklet pair to find the most reliable moving-object trajectory. The reporter serves as the recovery mechanism, correcting the moving object's trajectory when the robustness score is very low, mainly using a combination of a particle filter and template matching. The proposed framework can handle partial and heavy occlusions; moreover, its structure enables the integration of other user-specific trackers. Extensive experiments on recent benchmarks show that the proposed framework outperforms current state-of-the-art trackers thanks to its powerful trajectory analysis and recovery mechanism; the framework improved the area under the curve from 68% to 70.8% on the OTB-100 benchmark.
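The forward/backward consistency idea can be sketched as follows. This is a minimal illustration assuming a mean-IoU robustness score between a forward tracklet and its time-reversed backward counterpart; the paper's exact formula is not given here, so all names and details below are illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x, y, w, h]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def robustness_score(forward, backward):
    """Mean IoU between a forward tracklet and the time-reversed
    backward tracklet; 1.0 means the two runs agree perfectly,
    values near 0 suggest the tracker has drifted."""
    return float(np.mean([iou(f, b) for f, b in zip(forward, backward[::-1])]))
```

A low score would then trigger the recovery mechanism (e.g., the particle-filter/template-matching combination described above).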
Visual object tracking in dynamic scenes
Visual object tracking is a fundamental task in the field of computer vision. It is
widely used in numerous applications including, but not limited to, video surveillance,
image understanding, robotics, and human-computer interaction. In essence, visual object
tracking is the problem of estimating the state/trajectory of the object of interest over time.
Unlike other tasks such as object detection, where the classes/categories are defined
beforehand, the only information available about the object of interest is given in the first frame.
Even though Deep Learning (DL) has revolutionised most computer vision tasks, visual
object tracking still poses several challenges. The nature of the visual object tracking task is
stochastic: no prior knowledge about the object of interest is available during training
or testing/inference. Moreover, visual object tracking is a class-agnostic task, as opposed to object
detection and segmentation tasks. The main objective of this thesis is to develop and advance
visual object trackers using novel deep learning frameworks and mathematical
formulations.
To take advantage of different trackers, a novel framework is developed to track moving
objects based on a composite framework and a reporter mechanism. The composite framework
has built-in trackers and user-defined trackers to track the object of interest. The framework
contains a module that calculates a robustness score for each tracker, and a reporter mechanism
that serves as a recovery mechanism if the trackers fail to locate the object of interest.
Different trackers may fail to track the object of interest; thus, a more robust framework
based on a Siamese network architecture, namely DensSiam, is proposed. DensSiam uses the
concept of dense layers, connecting each dense layer in the network to all layers in a feed-forward
fashion with a similarity-learning function. DensSiam also includes a self-attention mechanism
that forces the network to pay more attention to non-local features during offline training.
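As a rough illustration of the mechanism, a non-local self-attention block lets every spatial position attend to every other position, mixing non-local context into each feature. The plain-NumPy sketch below omits the learned query/key/value projections and is an assumption about the general technique, not DensSiam's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_self_attention(feat):
    """feat: (H*W, C) flattened feature map. Every position attends to
    all positions (scaled dot-product), so the output at each location
    aggregates non-local context; a residual connection keeps the
    original features."""
    attn = softmax(feat @ feat.T / np.sqrt(feat.shape[1]))  # (HW, HW)
    return feat + attn @ feat
```

In a real tracker this block would sit inside the backbone and use learned projection matrices for queries, keys, and values.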
Generally, Siamese trackers do not fully utilize the semantic and objectness information of
pre-trained networks that were trained on an image classification task. To solve this problem,
a novel architecture design, dubbed DomainSiam, is proposed to learn Domain-Aware features
that fully utilize semantic and objectness information while producing a class-agnostic tracker
using a ridge regression network. Moreover, to reduce the sparsity problem, the ridge
regression problem is solved with a differentiable weighted-dynamic loss function.
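For context, a weighted ridge regression has a closed-form solution, sketched below. The fixed sample weights `w` are a stand-in assumption for the differentiable weighted-dynamic loss mentioned above, which the thesis learns during training; the function name and signature are illustrative.

```python
import numpy as np

def weighted_ridge(X, y, w, lam=1.0):
    """Closed-form minimiser of ||W^(1/2)(X beta - y)||^2 + lam ||beta||^2,
    where W = diag(w). The L2 term regularises the solution and reduces
    sparsity-related instability."""
    W = np.diag(w)
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ W @ y)
```

Making the weights themselves trainable (rather than fixed, as here) is what turns this into a differentiable, dynamically weighted loss.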
Siamese trackers have high speed and run in real-time; however, they lack high accuracy.
To overcome this challenge, a novel dynamic policy gradient Agent-Environment architecture
with Siamese network (DP-Siam) is proposed to train the tracker to increase the accuracy and
the expected average overlap while running in real-time. DP-Siam is trained offline with reinforcement
learning to produce a continuous action that predicts the optimal object location.
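One common way to realise a continuous action for box prediction is as relative shift and log-scale deltas applied to the current bounding box. DP-Siam's actual state and action parametrisation is not detailed here, so the helper below is purely an illustrative assumption.

```python
import numpy as np

def apply_action(box, action):
    """box: [cx, cy, w, h]; action: continuous deltas [dx, dy, dw, dh]
    output by the agent. Shifts are relative to the box size and scale
    changes are multiplicative (via exp), a standard box-regression
    parametrisation."""
    cx, cy, w, h = box
    dx, dy, dw, dh = action
    return [cx + dx * w, cy + dy * h, w * np.exp(dw), h * np.exp(dh)]
```

The reinforcement-learning policy would emit such a delta vector at each frame, moving the box toward the optimal object location.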
One of the common design blocks in most object trackers in the literature is the backbone
network, which is trained in the feature space. To design a backbone network that maps from
the feature space to another space (i.e., a joint-nullspace) that is more suitable for object
tracking and classification, a novel framework, called NullSpaceNet, is proposed. NullSpaceNet
has a clear interpretation of the feature representation, and the features in this space are more
separable. Furthermore, the NullSpaceNet backbone is used to learn a tracker, dubbed
NullSpaceRDAR, with a regularized discriminative joint-nullspace backbone network that is
specifically designed for object tracking; the regularization encourages the network to represent
the target-specific information of the object of interest in the joint-nullspace. This contrasts
with the feature space, where objects from a specific class are grouped into one category,
making the representation insensitive to intra-class variations. In the regularized discriminative
joint-nullspace, features from the same target are collapsed into one point and features from
different targets are collapsed into different points. Consequently, the joint-nullspace forces
the network to be sensitive to variations among objects of the same class (intra-class
variations). Moreover, a dynamic adaptive loss function is proposed to select a suitable loss
function from a super-set family of losses based on the training data, making NullSpaceRDAR
more robust to different challenges.
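The idea of choosing a loss from a super-set family based on the data can be sketched as follows. The selection rule here (pick the member with the smallest value on a batch of errors) is a toy stand-in; the thesis's actual criterion is learned from the training data, and all names below are illustrative.

```python
import numpy as np

def l2(err):
    return np.mean(err ** 2)

def l1(err):
    return np.mean(np.abs(err))

def huber(err, delta=1.0):
    # Quadratic near zero, linear in the tails: robust to outliers.
    a = np.abs(err)
    return np.mean(np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta)))

# Super-set family of candidate losses.
LOSS_FAMILY = {"l2": l2, "l1": l1, "huber": huber}

def select_loss(err):
    """Return the name of the family member with the smallest value
    on the given error batch -- a toy data-driven selection rule."""
    return min(LOSS_FAMILY, key=lambda name: LOSS_FAMILY[name](err))
```

Swapping the selected loss in during training is what would let the tracker adapt to different challenge regimes (e.g., heavy occlusion vs. fast motion).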