Unsupervised Green Object Tracker (GOT) without Offline Pre-training
Supervised trackers trained on labeled data dominate the single object
tracking field because of their superior tracking accuracy. However, the
labeling cost and the huge computational complexity hinder their deployment
on edge devices. Unsupervised learning methods have also been investigated to
reduce the labeling cost, but their complexity remains high. Aiming at
lightweight high-performance tracking, feasibility without offline
pre-training, and algorithmic transparency, we propose a new single object
tracking method, called the green object tracker (GOT), in this work. GOT
uses an ensemble of three prediction branches for robust box tracking:
1) a global object-based correlator to predict the object location roughly,
2) a local patch-based correlator to build temporal correlations of small
spatial units, and 3) a superpixel-based segmentator to exploit the spatial
information of the target frame. GOT offers tracking accuracy competitive
with that of state-of-the-art unsupervised trackers, which demand heavy
offline pre-training, at a lower computation cost. GOT has a tiny model size
(<3k parameters) and low inference complexity (around 58M FLOPs per frame).
Since its inference complexity is between 0.1% and 10% of that of DL
trackers, it can be easily deployed on mobile and edge devices.
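To make the idea of a "global object-based correlator" concrete, here is a minimal sketch of correlation-based localization: slide a template of the object over a frame and pick the window with the highest normalized cross-correlation (NCC). This is a generic template-matching technique chosen for illustration, not GOT's actual correlator; the function name `correlate_locate` and the plain 2-D list grayscale input are assumptions.

```python
import math


def correlate_locate(frame, template):
    """Return the (row, col) top-left corner of the window in `frame` with the
    highest normalized cross-correlation (NCC) with `template`.

    Hypothetical illustration of a global correlator; `frame` and `template`
    are assumed to be 2-D lists of grayscale intensities.
    """
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    t_flat = [v for row in template for v in row]
    t_mean = sum(t_flat) / len(t_flat)
    t_zero = [v - t_mean for v in t_flat]          # zero-mean template
    t_norm = math.sqrt(sum(v * v for v in t_zero))
    if t_norm == 0:
        raise ValueError("template has no contrast; NCC is undefined")

    best_score, best_pos = -math.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = [frame[r + i][c + j] for i in range(th) for j in range(tw)]
            w_mean = sum(window) / len(window)
            w_zero = [v - w_mean for v in window]  # zero-mean window
            w_norm = math.sqrt(sum(v * v for v in w_zero))
            if w_norm == 0:
                score = 0.0                        # flat window: treat as uncorrelated
            else:
                score = sum(a * b for a, b in zip(w_zero, t_zero)) / (w_norm * t_norm)
            if score > best_score:                 # strict '>': first peak wins ties
                best_score, best_pos = score, (r, c)
    return best_pos
```

For example, a 2x2 pattern `[[9, 1], [1, 9]]` embedded at row 3, column 2 of an otherwise empty 6x6 frame is located at `(3, 2)`. Normalizing by both norms keeps bright but unrelated regions from dominating the response, which is why NCC rather than a raw dot product is used in this sketch.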
GUSOT: Green and Unsupervised Single Object Tracking for Long Video Sequences
Supervised and unsupervised deep trackers that rely on deep learning
technologies have become popular in recent years. Yet, they demand high
computational complexity and a high memory cost. A green unsupervised
single-object tracker, called GUSOT, that aims at object tracking for long
videos under a resource-constrained environment is proposed in this work.
Built upon a baseline tracker, UHP-SOT++, which works well for short-term
tracking, GUSOT contains two new modules: 1) lost object recovery, and 2)
color-saliency-based shape proposal. They help resolve the tracking loss
problem and offer a more flexible object proposal, respectively. Thus, they
enable GUSOT to achieve higher tracking accuracy in the long run. We conduct
experiments on the large-scale LaSOT dataset, which contains long video
sequences, and show that GUSOT offers a lightweight high-performance tracking
solution that finds applications in mobile and edge computing platforms.
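As a rough illustration of what a color-saliency-based shape proposal could look like, the sketch below scores each pixel by histogram contrast (a color is salient if it differs strongly from the colors that dominate the patch) and returns the bounding box of the salient pixels. This is a generic histogram-contrast technique chosen for illustration, not GUSOT's actual module; the names `color_saliency` and `shape_proposal`, the 0.5 threshold, and the use of unquantized RGB tuples are all assumptions.

```python
import math
from collections import Counter


def color_saliency(patch):
    """Histogram-contrast saliency map in [0, 1] for a 2-D list of RGB tuples.

    Saliency of a color = sum over all colors of (frequency * color distance).
    Real images would normally be color-quantized first (assumption).
    """
    counts = Counter(px for row in patch for px in row)
    total = sum(counts.values())
    sal = {
        color: sum(n / total * math.dist(color, other) for other, n in counts.items())
        for color in counts
    }
    lo, hi = min(sal.values()), max(sal.values())
    span = (hi - lo) or 1.0                        # avoid dividing by zero on uniform patches
    return [[(sal[px] - lo) / span for px in row] for row in patch]


def shape_proposal(patch, threshold=0.5):
    """Hypothetical proposal: (top, left, bottom, right) box around pixels whose
    normalized saliency exceeds `threshold`, or None if none do."""
    sal = color_saliency(patch)
    coords = [(r, c) for r, row in enumerate(sal) for c, s in enumerate(row) if s > threshold]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

On a gray 5x5 patch with a red 2x2 object at rows 1 to 2 and columns 2 to 3, the proposal is `(1, 2, 2, 3)`; a uniform patch yields `None`. Unlike a fixed-aspect box, a proposal derived from a saliency map can follow the object's current shape, which matches the "more flexible object proposal" motivation above.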
LGSQE: Lightweight Generated Sample Quality Evaluation
Despite prolific work on evaluating generative models, little research has
been done on the quality evaluation of an individual generated sample. To
address this problem, a lightweight generated sample quality evaluation (LGSQE)
method is proposed in this work. In the training stage of LGSQE, a binary
classifier is trained on real and synthetic samples, where real and synthetic
data are labeled by 0 and 1, respectively. In the inference stage, the
classifier assigns a soft label (ranging from 0 to 1) to each generated sample.
The value of the soft label indicates the quality level; namely, the closer
the soft label is to 0, the better the quality. LGSQE can serve as a
post-processing module for quality control. Furthermore, LGSQE can be used to
evaluate the performance of generative models, in terms of metrics such as
accuracy, AUC, precision, and recall, by aggregating sample-level quality.
Experiments are conducted on CIFAR-10 and MNIST to demonstrate that LGSQE
preserves the same performance rank order as that predicted by the Fréchet
Inception Distance (FID) but with significantly lower complexity.
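The train/infer/aggregate pipeline described above can be sketched with a tiny logistic-regression classifier: real samples are labeled 0, synthetic samples 1, the sigmoid output serves as the soft label, and soft labels are averaged per generative model. This is a minimal sketch under stated assumptions, not LGSQE's actual classifier or features: the feature vectors are hypothetical stand-ins for whatever image representation is used, and `aggregate_quality` (mean soft label plus fraction judged real) is an illustrative aggregate, not one of the paper's metrics.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_classifier(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent.

    labels: 0 for real samples, 1 for synthetic samples (the LGSQE convention).
    lr and epochs are arbitrary illustrative defaults.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k] / n        # gradient of mean log-loss w.r.t. w
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def soft_label(model, x):
    """Soft label in (0, 1); closer to 0 means more real-like, i.e. higher quality."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def aggregate_quality(model, generated):
    """Hypothetical model-level summary: mean soft label (lower is better) and the
    fraction of generated samples the classifier judges as real (soft label < 0.5)."""
    labels = [soft_label(model, x) for x in generated]
    mean_label = sum(labels) / len(labels)
    frac_real = sum(1 for s in labels if s < 0.5) / len(labels)
    return mean_label, frac_real
```

With toy 1-D features where real samples sit near 0 and synthetic ones near 1, a generator whose outputs fall near 0 receives a lower mean soft label than one whose outputs fall near 1, so the two generators are ranked accordingly. The cost is one forward pass per sample, which is the kind of saving over a distribution-level score such as FID that the abstract refers to.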
Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints
Ensuring the realism of computer-generated synthetic images is crucial to
deep neural network (DNN) training. Due to the different semantic
distributions of synthetic and real-world captured datasets, there exists a
semantic mismatch between synthetic and refined images, which in turn
results in semantic distortion. Recently, contrastive learning (CL) has been
successfully used to pull correlated patches together and push uncorrelated
ones apart. In this work, we exploit semantic and structural consistency
between synthetic and refined images and adopt CL to reduce the semantic
distortion. Besides, we incorporate hard negative mining to further improve
performance. We compare the performance of our method with several other
benchmarking methods using qualitative and quantitative measures and show
that our method offers state-of-the-art performance.
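The "pull correlated patches together, push uncorrelated ones apart" idea is commonly written as a patch-wise InfoNCE loss, in the style of PatchNCE from contrastive unpaired translation. The sketch below computes it for a single query patch and adds a simple form of hard negative mining that keeps only the negatives most similar to the query. It is an assumption-laden illustration, not the paper's loss: the function names, the temperature `tau=0.07`, and top-k selection (rather than, say, reweighting negatives) are all choices made here for clarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def patch_nce_loss(query, positive, negatives, tau=0.07, num_hard=None):
    """InfoNCE loss for one query patch.

    query: feature of a patch in the refined image (assumption).
    positive: feature of the synthetic-image patch at the same location.
    negatives: features of patches at other locations.
    num_hard: if given, keep only the `num_hard` negatives most similar to the
    query (a simple top-k form of hard negative mining).
    """
    pos = cosine(query, positive) / tau
    neg = [cosine(query, n) / tau for n in negatives]
    if num_hard is not None:
        neg = sorted(neg, reverse=True)[:num_hard]
    logits = [pos] + neg
    m = max(logits)                                # log-sum-exp trick for stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - pos                         # = -log(exp(pos) / sum(exp(logits)))
```

A query aligned with its positive gets a small loss, while a misaligned pair gets a large one. Restricting the negatives to the most query-like ones yields a larger loss than using only easy negatives, so the gradient concentrates on the confusable patches, which is the intuition behind hard negative mining.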