34 research outputs found

    Unsupervised Green Object Tracker (GOT) without Offline Pre-training

    Full text link
    Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility without offline pre-training, and algorithmic transparency, we propose a new single object tracking method, called the green object tracker (GOT), in this work. GOT conducts an ensemble of three prediction branches for robust box tracking: 1) a global object-based correlator to predict the object location roughly, 2) a local patch-based correlator to build temporal correlations of small spatial units, and 3) a superpixel-based segmentator to exploit the spatial information of the target frame. GOT offers competitive tracking accuracy with state-of-the-art unsupervised trackers, which demand heavy offline pre-training, at a lower computation cost. GOT has a tiny model size (<3k parameters) and low inference complexity (around 58M FLOPs per frame). Since its inference complexity is between 0.1%-10% of DL trackers, it can be easily deployed on mobile and edge devices

    GUSOT: Green and Unsupervised Single Object Tracking for Long Video Sequences

    Full text link
    Supervised and unsupervised deep trackers that rely on deep learning technologies are popular in recent years. Yet, they demand high computational complexity and a high memory cost. A green unsupervised single-object tracker, called GUSOT, that aims at object tracking for long videos under a resource-constrained environment is proposed in this work. Built upon a baseline tracker, UHP-SOT++, which works well for short-term tracking, GUSOT contains two additional new modules: 1) lost object recovery, and 2) color-saliency-based shape proposal. They help resolve the tracking loss problem and offer a more flexible object proposal, respectively. Thus, they enable GUSOT to achieve higher tracking accuracy in the long run. We conduct experiments on the large-scale dataset LaSOT with long video sequences, and show that GUSOT offers a lightweight high-performance tracking solution that finds applications in mobile and edge computing platforms

    LGSQE: Lightweight Generated Sample Quality Evaluatoin

    Full text link
    Despite prolific work on evaluating generative models, little research has been done on the quality evaluation of an individual generated sample. To address this problem, a lightweight generated sample quality evaluation (LGSQE) method is proposed in this work. In the training stage of LGSQE, a binary classifier is trained on real and synthetic samples, where real and synthetic data are labeled by 0 and 1, respectively. In the inference stage, the classifier assigns soft labels (ranging from 0 to 1) to each generated sample. The value of soft label indicates the quality level; namely, the quality is better if its soft label is closer to 0. LGSQE can serve as a post-processing module for quality control. Furthermore, LGSQE can be used to evaluate the performance of generative models, such as accuracy, AUC, precision and recall, by aggregating sample-level quality. Experiments are conducted on CIFAR-10 and MNIST to demonstrate that LGSQE can preserve the same performance rank order as that predicted by the Frechet Inception Distance (FID) but with significantly lower complexity

    Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints

    Full text link
    Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, there exists semantic mismatch between synthetic and refined images, which in turn results in the semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlated patches together and push uncorrelated ones apart. In this work, we exploit semantic and structural consistency between synthetic and refined images and adopt CL to reduce the semantic distortion. Besides, we incorporate hard negative mining to improve the performance furthermore. We compare the performance of our method with several other benchmarking methods using qualitative and quantitative measures and show that our method offers the state-of-the-art performance
    corecore