
    Dynamic Prototype Mask for Occluded Person Re-Identification

    Although person re-identification has achieved impressive improvements in recent years, occlusion caused by various obstacles remains an unsettled issue in real application scenarios. Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible parts. Nevertheless, the inevitable domain gap between the assistant model and the ReID datasets greatly increases the difficulty of obtaining an effective and efficient model. To avoid extra pre-trained networks and achieve automatic alignment in an end-to-end trainable network, we propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge. Specifically, we first devise a Hierarchical Mask Generator, which utilizes hierarchical semantics to select the visible pattern space between the high-quality holistic prototype and the feature representation of the occluded input image. Under this condition, the occluded representation can be spontaneously aligned in the selected subspace. Then, to enrich the feature representation of the high-quality holistic prototype and provide a more complete feature space, we introduce a Head Enrich Module to encourage different heads to aggregate different pattern representations of the whole image. Extensive experimental evaluations conducted on occluded and holistic person re-identification benchmarks demonstrate the superior performance of DPM over state-of-the-art methods. The code is released at https://github.com/stone96123/DPM. Comment: Accepted by ACM MM 202
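The idea of aligning an occluded feature with a holistic prototype inside a selected subspace can be illustrated with a minimal numpy sketch. Everything here is a toy assumption: the visibility mask is set by hand, whereas DPM's Hierarchical Mask Generator predicts it from hierarchical semantics, and real features are not raw vectors.

```python
import numpy as np

def masked_cosine(prototype, feature, mask):
    """Cosine similarity restricted to the dimensions the mask keeps."""
    p = prototype * mask
    f = feature * mask
    return float(p @ f / (np.linalg.norm(p) * np.linalg.norm(f) + 1e-12))

rng = np.random.default_rng(0)
proto = rng.standard_normal(8)          # stand-in for a holistic prototype
occluded = proto.copy()
occluded[4:] = rng.standard_normal(4)   # last dims corrupted by an obstacle

# Hand-set mask keeping only the visible dimensions (DPM predicts this).
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)

full = float(proto @ occluded / (np.linalg.norm(proto) * np.linalg.norm(occluded)))
sub = masked_cosine(proto, occluded, mask)
print(f"full-space similarity:      {full:.3f}")
print(f"masked-subspace similarity: {sub:.3f}")  # exactly 1.0 on visible dims
```

The point of the sketch is only that comparing in the masked subspace recovers the match that the corrupted dimensions destroy in the full space.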

    Approximated Prompt Tuning for Vision-Language Pre-trained Models

    Prompt tuning is a parameter-efficient way to deploy large-scale pre-trained models to downstream tasks by adding task-specific tokens. For vision-language pre-trained (VLP) models, prompt tuning often requires a large number of learnable tokens to bridge the gap between pre-training and downstream tasks, which greatly exacerbates the already high computational overhead. In this paper, we revisit the principle of prompt tuning for Transformer-based VLP models and reveal that the impact of soft prompt tokens can actually be approximated via independent information diffusion steps, thereby avoiding expensive global attention modeling and greatly reducing computational complexity. Based on this finding, we propose a novel Approximated Prompt Tuning (APT) approach for efficient VL transfer learning. To validate APT, we apply it to two representative VLP models, namely ViLT and METER, and conduct extensive experiments on a range of downstream tasks. We also validate the generalization of APT on CLIP for image classification. The experimental results not only show the superior performance gains and computational efficiency of APT over conventional prompt tuning methods, e.g., +6.6% accuracy and -64.62% additional computation overhead on METER, but also confirm its merits over other parameter-efficient transfer learning approaches.
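For context, the overhead that APT targets comes from soft prompt tokens joining global self-attention. The following is a generic sketch of conventional prompt tuning, not of APT itself: the attention uses identity projections purely for shape bookkeeping, and all sizes are made-up assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head attention with identity Q/K/V, for shape bookkeeping only."""
    scores = softmax(x @ x.T / np.sqrt(x.shape[-1]))
    return scores @ x

rng = np.random.default_rng(0)
n_input, n_prompt, d = 16, 8, 32
tokens = rng.standard_normal((n_input, d))
prompts = rng.standard_normal((n_prompt, d))   # the learnable soft tokens

# Conventional prompt tuning: prompts are prepended and attend globally.
out = self_attention(np.vstack([prompts, tokens]))
print(out.shape)                   # (24, 32): prompts join global attention
print((n_input + n_prompt) ** 2)   # 576 attention entries vs 256 without prompts
```

The quadratic growth of the attention map with the number of prompt tokens is exactly the cost that APT's independent diffusion steps are designed to avoid.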

    Bag of Features with Dense Sampling for Visual Tracking

    The bag-of-features model has become a state-of-the-art method for visual classification. Visual codebooks, built by quantizing robust appearance descriptors extracted from local image patches, can capture image statistics for object detection and classification. In this paper, more information about target objects is captured by dense sampling rather than sparse sampling. A robust visual tracking method is then proposed based on dense sampling and bag of features. Firstly, local image patches are densely extracted with sliding windows and represented as invariant descriptors. Secondly, visual codebooks are generated by fast clustering algorithms such as hierarchical k-means. The object region and candidate regions are then represented by the bag-of-features model with the learnt codebooks. After that, tracking operates in a Bayesian inference framework. The bag-of-features tracking method with dense sampling is adaptive and flexible; it works independently in many situations without relying on existing tracking algorithms. Experiments on various challenging videos demonstrate that the proposed tracker outperforms several state-of-the-art algorithms.
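The pipeline enumerated above (dense patch extraction, codebook learning, histogram representation) can be sketched as follows. This is a hedged toy version: raw pixel patches stand in for invariant descriptors, plain Lloyd's k-means stands in for hierarchical k-means, and the Bayesian inference stage is omitted.

```python
import numpy as np

def dense_patches(image, size, stride):
    """Densely extract square patches with a sliding window."""
    h, w = image.shape
    return np.array([image[i:i + size, j:j + size].ravel()
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

def kmeans(data, k, iters=10, seed=0):
    """Plain Lloyd's k-means standing in for hierarchical k-means."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    return centers

def bof_histogram(patches, centers):
    """Quantize patches against the codebook; build a normalized histogram."""
    labels = np.argmin(((patches[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
image = rng.standard_normal((32, 32))
patches = dense_patches(image, size=8, stride=2)   # dense: stride < patch size
codebook = kmeans(patches, k=16)
hist = bof_histogram(patches, codebook)
print(patches.shape)   # (169, 64): 13 x 13 overlapping window positions
print(hist.shape)      # (16,)
```

In a tracker, the same histogram would be computed for each candidate region and compared against the object region's histogram inside the inference framework.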

    Robust visual tracking via part-based sparsity model

    Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, BC, Canada, May 26-31, 2013. IEEE Signal Processing Society. Sparse representation has been widely used in many areas, including visual tracking. Part-based representation performs outstandingly by using non-holistic templates to resist occlusion. This paper combines the two and proposes a robust object tracking method using a part-based sparsity model for tracking an object in a video sequence. In the proposed model, an object is represented by image patches. The candidates of these patches are sparsely represented in the space spanned by the patch templates and trivial templates. The part-based method takes the spatial information of each patch into consideration by using the vote maps of multiple patches. Furthermore, the update scheme dynamically keeps the representative templates of each part. Therefore, the tracker can effectively deal with appearance changes and heavy occlusion. On various public benchmark videos, extensive experimental results demonstrate that the proposed tracking method outperforms many existing state-of-the-art algorithms. © 2013 IEEE
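The core sparse coding step, representing a candidate patch over patch templates plus trivial (identity) templates, can be sketched with a small ISTA solver for the l1-regularized problem. All dimensions, the regularization weight, and the solver choice are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ista(D, y, lam=0.05, iters=200):
    """Minimize 0.5*||D c - y||^2 + lam*||c||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(iters):
        g = D.T @ (D @ c - y)
        c = c - g / L                        # gradient step
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)  # soft-threshold
    return c

rng = np.random.default_rng(0)
d, n_templates = 16, 5
T = rng.standard_normal((d, n_templates))    # patch templates
D = np.hstack([T, np.eye(d)])                # append trivial (identity) templates
y = T @ np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # candidate patch = 2nd template
y[3] += 2.0                                  # a small occlusion spike

c = ista(D, y)
print(int(np.argmax(np.abs(c[:n_templates]))))  # 1: the matching patch template
```

The trivial templates absorb the occlusion spike, so the largest coefficient among the patch templates still identifies the correct match; this separation is what makes such models robust to partial occlusion.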

    Dual Distribution Alignment Network for Generalizable Person Re-Identification

    Domain generalization (DG) offers a preferable real-world setting for person re-identification (Re-ID): a model is trained on multiple source domain datasets and is expected to perform well on an unseen target domain without any model updating. Unfortunately, most DG approaches are designed explicitly for classification tasks, which fundamentally differ from the retrieval task of Re-ID. Moreover, existing applications of DG to Re-ID cannot correctly handle the massive variation among Re-ID datasets. In this paper, we identify two fundamental challenges in DG for person Re-ID: domain-wise variations and identity-wise similarities. To this end, we propose an end-to-end Dual Distribution Alignment Network (DDAN) to learn domain-invariant features with dual-level constraints: domain-wise adversarial feature learning and identity-wise similarity enhancement. These constraints effectively reduce the domain shift among multiple source domains while remaining consistent with real-world scenarios. We evaluate our method on a large-scale DG Re-ID benchmark and compare it with various cutting-edge DG approaches. Quantitative results show that DDAN achieves state-of-the-art performance.