Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by
knowledge transferred from the seen domain, relying on the intrinsic
interactions between visual and semantic information. Prior works mainly
localize regions corresponding to the shared attributes. When various visual
appearances correspond to the same attribute, the shared attributes inevitably
introduce semantic ambiguity, hampering the exploration of accurate
semantic-visual interactions. In this paper, we deploy the dual semantic-visual
transformer module (DSVTM) to progressively model the correspondences between
attribute prototypes and visual features, constituting a progressive
semantic-visual mutual adaption (PSVMA) network for semantic disambiguation and
knowledge transferability improvement. Specifically, DSVTM devises an
instance-motivated semantic encoder that learns instance-centric prototypes to
adapt to different images, so that an unmatched semantic-visual pair can be recast
into a matched one. Then, a semantic-motivated instance decoder
strengthens accurate cross-domain interactions between the matched pair for
semantic-related instance adaption, encouraging the generation of unambiguous
visual representations. Moreover, to mitigate the bias towards seen classes in
GZSL, a debiasing loss is proposed to pursue response consistency between seen
and unseen predictions. PSVMA consistently outperforms other state-of-the-art
methods. Code will be available at:
https://github.com/ManLiuCoder/PSVMA.
Comment: Accepted by CVPR202
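The abstract does not spell out how DSVTM relates attribute prototypes to visual features. One plausible reading is a cross-attention step in which each prototype attends over image patch features. The sketch below illustrates generic scaled dot-product cross-attention under that assumption; it is not the authors' implementation, and the names `cross_attention`, `prototypes`, and `patches`, as well as the toy values, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (an assumed stand-in, not DSVTM).

    queries: one vector per attribute prototype
    keys, values: one vector per visual patch
    Returns one attention-weighted mix of `values` per query.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this prototype to every visual patch.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of patch features: an image-specific prototype.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Toy data (hypothetical): 2 attribute prototypes, 3 visual patch features.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
patches = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
adapted = cross_attention(prototypes, patches, patches)
```

In this toy setup each prototype is pulled toward the patches it is most similar to, which is one way to picture prototypes that adapt to a specific image.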
Robust and Real-time Deep Tracking Via Multi-Scale Domain Adaptation
Visual tracking is a fundamental problem in computer vision. Recently, some
deep-learning-based tracking algorithms have been achieving record-breaking
performances. However, due to the high complexity of deep learning, most deep
trackers suffer from low tracking speed, and thus are impractical in many
real-world applications. Some new deep trackers with smaller network structures
achieve high efficiency, but at the cost of a significant decrease in precision.
In this paper, we propose to transfer features learned for image classification to
the visual tracking domain via convolutional channel reductions. The channel
reduction can be viewed simply as an additional convolutional layer for the
specific task. It not only extracts useful information for object tracking but
also significantly increases the tracking speed. To better accommodate the
useful features of the target at different scales, the adaptation filters are
designed with different sizes. The resulting visual tracker runs in real time and
achieves state-of-the-art accuracy in experiments on two widely adopted
benchmarks with more than 100 test videos.
Comment: 6 page
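The channel reduction described above can be pictured as a convolution that maps many input channels to a few output channels, applied with several kernel sizes. The sketch below is a minimal pure-Python illustration under stated assumptions: the helper names `conv_reduce` and `averaging_filters`, the uniform placeholder weights, and the kernel sizes 1 and 3 are all hypothetical, and in the actual tracker the filter weights would be learned.

```python
def conv_reduce(feature, weights):
    """Valid (no padding) convolution that reduces the channel count.

    feature: nested lists shaped [C_in][H][W]
    weights: nested lists shaped [C_out][C_in][k][k]
    Returns nested lists shaped [C_out][H-k+1][W-k+1].
    """
    c_in = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    k = len(weights[0][0])
    out = []
    for filt in weights:  # one output channel per filter
        plane = []
        for y in range(h - k + 1):
            row = []
            for x in range(w - k + 1):
                acc = 0.0
                for c in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            acc += filt[c][dy][dx] * feature[c][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def averaging_filters(c_out, c_in, k):
    """Placeholder weights (assumption): every filter averages its receptive field."""
    v = 1.0 / (c_in * k * k)
    return [[[[v] * k for _ in range(k)] for _ in range(c_in)]
            for _ in range(c_out)]


# Toy input (hypothetical): a 4-channel, 5x5 feature map of ones.
feature = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]

# Reduce 4 channels to 2 with filters of two different sizes.
reduced = {k: conv_reduce(feature, averaging_filters(2, 4, k)) for k in (1, 3)}
```

Fewer output channels means less computation in later stages, and the larger kernel aggregates a wider spatial neighbourhood, which is the intuition behind using filters of different sizes for targets at different scales.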