
    Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

    Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts, ranging from objects and scenes to abstract concepts; and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, we propose a novel two-branch deep neural network architecture comprising a very deep main network branch and a companion feature-fusion branch designed to fuse the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as input to complement the image input. To tackle the second issue, we introduce a label-quantity prediction auxiliary task alongside the main label prediction task to explicitly estimate the optimal number of labels for a given image. Extensive experiments on two large-scale image annotation benchmark datasets show that our method significantly outperforms the state-of-the-art.
    Comment: Submitted to IEEE TI
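
    To make the two-branch, multi-modal design concrete, here is a minimal sketch. All module names, layer widths, the tiny stand-in backbone, and the fusion-by-sum rule are illustrative assumptions, not the authors' released code; the paper's main branch is a much deeper network.

```python
# Hypothetical sketch of a two-branch multi-scale annotator with a
# label-quantity head (all names and dimensions are assumptions).
import torch
import torch.nn as nn

class MultiScaleAnnotator(nn.Module):
    def __init__(self, num_classes, max_labels, tag_vocab=1000, dim=256):
        super().__init__()
        # Main branch: a few conv stages standing in for a very deep backbone.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Companion fusion branch: project each scale to a common width, then sum.
        self.proj = nn.ModuleList([nn.Linear(c, dim) for c in (64, 128, 256)])
        # Second modality: noisy user tags as a bag-of-words vector.
        self.tag_fc = nn.Linear(tag_vocab, dim)
        # Two heads: per-class label scores and label-quantity prediction.
        self.label_head = nn.Linear(dim, num_classes)
        self.count_head = nn.Linear(dim, max_labels)

    def forward(self, image, tags):
        f1 = self.stage1(image)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        feats = [self.pool(f).flatten(1) for f in (f1, f2, f3)]
        fused = sum(p(f) for p, f in zip(self.proj, feats)) + self.tag_fc(tags)
        return self.label_head(fused), self.count_head(fused)
```

    At test time one would keep the top-k classes from the label head, with k taken from the argmax of the count head, matching the label-quantity idea in the abstract.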

    Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

    We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg). Different from traditional fine-tuning, which easily overfits the downstream task data, ProReg uses the prediction obtained by prompting the pretrained model to regularize the fine-tuning. The motivation is that, by prompting the large model with "a photo of a [CLASS]", the fill-in answer depends only on the pretraining encyclopedic knowledge and is independent of the task data distribution, which is usually biased. Specifically, for each training sample prediction during fine-tuning, we first calculate the Kullback-Leibler loss against the prompt prediction and the cross-entropy loss against the ground-truth label, and then combine them with a proposed sample-wise adaptive trade-off weight, which automatically adjusts the transfer between the pretrained and downstream domains. On various out-of-distribution benchmarks, we show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompting, prompt tuning, and other state-of-the-art methods.
    Comment: AAAI2023 accepted
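
    A minimal sketch of a ProReg-style objective follows. The combination of per-sample CE and KL terms mirrors the abstract, but the specific adaptive weighting rule below is an assumption for illustration, not the paper's exact formula.

```python
# Sketch of a ProReg-style regularized loss: task cross-entropy plus a KL
# term toward the frozen prompt prediction, mixed by a per-sample weight.
# The weighting rule here is an assumption; see the paper for the exact form.
import torch
import torch.nn.functional as F

def proreg_loss(logits, prompt_logits, labels):
    ce = F.cross_entropy(logits, labels, reduction="none")      # per-sample CE
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(prompt_logits, dim=-1),
                  reduction="none").sum(-1)                     # per-sample KL
    # Sample-wise trade-off: lean more on the prompt prediction when the
    # fine-tuned model disagrees strongly with it (hypothetical rule).
    w = ce.detach() / (ce.detach() + kl.detach() + 1e-8)
    return (w * kl + (1.0 - w) * ce).mean()
```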

    Second-Order Topological Insulator in van der Waals Heterostructures of CoBr$_2$/Pt$_2$HgSe$_3$/CoBr$_2$

    A second-order topological insulator, which hosts (d-2)-dimensional topological hinge or corner states, has been observed in three-dimensional materials but has not yet been observed in two-dimensional systems. In this Letter, we theoretically propose the realization of a second-order topological insulator in the van der Waals heterostructure CoBr$_2$/Pt$_2$HgSe$_3$/CoBr$_2$. Pt$_2$HgSe$_3$ is a large-gap $\mathbb{Z}_2$ topological insulator. With the in-plane exchange field from the neighboring CoBr$_2$, a large band gap above 70 meV opens up at the edge. The corner states, which are robust against edge disorder and irregular shapes, are confirmed in the nanoflake. We further show that the second-order topological states can also be realized in heterostructures of the jacutingaite family of $\mathbb{Z}_2$ topological insulators. We believe that our work will be beneficial for the experimental realization of second-order topological insulators in van der Waals layered materials.
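
    The mechanism in the abstract admits a standard edge-theory reading; the sketch below is textbook reasoning consistent with the abstract, not reproduced from the paper, and the orientation-dependent mass $m(\theta)$ is an assumed form.

```latex
% Sketch: how an in-plane exchange field gaps a helical edge and binds
% corner modes (standard Jackiw-Rebbi argument, not taken from the paper).
% k is the momentum along the edge, s_i act on the helical edge states,
% and m(\theta) is the exchange-induced mass on an edge of orientation \theta.
\[
  H_{\mathrm{edge}}(k) = \hbar v k\, s_z + m(\theta)\, s_x ,
  \qquad
  \operatorname{sgn}\!\big[m(\theta_1)\big] \neq \operatorname{sgn}\!\big[m(\theta_2)\big]
  \;\Rightarrow\; \text{a zero mode bound to the corner between the two edges.}
\]
```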

    Prompt-aligned Gradient for Prompt Tuning

    Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt": e.g., the confidence score of an image being "[CLASS]" can be obtained from the VLM-provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]". Therefore, prompts show great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the prompt-based similarity measure. However, we find a common failure: improper fine-tuning may undermine the prompt's inherent prediction not only for the task-related classes but also for other classes in the VLM vocabulary. Existing methods still address this problem with traditional anti-overfitting techniques such as early stopping and data augmentation, which lack a principled solution specific to prompts. We present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned with (or non-conflicting with) the "general direction", which is represented by the gradient of the KL loss of the pre-defined prompt prediction. Extensive experiments demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods. Codes are available at https://github.com/BeierZhu/Prompt-align.
    Comment: Accepted by ICCV202
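
    The alignment rule in the abstract can be sketched as a gradient projection. The variable names below are ours, and the exact implementation should be taken from the linked repository; this is a sketch of the stated idea, assuming the usual "project out the conflicting component" form.

```python
# Sketch of a ProGrad-style update direction: keep the task gradient only
# where it does not conflict with the "general direction" g_general (the
# gradient of the KL loss toward the hand-crafted prompt's prediction).
import torch

def prograd_direction(g_task: torch.Tensor, g_general: torch.Tensor) -> torch.Tensor:
    dot = torch.dot(g_task.flatten(), g_general.flatten())
    if dot >= 0:          # aligned (non-conflicting): use the task gradient as-is
        return g_task
    # Conflicting: subtract the component of g_task along g_general.
    coef = dot / g_general.flatten().pow(2).sum().clamp_min(1e-12)
    return g_task - coef * g_general
```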

    Anderson Localization from Berry-Curvature Interchange in Quantum Anomalous Hall System

    We theoretically investigate the localization mechanism of the quantum anomalous Hall effect (QAHE) in the presence of spin-flip disorders. We show that the QAHE remains quantized at weak disorder, then enters a Berry-curvature-mediated metallic phase at moderate disorder, and finally goes into the Anderson insulating phase at strong disorder. From the phase diagram, we find that, although the QAHE is most robust against disorder at the charge neutrality point, the corresponding metallic phase is much more easily localized into the Anderson insulating phase due to the \textit{interchange} of Berry curvatures carried respectively by the conduction and valence bands. Finally, we provide a phenomenological picture related to the topological charges to better understand the underlying physical origin of Anderson localization of the QAHE.
    Comment: 6 pages, 4 figures
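
    For reference, the quantization discussed here is the Chern-number form of the Hall conductivity; the relations below are standard textbook background rather than results of this paper.

```latex
% The quantized QAHE plateau: Hall conductivity set by the Chern number C,
% the Brillouin-zone integral of the Berry curvature of the occupied bands.
\[
  \sigma_{xy} = \frac{e^2}{h}\, C ,
  \qquad
  C = \frac{1}{2\pi} \sum_{n \in \mathrm{occ}} \int_{\mathrm{BZ}} \Omega_n(\mathbf{k})\, d^2k .
\]
```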