Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Image annotation aims to annotate a given image with a variable number of
class labels corresponding to diverse visual concepts. In this paper, we
address two main issues in large-scale image annotation: 1) how to learn a rich
feature representation suitable for predicting a diverse set of visual concepts
ranging from objects and scenes to abstract concepts; 2) how to annotate an image
with the optimal number of class labels. To address the first issue, we propose
a novel multi-scale deep model for extracting rich and discriminative features
capable of representing a wide range of visual concepts. Specifically, a novel
two-branch deep neural network architecture is proposed which comprises a very
deep main network branch and a companion feature fusion network branch designed
for fusing the multi-scale features computed from the main branch. The deep
model is also made multi-modal by taking noisy user-provided tags as model
input to complement the image input. To tackle the second issue, we
introduce a label quantity prediction auxiliary task alongside the main label
prediction task to explicitly estimate the optimal number of labels for a given
image. Extensive experiments are carried out on two large-scale image
annotation benchmark datasets and the results show that our method
significantly outperforms the state-of-the-art. Comment: Submitted to IEEE TI
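The label quantity prediction idea above can be sketched in a few lines: a shared feature vector feeds both a label-scoring head and an auxiliary head that classifies how many labels to output. This is a minimal illustrative sketch, not the paper's actual two-branch multi-scale architecture; the class names, dimensions, and the `annotate` helper are all hypothetical.

```python
import torch
import torch.nn as nn

class LabelQuantityAnnotator(nn.Module):
    """Hypothetical sketch: a shared image feature feeds a label-scoring head
    and an auxiliary head that predicts the number of labels (as a class)."""
    def __init__(self, feat_dim=512, num_classes=1000, max_labels=10):
        super().__init__()
        self.label_head = nn.Linear(feat_dim, num_classes)   # per-class scores
        self.count_head = nn.Linear(feat_dim, max_labels)    # label-count logits

    def forward(self, feats):
        return self.label_head(feats), self.count_head(feats)

def annotate(model, feats):
    """Return the top-k labels, with k taken from the auxiliary count head.
    Assumes a single image, i.e. feats has shape [1, feat_dim]."""
    scores, count_logits = model(feats)
    k = int(count_logits.argmax(dim=-1).item()) + 1  # predicted label number
    return scores.topk(k, dim=-1).indices
```

At inference the auxiliary head replaces a fixed top-k or global score threshold, which is what lets the label count vary per image.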
Debiased Fine-Tuning for Vision-language Models by Prompt Regularization
We present a new paradigm for fine-tuning large-scale vision-language
pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
Different from traditional fine-tuning which easily overfits to the downstream
task data, ProReg uses the prediction by prompting the pretrained model to
regularize the fine-tuning. The motivation is that, by prompting the large
model with "a photo of a [CLASS]", the fill-in answer depends only on the
pretraining encyclopedic knowledge and is independent of the task data distribution, which
is usually biased. Specifically, given a training sample prediction during
fine-tuning, we first calculate its Kullback-Leibler loss with respect to the prompt
prediction and its cross-entropy loss with respect to the ground-truth label, and then combine
them with a proposed sample-wise adaptive trade-off weight, which automatically
adjusts the transfer between the pretrained and downstream domains. On various
out-of-distribution benchmarks, we show the consistently strong performance of
ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning,
and other state-of-the-art methods. Comment: AAAI2023 accepted
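The loss combination described above can be sketched as follows. This is a minimal sketch under assumptions: the abstract specifies a KL term toward the prompt prediction, a cross-entropy term toward the label, and a sample-wise adaptive trade-off weight, but the concrete weighting rule below (prompt confidence) is hypothetical and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def proreg_style_loss(logits, prompt_logits, labels, tau=1.0):
    """Sketch of a ProReg-style objective: per-sample cross-entropy to the
    ground truth plus a KL regularizer toward the frozen prompt prediction,
    mixed with a sample-wise weight (the weighting rule here is hypothetical)."""
    # Per-sample cross-entropy against the ground-truth labels
    ce = F.cross_entropy(logits, labels, reduction="none")
    # Per-sample KL divergence toward the zero-shot prompt prediction
    kl = F.kl_div(
        F.log_softmax(logits / tau, dim=-1),
        F.softmax(prompt_logits / tau, dim=-1),
        reduction="none",
    ).sum(dim=-1)
    # Hypothetical adaptive weight: trust the prompt more when it is confident
    with torch.no_grad():
        w = F.softmax(prompt_logits, dim=-1).max(dim=-1).values
    return ((1.0 - w) * ce + w * kl).mean()
```

The sample-wise weight is what distinguishes this from a fixed-coefficient distillation loss: each example interpolates independently between the pretrained and downstream domains.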
Second-Order Topological Insulator in van der Waals Heterostructures of CoBr2/Pt2HgSe3/CoBr2
Second-order topological insulators, which host (d-2)-dimensional topological
hinge or corner states, have been observed in three-dimensional materials, but
have not yet been observed in two-dimensional systems. In this Letter, we
theoretically propose the realization of a second-order topological insulator in
the van der Waals heterostructure of CoBr2/Pt2HgSe3/CoBr2.
Pt2HgSe3 is a large-gap topological insulator. With an
in-plane exchange field from the neighboring CoBr2, a large band gap above 70
meV opens up at the edge. The corner states, which are robust against edge
disorders and irregular shapes, are confirmed in a nanoflake. We further show
that the second-order topological states can also be realized in
heterostructures of jacutingaite-family topological insulators.
We believe that our work will be beneficial for the experimental realization of
second-order topological insulators in van der Waals layered materials.
Prompt-aligned Gradient for Prompt Tuning
Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we
can craft a zero-shot classifier by "prompt", e.g., the confidence score of an
image being "[CLASS]" can be obtained by using the VLM provided similarity
measure between the image and the prompt sentence "a photo of a [CLASS]".
Therefore, prompting shows great potential for fast adaptation of VLMs to
downstream tasks if we fine-tune the prompt-based similarity measure. However,
we find a common failure mode: improper fine-tuning may undermine the
prompt's inherent prediction not only for the task-related classes but also
for other classes in the VLM vocabulary. Existing methods address this problem
with traditional anti-overfitting techniques such as early stopping and data
augmentation, which lack a principled solution specific to prompt tuning. We present
Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from
forgetting the general knowledge learned from VLMs. In particular, ProGrad
only updates the prompt whose gradient is aligned (or non-conflicting) to the
"general direction", which is represented as the gradient of the KL loss of the
pre-defined prompt prediction. Extensive experiments demonstrate the stronger
few-shot generalization ability of ProGrad over state-of-the-art prompt tuning
methods. Codes are available at https://github.com/BeierZhu/Prompt-align. Comment: Accepted by ICCV202
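The gradient-alignment rule described above can be sketched directly: keep the task gradient when it agrees with the "general direction" (the gradient of the KL loss to the predefined prompt prediction), and otherwise project out the conflicting component. This is a minimal sketch of that projection for a single parameter tensor; the function name and the tensor-at-a-time interface are assumptions, not the repository's API.

```python
import torch

def prograd_style_grad(grad_ce, grad_kl):
    """Sketch of ProGrad-style alignment: grad_ce is the task (cross-entropy)
    gradient, grad_kl the "general direction" from the KL loss to the
    predefined prompt prediction. Conflicting components are projected out."""
    g = grad_kl.flatten()
    d = grad_ce.flatten()
    dot = torch.dot(d, g)
    if dot < 0:
        # Conflict: remove the component opposing the general direction
        d = d - (dot / torch.dot(g, g)) * g
    return d.view_as(grad_ce)
```

After the projection, the update can never decrease agreement with the prompt's zero-shot prediction to first order, which is how the method prevents forgetting while still fitting the few-shot task.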
Anderson Localization from Berry-Curvature Interchange in Quantum Anomalous Hall System
We theoretically investigate the localization mechanism of the quantum
anomalous Hall effect (QAHE) in the presence of spin-flip disorders. We show
that the QAHE remains quantized at weak disorder, then enters a
Berry-curvature-mediated metallic phase at moderate disorder, and finally goes
into the Anderson insulating phase at strong disorder. From the phase diagram,
we find that, although the QAHE is most robust against disorder at the charge
neutrality point, the corresponding metallic phase is much more easily
localized into the Anderson insulating phase due to the \textit{interchange} of Berry
curvatures carried respectively by the conduction and valence bands. At the
end, we provide a phenomenological picture related to the topological charges
to better understand the underlying physical origin of the QAHE Anderson
localization. Comment: 6 pages, 4 figures