Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Source-Free Domain Adaptation (SFDA) aims to adapt a source model for a
target domain, with only access to unlabeled target training data and the
source model pre-trained on a supervised source domain. Relying on pseudo
labeling and/or auxiliary supervision, conventional methods are inevitably
error-prone. To mitigate this limitation, in this work we explore, for the
first time, the potential of off-the-shelf vision-language (ViL) multimodal
models (e.g., CLIP) with rich yet heterogeneous knowledge. We find that directly
applying the ViL model to the target domain in a zero-shot fashion is
unsatisfactory, as it is not specialized for this particular task but largely
generic. To make it task-specific, we propose a novel Distilling multimodal
Foundation model (DIFO) approach. Specifically, DIFO alternates between two steps
during adaptation: (i) customizing the ViL model by maximizing the mutual
information with the target model in a prompt-learning manner; (ii) distilling
the knowledge of this customized ViL model to the target model. For more
fine-grained and reliable distillation, we further introduce two effective
regularization terms, namely most-likely category encouragement and predictive
consistency. Extensive experiments show that DIFO significantly outperforms the
state-of-the-art alternatives. Code is available.
Comment: Accepted at CVPR 202
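
For intuition, the alternating scheme can be sketched as follows. This is a
minimal illustration and not the authors' released code: the IIC-style
mutual-information surrogate, the function names, and the assumption that only
the prompt parameters are trainable on the ViL side are ours.

    # Minimal sketch of the two alternating steps (illustrative assumptions
    # throughout; see the paper for the actual objectives).
    import torch
    import torch.nn.functional as F

    def mutual_information(p, q, eps=1e-8):
        # IIC-style MI between two sets of class posteriors on one batch:
        # build the joint over class pairs and compare it to its marginals.
        joint = p.t() @ q / p.size(0)                  # [C, C]
        joint = 0.5 * (joint + joint.t())              # symmetrize
        pi = joint.sum(dim=1, keepdim=True)
        pj = joint.sum(dim=0, keepdim=True)
        return (joint * ((joint + eps).log()
                         - (pi + eps).log() - (pj + eps).log())).sum()

    def adapt_step(vil_logits, target_model, prompt, x, opt_prompt, opt_target):
        # (i) Customize the ViL model: update only the prompt so that its
        #     predictions share maximal mutual information with the target model.
        p_vil = F.softmax(vil_logits(x, prompt), dim=1)
        with torch.no_grad():
            p_tgt = F.softmax(target_model(x), dim=1)
        loss_mi = -mutual_information(p_vil, p_tgt)
        opt_prompt.zero_grad(); loss_mi.backward(); opt_prompt.step()

        # (ii) Distill the customized ViL predictions into the target model.
        with torch.no_grad():
            teacher = F.softmax(vil_logits(x, prompt), dim=1)
        student = F.log_softmax(target_model(x), dim=1)
        loss_kd = F.kl_div(student, teacher, reduction="batchmean")
        opt_target.zero_grad(); loss_kd.backward(); opt_target.step()
        return loss_mi.item(), loss_kd.item()

Only the prompt is updated on the ViL side, keeping the foundation model
itself frozen as the title indicates; the two regularization terms mentioned
above would enter as extra losses in step (ii).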
Unified Source-Free Domain Adaptation
In the pursuit of transferring a source model to a target domain without
access to the source training data, Source-Free Domain Adaptation (SFDA) has
been extensively explored across various scenarios, including closed-set,
open-set, partial-set, and generalized settings. Existing methods, each
focusing on a specific scenario, not only address only a subset of these
challenges but also require prior knowledge of the target domain, significantly
limiting their practical utility and deployability. In light of these
considerations, we
introduce a more practical yet challenging problem, termed unified SFDA, which
comprehensively incorporates all specific scenarios in a unified manner. To
tackle this unified SFDA problem, we propose a novel approach called Latent
Causal Factors Discovery (LCFD). In contrast to previous alternatives that
emphasize learning the statistical description of reality, we formulate LCFD
from a causality perspective. The objective is to uncover the causal
relationships between latent variables and model decisions, enhancing the
reliability and robustness of the learned model against domain shifts. To
integrate extensive world knowledge, we leverage a pre-trained vision-language
model such as CLIP. This aids the formation and discovery of latent causal
factors in the absence of supervision under variations in distribution and
semantics, coupled with a newly designed information bottleneck with
theoretical guarantees. Extensive experiments demonstrate that LCFD can achieve
new state-of-the-art results in distinct SFDA settings, as well as source-free
out-of-distribution generalization. Our code and data are available at
https://github.com/tntek/source-free-domain-adaptation
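
As a rough illustration of the information-bottleneck ingredient (the paper's
exact variant and its theoretical guarantees are not reproduced here), a
standard variational IB head could look like the sketch below; using CLIP
zero-shot posteriors as the fitting target is our assumption.

    # Variational information-bottleneck sketch (Alemi et al. style); the
    # names and the use of CLIP posteriors as targets are assumptions.
    import torch
    import torch.nn.functional as F

    class VIBHead(torch.nn.Module):
        def __init__(self, feat_dim, latent_dim, num_classes):
            super().__init__()
            self.to_stats = torch.nn.Linear(feat_dim, 2 * latent_dim)
            self.classifier = torch.nn.Linear(latent_dim, num_classes)

        def forward(self, feats):
            mu, log_var = self.to_stats(feats).chunk(2, dim=1)
            z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
            return self.classifier(z), mu, log_var

    def ib_loss(logits, clip_probs, mu, log_var, beta=1e-2):
        # Predictive term: match the (assumed) CLIP zero-shot posteriors,
        # a label-free proxy for maximizing I(Z; Y).
        fit = F.kl_div(F.log_softmax(logits, dim=1), clip_probs,
                       reduction="batchmean")
        # Compression term: KL(q(z|x) || N(0, I)), an upper bound on I(Z; X).
        rate = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(1).mean()
        return fit + beta * rate

The beta knob trades off how much of the input the latent retains against how
well it predicts, which is the usual bottleneck reading of robustness to
nuisance variation.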
Business Process Text Sketch Automation Generation Using Large Language Model
Business Process Management (BPM) is gaining increasing attention as it has
the potential to cut costs while boosting output and quality. Business process
document generation is a crucial stage in BPM. However, due to a shortage of
datasets, data-driven deep learning techniques struggle to deliver the expected
results. We propose an approach to transform Conditional Process Trees (CPTs)
into Business Process Text Sketches (BPTSs) using Large Language Models (LLMs).
The traditional prompting approach (few-shot in-context learning) tries to get
the correct answer in one go; it can find the pattern for transforming simple
CPTs into BPTSs, but for closed-domain settings and CPTs with complex
hierarchies, traditional prompts perform weakly, with low correctness. Drawing
inspiration from the divide-and-conquer strategy, we instead break a difficult
CPT down into a number of basic CPTs and then solve each one in turn. We
randomly chose 100 process trees with depths ranging from 2 to 5, as well as
CPTs with many nodes, many degrees of selection, and cyclic nesting.
Experiments show that our method can achieve a
correct rate of 93.42%, which is 45.17% better than traditional prompting
methods. Our proposed method provides a solution for business process document
generation in the absence of datasets and, in addition, makes it potentially
possible to supply a large number of datasets to the process model extraction
(PME) domain.
Comment: 10 pages, 7 figures
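
To make the decomposition concrete, a minimal divide-and-conquer sketch
follows; the CPTNode schema and the llm() callable are stand-ins invented for
illustration, not the paper's actual CPT format or prompts.

    # Solve basic subtrees with one LLM call each, then combine the
    # children's text at the parent (illustrative).
    from dataclasses import dataclass, field

    @dataclass
    class CPTNode:
        operator: str                # e.g. "SEQ", "XOR", "LOOP"
        label: str = ""
        children: list = field(default_factory=list)

    def cpt_to_sketch(node, llm):
        if not node.children:        # basic case: a single activity
            return llm(f"Describe the activity '{node.label}' in one sentence.")
        parts = [cpt_to_sketch(child, llm) for child in node.children]
        joined = "\n- ".join(parts)
        return llm(f"Combine these step descriptions under a {node.operator} "
                   f"construct into a short process text:\n- {joined}")

Each LLM call then only ever sees a shallow, basic CPT, which is exactly the
regime where the abstract reports few-shot prompting works well.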
Boosting Cross-Domain Speech Recognition with Self-Supervision
The cross-domain performance of automatic speech recognition (ASR) could be
severely hampered due to the mismatch between training and testing
distributions. Since the target domain usually lacks labeled data, and domain
shifts exist at acoustic and linguistic levels, it is challenging to perform
unsupervised domain adaptation (UDA) for ASR. Previous work has shown that
self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by
exploiting the self-supervisions of unlabeled data. However, these
self-supervisions also face performance degradation in mismatched domain
distributions, which previous work fails to address. This work presents a
systematic UDA framework to fully utilize the unlabeled data with
self-supervision in the pre-training and fine-tuning paradigm. On the one hand,
we apply continued pre-training and data replay techniques to mitigate the
domain mismatch of the SSL pre-trained model. On the other hand, we propose a
domain-adaptive fine-tuning approach based on the PL technique with three
unique modifications: first, we design a dual-branch PL method to decrease
sensitivity to erroneous pseudo-labels; second, we devise an
uncertainty-aware confidence filtering strategy to improve pseudo-label
correctness; third, we introduce a two-step PL approach to incorporate target
domain linguistic knowledge, thus generating more accurate target domain
pseudo-labels. Experimental results on various cross-domain scenarios
demonstrate that the proposed approach effectively boosts the cross-domain
performance and significantly outperforms previous approaches.
Comment: Accepted by IEEE/ACM Transactions on Audio, Speech and Language
Processing (TASLP), 202
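
As one concrete reading of the second modification, an utterance-level
uncertainty-aware filter could look like the sketch below; the thresholds,
tensor shapes, and greedy decoding are our assumptions rather than the paper's
exact recipe.

    # Keep only utterances whose pseudo-transcript is both confident and
    # low-entropy (illustrative thresholds and shapes).
    import torch

    def filter_pseudo_labels(log_probs, conf_thresh=0.9, ent_thresh=0.5):
        # log_probs: [batch, time, vocab] frame-level log-posteriors.
        probs = log_probs.exp()
        conf = probs.max(dim=-1).values.mean(dim=1)          # mean top-1 prob
        ent = -(probs * log_probs).sum(dim=-1).mean(dim=1)   # mean frame entropy
        keep = (conf >= conf_thresh) & (ent <= ent_thresh)
        pseudo = log_probs.argmax(dim=-1)                    # greedy pseudo-labels
        return pseudo[keep], keep

Combining the two cuts is the point: the entropy cut additionally penalizes
utterances whose posteriors are spread over many competitors even when the
top-1 probability passes the confidence threshold.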
Electron Density Dependence of in-plane Spin Relaxation Anisotropy in GaAs/AlGaAs Two-Dimensional Electron Gas
We investigated the spin dynamics of two-dimensional electrons in a (001)
GaAs/AlGaAs heterostructure using the time-resolved Kerr rotation technique
under a transverse magnetic field. The in-plane spin lifetime is found to be
anisotropic below 150 K due to the interference of Rashba and Dresselhaus
spin-orbit coupling within D'yakonov-Perel' spin relaxation. The ratio of in-plane
spin lifetimes is measured directly as a function of temperature and pump
power, showing that the electron density in the 2DEG channel strongly affects
the Rashba spin-orbit coupling.
Comment: 3 pages, 2 figures
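
For context, the standard D'yakonov-Perel' estimate behind such anisotropy in
a (001) 2DEG, with Rashba coefficient \alpha and linear Dresselhaus
coefficient \beta, is the following (a textbook result in the Averkiev-Golub
tradition, not taken from this abstract; the axis assignment depends on the
sign convention for \beta):

    % lifetimes of spins along the two in-plane <110> axes
    \frac{\tau_{[110]}}{\tau_{[\bar{1}10]}}
      \simeq \frac{(\alpha+\beta)^{2}}{(\alpha-\beta)^{2}}

Since gating or changing the electron density tunes \alpha, the measured
lifetime ratio tracks the density, consistent with the abstract's conclusion;
at \alpha = \beta one in-plane direction becomes extremely long-lived (the
persistent spin helix regime).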
EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
Large language models (LLMs) have achieved strong performance in many fields
such as reasoning, language understanding, and math problem-solving, and are
regarded as a crucial step toward artificial general intelligence (AGI).
However, the sensitivity of LLMs to prompts remains a major bottleneck for
their daily adoption. In this paper, we take inspiration from psychology and
propose EmotionPrompt, which explores emotional intelligence to enhance the
performance of LLMs. EmotionPrompt operates on a remarkably straightforward
principle: the incorporation of emotional stimulus into prompts. Experimental
results demonstrate that our EmotionPrompt, using the same single prompt
templates, significantly outperforms the original zero-shot prompt and
Zero-shot-CoT on 8 tasks with diverse models: ChatGPT, Vicuna-13b, Bloom, and
T5. Further, EmotionPrompt was observed to improve both truthfulness and
informativeness. We believe that EmotionPrompt heralds a novel avenue for
exploring interdisciplinary knowledge for human-LLM interaction.
Comment: Work in progress; 9 pages
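
The mechanism is simple enough to show in a few lines. The stimulus string
below is one of the paper's reported examples; the with_emotion helper is an
illustrative stand-in for however the prompt is actually assembled.

    # EmotionPrompt principle: append an emotional stimulus to an otherwise
    # unchanged prompt (helper name is our own).
    def with_emotion(prompt: str,
                     stimulus: str = "This is very important to my career.") -> str:
        return f"{prompt} {stimulus}"

    plain = "Determine whether the two sentences express the same meaning: ..."
    print(with_emotion(plain))

The base task prompt stays untouched; only the trailing stimulus changes,
which is what makes the reported gains across the 8 tasks notable.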
Diaquabis[4-(4H-1,2,4-triazol-4-yl)benzoato-κ2 O,O′]nickel(II)
In the title compound, [Ni(C9H6N3O2)2(H2O)2], the Ni(II) atom lies on a twofold rotation axis and is six-coordinated by two bidentate chelating 4-(1,2,4-triazol-4-yl)benzoate ligands and two water molecules in a distorted octahedral geometry. Intermolecular O—H⋯N hydrogen bonds link the complex molecules into a two-dimensional network parallel to (010).