296 research outputs found

    Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

    Full text link
    Source-Free Domain Adaptation (SFDA) aims to adapt a source model for a target domain, with only access to unlabeled target training data and the source model pre-trained on a supervised source domain. Relying on pseudo labeling and/or auxiliary supervision, conventional methods are inevitably error-prone. To mitigate this limitation, in this work we for the first time explore the potentials of off-the-shelf vision-language (ViL) multimodal models (e.g.,CLIP) with rich whilst heterogeneous knowledge. We find that directly applying the ViL model to the target domain in a zero-shot fashion is unsatisfactory, as it is not specialized for this particular task but largely generic. To make it task specific, we propose a novel Distilling multimodal Foundation model(DIFO)approach. Specifically, DIFO alternates between two steps during adaptation: (i) Customizing the ViL model by maximizing the mutual information with the target model in a prompt learning manner, (ii) Distilling the knowledge of this customized ViL model to the target model. For more fine-grained and reliable distillation, we further introduce two effective regularization terms, namely most-likely category encouragement and predictive consistency. Extensive experiments show that DIFO significantly outperforms the state-of-the-art alternatives. Code is hereComment: Accepted at CVPR 202

    Unified Source-Free Domain Adaptation

    Full text link
    In the pursuit of transferring a source model to a target domain without access to the source training data, Source-Free Domain Adaptation (SFDA) has been extensively explored across various scenarios, including closed-set, open-set, partial-set, and generalized settings. Existing methods, focusing on specific scenarios, not only address only a subset of challenges but also necessitate prior knowledge of the target domain, significantly limiting their practical utility and deployability. In light of these considerations, we introduce a more practical yet challenging problem, termed unified SFDA, which comprehensively incorporates all specific scenarios in a unified manner. To tackle this unified SFDA problem, we propose a novel approach called Latent Causal Factors Discovery (LCFD). In contrast to previous alternatives that emphasize learning the statistical description of reality, we formulate LCFD from a causality perspective. The objective is to uncover the causal relationships between latent variables and model decisions, enhancing the reliability and robustness of the learned model against domain shifts. To integrate extensive world knowledge, we leverage a pre-trained vision-language model such as CLIP. This aids in the formation and discovery of latent causal factors in the absence of supervision in the variation of distribution and semantics, coupled with a newly designed information bottleneck with theoretical guarantees. Extensive experiments demonstrate that LCFD can achieve new state-of-the-art results in distinct SFDA settings, as well as source-free out-of-distribution generalization.Our code and data are available at https://github.com/tntek/source-free-domain-adaptation

    Business Process Text Sketch Automation Generation Using Large Language Model

    Full text link
    Business Process Management (BPM) is gaining increasing attention as it has the potential to cut costs while boosting output and quality. Business process document generation is a crucial stage in BPM. However, due to a shortage of datasets, data-driven deep learning techniques struggle to deliver the expected results. We propose an approach to transform Conditional Process Trees (CPTs) into Business Process Text Sketches (BPTSs) using Large Language Models (LLMs). The traditional prompting approach (Few-shot In-Context Learning) tries to get the correct answer in one go, and it can find the pattern of transforming simple CPTs into BPTSs, but for close-domain and CPTs with complex hierarchy, the traditional prompts perform weakly and with low correctness. We suggest using this technique to break down a difficult CPT into a number of basic CPTs and then solve each one in turn, drawing inspiration from the divide-and-conquer strategy. We chose 100 process trees with depths ranging from 2 to 5 at random, as well as CPTs with many nodes, many degrees of selection, and cyclic nesting. Experiments show that our method can achieve a correct rate of 93.42%, which is 45.17% better than traditional prompting methods. Our proposed method provides a solution for business process document generation in the absence of datasets, and secondly, it becomes potentially possible to provide a large number of datasets for the process model extraction (PME) domain.Comment: 10 pages, 7 figure

    Boosting Cross-Domain Speech Recognition with Self-Supervision

    Full text link
    The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by exploiting the self-supervisions of unlabeled data. However, these self-supervisions also face performance degradation in mismatched domain distributions, which previous work fails to address. This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm. On the one hand, we apply continued pre-training and data replay techniques to mitigate the domain mismatch of the SSL pre-trained model. On the other hand, we propose a domain-adaptive fine-tuning approach based on the PL technique with three unique modifications: Firstly, we design a dual-branch PL method to decrease the sensitivity to the erroneous pseudo-labels; Secondly, we devise an uncertainty-aware confidence filtering strategy to improve pseudo-label correctness; Thirdly, we introduce a two-step PL approach to incorporate target domain linguistic knowledge, thus generating more accurate target domain pseudo-labels. Experimental results on various cross-domain scenarios demonstrate that the proposed approach effectively boosts the cross-domain performance and significantly outperforms previous approaches.Comment: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 202

    Electron Density Dependence of in-plane Spin Relaxation Anisotropy in GaAs/AlGaAs Two-Dimensional Electron Gas

    Full text link
    We investigated the spin dynamics of two-dimensional electrons in (001) GaAs/AlGaAs heterostructure using the time resolved Kerr rotation technique under a transverse magnetic field. The in-plane spin lifetime is found to be anisotropic below 150k due to the interference of Rashba and Dresselhaus spin-orbit coupling and D'yakonov-Perel' spin relaxation. The ratio of in-plane spin lifetimes is measured directly as a function of temperature and pump power, showing that the electron density in 2DEG channel strongly affects the Rashba spin-orbit coupling.Comment: 3 pages, 2 figure

    EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus

    Full text link
    Large language models (LLMs) have achieved significant performance in many fields such as reasoning, language understanding, and math problem-solving, and are regarded as a crucial step to artificial general intelligence (AGI). However, the sensitivity of LLMs to prompts remains a major bottleneck for their daily adoption. In this paper, we take inspiration from psychology and propose EmotionPrompt to explore emotional intelligence to enhance the performance of LLMs. EmotionPrompt operates on a remarkably straightforward principle: the incorporation of emotional stimulus into prompts. Experimental results demonstrate that our EmotionPrompt, using the same single prompt templates, significantly outperforms original zero-shot prompt and Zero-shot-CoT on 8 tasks with diverse models: ChatGPT, Vicuna-13b, Bloom, and T5. Further, EmotionPrompt was observed to improve both truthfulness and informativeness. We believe that EmotionPrompt heralds a novel avenue for exploring interdisciplinary knowledge for humans-LLMs interaction.Comment: Work in progress; 9 page

    Diaqua­bis­[4-(4H-1,2,4-triazol-4-yl)benzoato-κ2 O,O′]nickel(II)

    Get PDF
    In the title compound, [Ni(C9H6N3O2)2(H2O)2], the NiII atom lies on a twofold rotation axis and is six-coordinated by two bidentate chelating 4-(1,2,4-triazol-4-yl)benzoate ligands and two water mol­ecules in a distorted octa­hedral geometry. Inter­molecular O—H⋯N hydrogen bonds link the complex mol­ecules into a two-dimensional network parallel to (010)
    corecore