2,883 research outputs found

    Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning

    Full text link
    Vision-Language Pre-Trained (VLP) models, such as CLIP, have demonstrated remarkable effectiveness in learning generic visual representations. Several approaches seek to efficiently adapt VLP models to downstream tasks with limited supervision, leveraging the knowledge these models have acquired. However, these methods either introduce biased representations or require high computational complexity, which hinders their effectiveness in fine-tuning the CLIP model. Moreover, when a model is trained on data specific to a particular domain, its ability to generalize to unseen domains diminishes. In this work, we propose the Test-Time Distribution LearNing Adapter (TT-DNA), which operates directly at test time. Specifically, we estimate Gaussian distributions to model the visual features of the few-shot support images, capturing the knowledge in the support set. The cosine similarity between the query image and the feature distribution of the support images serves as the visual adapter's prediction. This prediction is then merged with the original CLIP prediction via a residual connection to produce the final prediction. Our extensive experimental results on visual reasoning for human-object interaction demonstrate that the proposed TT-DNA outperforms existing state-of-the-art methods by large margins. Comment: Accepted by ICASSP 202
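The adapter described above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: the per-class Gaussian is reduced to its mean, and the blending weight `alpha` is an assumed hyperparameter.

```python
import numpy as np

def l2norm(a):
    return a / np.linalg.norm(a, axis=-1, keepdims=True)

def tt_dna_logits(query_feat, support_feats, support_labels, clip_logits, alpha=0.5):
    """Hypothetical sketch of a TT-DNA-style prediction.

    query_feat:     (d,)  query image feature
    support_feats:  (n, d) few-shot support image features
    support_labels: (n,)  integer class labels of the support images
    clip_logits:    (c,)  zero-shot CLIP logits for the query
    alpha:          residual blending weight (assumption)
    """
    c = clip_logits.shape[0]
    # Model each class's support features by a Gaussian; here only the
    # mean (the distribution's location) enters the similarity.
    means = np.stack([support_feats[support_labels == k].mean(axis=0)
                      for k in range(c)])
    adapter_logits = l2norm(means) @ l2norm(query_feat)  # cosine similarity per class
    return clip_logits + alpha * adapter_logits          # residual connection

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
out = tt_dna_logits(np.array([1.0, 0.0]), feats, labels, np.zeros(2))
```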

    $Z_c(3900)$ as a $D\bar{D}^*$ molecule from the pole counting rule

    Full text link
    A comprehensive study of the nature of the $Z_c(3900)$ resonant structure is carried out in this work. By constructing the pertinent effective Lagrangians and accounting for the important final-state-interaction effects, we first give a unified description of all the relevant experimental data available, including the $J/\psi\pi$ and $\pi\pi$ invariant mass distributions from the $e^+e^-\to J/\psi\pi\pi$ process, the $h_c\pi$ distribution from $e^+e^-\to h_c\pi\pi$, and the $D\bar{D}^*$ spectrum in the $e^+e^-\to D\bar{D}^*\pi$ process. After fitting the unknown parameters to these data, we search for poles in the complex energy plane and find only one pole in the nearby energy region across the different Riemann sheets. We therefore conclude that the $Z_c(3900)$ is of $D\bar{D}^*$ molecular nature, according to the pole counting rule [Nucl. Phys. A543, 632 (1992); Phys. Rev. D 35, 1633 (1987)]. We emphasize that the conclusion based upon the pole counting method is not trivial, since both $D\bar{D}^*$ contact interactions and explicit $Z_c$ exchanges are introduced in our analyses, and both lead to the same conclusion. Comment: 21 pages, 9 figures. To match the published version in PRD. Additional discussion on the spectral density function is included
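The pole counting idea can be illustrated with a toy single-channel effective-range amplitude, $f(k)^{-1} = 1/a + r k^2/2 - i k$, which is far simpler than the paper's coupled-channel analysis (the amplitude, parameter values, and cutoff below are all assumptions for illustration). Its poles in the complex momentum plane are roots of a quadratic; the rule is that a single pole near threshold signals a molecular state, while a pair of nearby poles signals a compact (elementary) state.

```python
import numpy as np

def near_threshold_poles(a, r, k_max=1.0):
    """Count poles of the toy effective-range amplitude
    f(k)^-1 = 1/a + r*k^2/2 - i*k near the threshold.

    a:     scattering length
    r:     effective range
    k_max: radius in the complex k plane defining "near threshold"
    Returns the list of poles with |k| < k_max.
    """
    # Poles solve (r/2) k^2 - i k + 1/a = 0.
    ks = np.roots([r / 2, -1j, 1 / a])
    return [k for k in ks if abs(k) < k_max]

# Large scattering length, natural-size range: one nearby pole (molecule-like).
molecule = near_threshold_poles(5.0, -1.0)
# Large effective range: both poles sit near threshold (compact-state-like).
compact = near_threshold_poles(5.0, -10.0)
```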

    Evaluating Summary Statistics with Mutual Information for Cosmological Inference

    Full text link
    The ability to compress observational data and accurately estimate physical parameters relies heavily on informative summary statistics. In this paper, we introduce the use of mutual information (MI) as a means of evaluating the quality of summary statistics in inference tasks. MI can assess the sufficiency of summaries and provide a quantitative basis for comparison. We propose to estimate MI using the Barber-Agakov lower bound and normalizing-flow-based variational distributions. To demonstrate the effectiveness of our method, we compare three different summary statistics (namely the power spectrum, bispectrum, and scattering transform) in the context of inferring reionization parameters from mock images of 21 cm observations with the Square Kilometre Array. We find that this approach is able to correctly assess the informativeness of different summary statistics and allows us to select the optimal set of statistics for inference tasks. Comment: Accepted at the ICML 2023 Workshop on Machine Learning for Astrophysics; comments welcome
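The Barber-Agakov bound used above is $I(x;t) \ge H(t) + \mathbb{E}_{p(x,t)}[\log q(t\mid x)]$ for any variational decoder $q$. A minimal sketch, assuming a linear-Gaussian $q(t|x)$ fit by least squares in place of the paper's normalizing flow:

```python
import numpy as np

def ba_lower_bound(x, t, entropy_t):
    """Barber-Agakov MI lower bound with a linear-Gaussian variational
    decoder q(t|x) fit by least squares (an illustrative simplification
    of a normalizing-flow q). x, t: 1-D sample arrays; entropy_t: H(t)."""
    X = np.column_stack([x, np.ones(len(x))])
    w, *_ = np.linalg.lstsq(X, t, rcond=None)   # fit t ~ w0*x + w1
    resid = t - X @ w
    var = resid.var() + 1e-12
    # Mean Gaussian log-likelihood of t under q(t|x)
    loglik = -0.5 * (np.log(2 * np.pi * var) + resid**2 / var)
    return entropy_t + loglik.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
t_dep = x + 0.1 * rng.normal(size=2000)          # informative "summary"
t_ind = rng.normal(size=2000)                    # uninformative one
H = lambda t: 0.5 * np.log(2 * np.pi * np.e * t.var())  # Gaussian entropy
bound_dep = ba_lower_bound(x, t_dep, H(t_dep))
bound_ind = ba_lower_bound(x, t_ind, H(t_ind))
```

The dependent pair yields a bound close to the true MI of roughly 2.3 nats, while the independent pair yields a bound near zero, which is how the bound ranks summaries by informativeness.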

    Cross-Modal Concept Learning and Inference for Vision-Language Models

    Full text link
    Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the class-specific text description is matched against the whole image. We recognize that this whole-image matching is not effective, since images from the same class often contain a set of different semantic objects, and an object further consists of a set of semantic parts or concepts. Individual semantic parts or concepts may appear in image samples from different classes. To address this issue, in this paper we develop a new method called cross-modal concept learning and inference (CCLI). Using the powerful text-image correlation capability of CLIP, our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts. Based on these visual concepts, we construct a discriminative representation of images and learn a concept inference network to perform downstream image classification tasks, such as few-shot learning and domain generalization. Extensive experimental results demonstrate that our CCLI method improves upon the current state-of-the-art methods by large margins, for example, by up to 8.0% on few-shot learning and up to 1.3% on domain generalization.
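The core step, describing an image by its affinity to a bank of concepts rather than by a single whole-image embedding, can be sketched as follows. This is an illustrative simplification, not the paper's CCLI procedure: the concept bank here is given directly, and the representation is just a matrix of cosine similarities.

```python
import numpy as np

def concept_representation(image_feats, concept_embeds):
    """Describe each image by its cosine similarity to every concept
    embedding, yielding an (n_images, n_concepts) concept-activation
    matrix that a downstream classifier can consume."""
    norm = lambda a: a / np.linalg.norm(a, axis=-1, keepdims=True)
    return norm(image_feats) @ norm(concept_embeds).T

# Toy example: three axis-aligned "concepts", two images.
concepts = np.eye(3)
images = np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
rep = concept_representation(images, concepts)
```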

    Unsupervised Prototype Adapter for Vision-Language Models

    Full text link
    Recently, large-scale pre-trained vision-language models (e.g., CLIP and ALIGN) have demonstrated remarkable effectiveness in acquiring transferable visual representations. To leverage the valuable knowledge encoded within these models for downstream tasks, several fine-tuning approaches, including prompt tuning methods and adapter-based methods, have been developed to adapt vision-language models effectively with supervision. However, these methods rely on the availability of annotated samples, which can be labor-intensive and time-consuming to acquire, thus limiting scalability. To address this issue, in this work we design an unsupervised fine-tuning approach for vision-language models called the Unsupervised Prototype Adapter (UP-Adapter). Specifically, for unannotated target datasets, we leverage the text-image aligning capability of CLIP to automatically select the most confident samples for each class. Utilizing these selected samples, we generate class prototypes, which serve as the initialization for the learnable prototype model. After fine-tuning, the prototype model's prediction is combined with the original CLIP prediction by a residual connection to perform downstream recognition tasks. Our extensive experimental results on image recognition and domain generalization show that the proposed unsupervised method outperforms 8-shot CoOp, 8-shot Tip-Adapter, and the state-of-the-art UPL method by large margins. Comment: Accepted by PRCV 202
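The unsupervised prototype construction can be sketched as below. This is a minimal sketch, not the paper's code: the function name, the `top_k` value, and the softmax-confidence criterion are assumptions, and it covers only the prototype-initialization step (not the subsequent fine-tuning or residual combination).

```python
import numpy as np

def build_prototypes(image_feats, clip_logits, n_classes, top_k=8):
    """For each class, take the top_k images CLIP pseudo-labels most
    confidently and average their features into a class prototype.

    image_feats: (n, d) image features
    clip_logits: (n, c) zero-shot CLIP logits (every class is assumed
                 to receive at least one pseudo-label)
    """
    probs = np.exp(clip_logits) / np.exp(clip_logits).sum(-1, keepdims=True)
    pseudo = probs.argmax(-1)          # pseudo-label per image
    conf = probs.max(-1)               # confidence per image
    protos = np.zeros((n_classes, image_feats.shape[1]))
    for k in range(n_classes):
        idx = np.where(pseudo == k)[0]
        idx = idx[np.argsort(conf[idx])[::-1][:top_k]]  # most confident first
        protos[k] = image_feats[idx].mean(axis=0)
    return protos

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]])
logits = np.array([[5.0, 0.0], [4.0, 0.0], [0.0, 5.0], [0.0, 4.0]])
protos = build_prototypes(feats, logits, n_classes=2, top_k=2)
```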

    BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning

    Full text link
    Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and ALIGN, have introduced a new paradigm for learning transferable visual representations. Recently, there has been a surge of interest among researchers in developing lightweight fine-tuning techniques to adapt these models to downstream visual tasks. We recognize that current state-of-the-art fine-tuning methods, such as Tip-Adapter, simply consider the covariance between the query image feature and the features of the few-shot support samples, which captures only linear relations and can give a misleading picture of independence. To address this issue, in this work we introduce Brownian Distance Covariance (BDC) to the field of vision-language reasoning. The BDC metric can model all possible relations, providing a robust measure of feature dependence. Based on this, we present a novel method called BDC-Adapter, which integrates BDC prototype similarity reasoning and multi-modal reasoning network prediction to perform classification tasks. Our extensive experimental results show that the proposed BDC-Adapter can freely handle non-linear relations and fully characterize independence, outperforming the current state-of-the-art methods by large margins. Comment: Accepted by BMVC 202
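The empirical (squared) Brownian distance covariance itself is short to write down: double-center the pairwise distance matrices of the two samples and average their elementwise product. A minimal sketch (not the BDC-Adapter pipeline, just the statistic):

```python
import numpy as np

def bdc(x, y):
    """Empirical squared Brownian distance covariance between paired
    samples x (n, p) and y (n, q). Unlike ordinary covariance, it is
    zero (in the population limit) only under independence, so it also
    detects purely non-linear dependence."""
    def centered_dist(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        # Double-centering: subtract row and column means, add grand mean.
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()
    A, B = centered_dist(x), centered_dist(y)
    return (A * B).mean()

# y = x**2 on a symmetric grid: linear covariance is ~0, BDC is not.
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = x**2
dep = bdc(x, y)
null = bdc(x, np.zeros_like(x))  # constant y: exactly no dependence
```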

    Research on bearing radiation noise and optimization design based on coupled vibro-acoustic method

    Get PDF
    For bearings, radiation noise is an important evaluation index of mechanical performance, particularly for quiet machinery. Environmental pollution caused by bearing noise has long been a focus of the bearing industry. In this paper, slippage of the rolling bearing and its own variable-stiffness excitation were considered in order to couple the vibration between the bearing and the bearing seat, as well as between bearing vibration and noise, by combining a dynamic model, an FEA model, and the boundary element method. A complete coupled vibro-acoustic model of the bearing was built, and its results were compared with experimental results to verify the reliability of the proposed method. Based on the verified simulation model, an improved design was carried out for low-noise rolling bearings. Finally, to further verify the superiority of the proposed method, the designed rolling bearing was compared with one obtained by the traditional design method. The results showed that the proposed design method is reliable.