Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning
Vision-Language Pre-Trained (VLP) models, such as CLIP, have demonstrated
remarkable effectiveness in learning generic visual representations. Several
approaches seek to adapt VLP models efficiently to downstream tasks with limited
supervision, leveraging the knowledge these models have already acquired.
However, these methods suffer from either introducing biased representations or
requiring high computational complexity, which hinders their effectiveness in
fine-tuning the CLIP model. Moreover, when a model is trained on data specific
to a particular domain, its ability to generalize to unseen domains
diminishes. In this work, we propose Test-Time Distribution LearNing Adapter
(TT-DNA) which directly works during the testing period. Specifically, we
estimate Gaussian distributions to model visual features of the few-shot
support images to capture the knowledge from the support set. The cosine
similarity between the query image and the feature distribution of the support
images is used as the prediction of the visual adapter. Subsequently, the visual adapter's
prediction merges with the original CLIP prediction via a residual connection,
resulting in the final prediction. Our extensive experimental results on visual
reasoning for human object interaction demonstrate that our proposed TT-DNA
outperforms existing state-of-the-art methods by large margins.
Comment: Accepted by ICASSP 202
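The test-time recipe lends itself to a compact sketch. Below is a minimal, hypothetical NumPy rendering of the fusion step as the abstract describes it: fit a Gaussian per class over the support features, score the query by cosine similarity against each class distribution (its mean, in this simplified reading), and blend that score with CLIP's zero-shot logits through a residual connection. Function and parameter names (`tt_dna_predict`, `alpha`) are illustrative, not taken from the paper.

```python
import numpy as np

def tt_dna_predict(query_feat, support_feats, support_labels,
                   clip_logits, num_classes, alpha=0.5):
    """Illustrative TT-DNA-style test-time prediction (not the official code).

    query_feat:     (D,)   CLIP feature of the query image
    support_feats:  (N, D) CLIP features of the few-shot support images
    support_labels: (N,)   integer class labels of the support images
    clip_logits:    (C,)   original zero-shot CLIP logits for the query
    alpha:          residual blending weight (hypothetical hyperparameter)
    """
    # Model each class's support features with a Gaussian; the cosine
    # score below uses only the fitted means.
    means = np.stack([support_feats[support_labels == c].mean(axis=0)
                      for c in range(num_classes)])

    # Cosine similarity between the query and each class distribution.
    q = query_feat / np.linalg.norm(query_feat)
    m = means / np.linalg.norm(means, axis=1, keepdims=True)
    adapter_logits = m @ q

    # Residual connection: merge the adapter's prediction with CLIP's.
    return clip_logits + alpha * adapter_logits
```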
as a molecule from the pole counting rule
A comprehensive study on the nature of the resonant structure is
carried out in this work. By constructing the pertinent effective Lagrangians
and considering the important final-state-interaction effects, we first give a
unified description of all the relevant experimental data available, including
the measured invariant mass distributions and spectra of the processes concerned.
After fitting the unknown parameters to these data, we search for poles in the
complex energy plane and find only one pole in the nearby energy region across
the different Riemann sheets. We therefore conclude that the resonant structure
is of molecular nature, according to the pole counting rule
[Nucl. Phys. A543, 632 (1992); Phys. Rev. D 35, 1633 (1987)]. We
emphasize that the conclusion based upon the pole counting method is not
trivial, since both contact interactions and explicit exchange contributions
are introduced in our analyses and they lead to the same
conclusion.
Comment: 21 pages, 9 figures. To match the published version in PRD. Additional discussion on the spectral density function is included.
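To make the pole-counting idea concrete, here is a small, self-contained sketch in a deliberately simplified setting: a single-channel amplitude in the scattering-length approximation with illustrative units, not the paper's coupled-channel effective-Lagrangian model. One searches for zeros of the inverse amplitude on the physical sheet (sheet I) and on sheet II and counts the nearby poles.

```python
import numpy as np
from scipy.optimize import root

MU = 1.0  # reduced mass, illustrative units

def cm_momentum(E, sheet):
    # Branch chosen so the unitarity cut lies along E > 0 and Im k > 0
    # on sheet I; sheet II is reached by flipping k -> -k.
    k = 1j * np.sqrt(-2.0 * MU * E + 0j)
    return k if sheet == 1 else -k

def inverse_T(E, a, sheet):
    # Scattering-length approximation: T(E) = 1 / (-1/a - i k(E)).
    return -1.0 / a - 1j * cm_momentum(E, sheet)

def find_pole(E_guess, a, sheet):
    # A pole of T is a complex zero of 1/T near the guess.
    def f(x):
        val = inverse_T(x[0] + 1j * x[1], a, sheet)
        return [val.real, val.imag]
    sol = root(f, [E_guess.real, E_guess.imag])
    return sol.x[0] + 1j * sol.x[1] if sol.success else None

# With a > 0 only sheet I hosts a nearby pole (a bound state at
# E = -1/(2*MU*a**2) = -0.5 here); sheet II yields none.
for sheet in (1, 2):
    print(sheet, find_pole(-0.4 + 0.1j, a=1.0, sheet=sheet))
```

Finding a single nearby pole, as in this toy case, is what the pole counting rule reads as a molecular state; two nearby poles on adjoining sheets would instead signal a compact state. The paper applies this counting to its full coupled-channel amplitude.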
Evaluating Summary Statistics with Mutual Information for Cosmological Inference
The ability to compress observational data and accurately estimate physical
parameters relies heavily on informative summary statistics. In this paper, we
introduce the use of mutual information (MI) as a means of evaluating the
quality of summary statistics in inference tasks. MI can assess the sufficiency
of summaries and provides a quantitative basis for comparison. We propose to
estimate MI using the Barber-Agakov lower bound and normalizing-flow-based
variational distributions. To demonstrate the effectiveness of our method, we
compare three different summary statistics (namely the power spectrum,
bispectrum, and scattering transform) in the context of inferring reionization
parameters from mock images of 21 cm observations with the Square Kilometre Array.
We find that this approach is able to correctly assess the informativeness of
different summary statistics and allows us to select the optimal set of
statistics for inference tasks.
Comment: Accepted at the ICML 2023 Workshop on Machine Learning for Astrophysics, comments welcome.
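The estimator itself is straightforward to sketch. Below is a minimal, hypothetical PyTorch version of the Barber-Agakov bound, I(s; θ) ≥ H(θ) + E[log q(θ|s)], with a diagonal-Gaussian variational posterior standing in for the paper's normalizing flow to keep the example short; all names are illustrative.

```python
import math
import torch
import torch.nn as nn

class GaussianPosterior(nn.Module):
    """Variational q(theta | s): a diagonal Gaussian whose mean and
    log-variance are predicted from the summary statistic s.
    (The paper uses a normalizing flow; a Gaussian keeps this short.)"""

    def __init__(self, summary_dim, theta_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(summary_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * theta_dim))

    def log_prob(self, theta, s):
        mu, log_var = self.net(s).chunk(2, dim=-1)
        return -0.5 * (log_var + (theta - mu) ** 2 / log_var.exp()
                       + math.log(2 * math.pi)).sum(dim=-1)

def ba_lower_bound(q, theta, s, prior_entropy):
    """Barber-Agakov: I(s; theta) >= H(theta) + E[log q(theta | s)].
    Training maximizes this bound over the parameters of q."""
    return prior_entropy + q.log_prob(theta, s).mean()
```

Comparing summaries then amounts to training one posterior per summary on (θ, s) pairs drawn from simulations and comparing the resulting bounds: the summary with the higher bound retains more information about the parameters.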
Cross-Modal Concept Learning and Inference for Vision-Language Models
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP,
establish the correlation between texts and images, achieving remarkable
success on various downstream tasks with fine-tuning. In existing fine-tuning
methods, the class-specific text description is matched against the whole
image. We recognize that this whole image matching is not effective since
images from the same class often contain a set of different semantic objects,
and an object further consists of a set of semantic parts or concepts.
Individual semantic parts or concepts may appear in image samples from
different classes. To address this issue, in this paper, we develop a new
method called cross-modal concept learning and inference (CCLI). Using the
powerful text-image correlation capability of CLIP, our method automatically
learns a large set of distinctive visual concepts from images using a set of
semantic text concepts. Based on these visual concepts, we construct a
discriminative representation of images and learn a concept inference network
to perform downstream image classification tasks, such as few-shot learning and
domain generalization. Extensive experimental results demonstrate that our CCLI
method improves upon the current state-of-the-art methods by large margins,
for example, by up to 8.0% on few-shot learning and up to 1.3% on domain
generalization.
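A rough sketch of the representation this describes, with purely illustrative names: encode a bank of semantic concepts with CLIP's text encoder, describe each image by its similarity to every concept, and classify from that concept-activation vector. The paper learns both the concept set and a dedicated inference network; this stand-in shows only the data flow.

```python
import numpy as np

def concept_activations(image_feats, concept_embeds):
    """image_feats:    (N, D) L2-normalized CLIP image features
    concept_embeds: (K, D) L2-normalized embeddings of K semantic concepts
    returns:        (N, K) activations describing each image by the
    concepts it contains, rather than by one whole-image/text match."""
    return image_feats @ concept_embeds.T

def concept_inference(acts, W, b):
    """A linear stand-in for the concept inference network:
    map concept activations (N, K) to class scores. W: (K, C), b: (C,)."""
    return acts @ W + b
```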
Unsupervised Prototype Adapter for Vision-Language Models
Recently, large-scale pre-trained vision-language models (e.g. CLIP and
ALIGN) have demonstrated remarkable effectiveness in acquiring transferable
visual representations. To leverage the valuable knowledge encoded within these
models for downstream tasks, several fine-tuning approaches, including prompt
tuning methods and adapter-based methods, have been developed to adapt
vision-language models effectively with supervision. However, these methods
rely on the availability of annotated samples, which can be labor-intensive and
time-consuming to acquire, thus limiting scalability. To address this issue, in
this work, we design an unsupervised fine-tuning approach for vision-language
models called Unsupervised Prototype Adapter (UP-Adapter). Specifically, for
the unannotated target datasets, we leverage the text-image aligning capability
of CLIP to automatically select the most confident samples for each class.
Utilizing these selected samples, we generate class prototypes, which serve as
the initialization for the learnable prototype model. After fine-tuning, the
prototype model prediction is combined with the original CLIP's prediction by a
residual connection to perform downstream recognition tasks. Our extensive
experimental results on image recognition and domain generalization show that
the proposed unsupervised method outperforms 8-shot CoOp, 8-shot Tip-Adapter,
and also the state-of-the-art UPL method by large margins.
Comment: Accepted by PRCV 202
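The prototype construction can be sketched compactly. The following hypothetical NumPy snippet follows the abstract's recipe: use CLIP's zero-shot predictions to pick the most confident unlabeled samples per class, average their features into class prototypes, and later fuse the prototype model's logits with CLIP's via a residual connection. Names and the confidence measure are illustrative.

```python
import numpy as np

def build_prototypes(image_feats, clip_logits, num_classes, k=8):
    """image_feats: (N, D) L2-normalized CLIP features of unlabeled images
    clip_logits: (N, C) zero-shot CLIP logits for the same images
    returns:     (C, D) class prototypes initializing the learnable model"""
    preds = clip_logits.argmax(axis=1)
    conf = clip_logits.max(axis=1)            # confidence proxy
    protos = np.zeros((num_classes, image_feats.shape[1]))
    for c in range(num_classes):
        idx = np.where(preds == c)[0]
        if idx.size:                          # skip classes CLIP never predicts
            top = idx[np.argsort(conf[idx])[-k:]]   # k most confident samples
            protos[c] = image_feats[top].mean(axis=0)
    return protos

# Downstream, the fused prediction mirrors the residual scheme:
# final_logits = clip_logits + alpha * image_feats @ protos.T
```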
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP and
ALIGN, have introduced a new paradigm for learning transferable visual
representations. Recently, there has been a surge of interest among researchers
in developing lightweight fine-tuning techniques to adapt these models to
downstream visual tasks. We recognize that current state-of-the-art fine-tuning
methods, such as Tip-Adapter, simply consider the covariance between the query
image feature and the features of the few-shot support samples, which only
captures linear relations and can create a misleading impression of
independence. To address this issue, in this work, we introduce
Brownian Distance Covariance (BDC) to the field of vision-language reasoning.
The BDC metric can model all possible relations, providing a robust metric for
measuring feature dependence. Based on this, we present a novel method called
BDC-Adapter, which integrates BDC prototype similarity reasoning and
multi-modal reasoning network prediction to perform classification tasks. Our
extensive experimental results show that the proposed BDC-Adapter can freely
handle non-linear relations and fully characterize independence, outperforming
the current state-of-the-art methods by large margins.
Comment: Accepted by BMVC 202
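The BDC computation itself is short. Here is a minimal NumPy sketch of the classical construction; treating each sample as a set of feature vectors (e.g., spatial tokens) is my illustrative choice, and the paper's adapter wraps this in prototype similarity reasoning and a multi-modal network.

```python
import numpy as np

def bdc_matrix(X):
    """Brownian (distance-covariance) transform of one sample's feature set.

    X: (n, d) rows are feature vectors, e.g. spatial tokens of one image.
    Returns the double-centered pairwise distance matrix; the BDC
    similarity of two samples is the inner product of their matrices."""
    # Pairwise Euclidean distances.
    sq = (X ** 2).sum(1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
    # Double-centering: subtract row/column means, add the grand mean.
    return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()

def bdc_similarity(A, B):
    """Sample distance covariance between two BDC-transformed samples."""
    return (A * B).mean()
```

Unlike a plain covariance, distance covariance vanishes only under statistical independence, which is the non-linear dependence property the abstract appeals to.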
Research on bearing radiation noise and optimization design based on coupled vibro-acoustic method
For bearings, radiation noise is an important measure of mechanical performance, particularly in quiet machinery, and the environmental noise pollution caused by bearings has long been a focus of the bearing industry. In this paper, rolling-element slippage and the bearing's own variable-stiffness excitation are taken into account to couple the vibration of the bearing with that of the bearing seat, and the bearing vibration with the radiated noise, by combining a dynamic model, an FEA model, and the boundary element method. A complete coupled vibro-acoustic model of the bearing is built, and its results are compared with experimental measurements to verify the reliability of the proposed method. Based on the validated simulation model, an improved design is carried out for low-noise rolling bearings. Finally, to further verify the superiority of the proposed approach, the resulting rolling-bearing design is compared with one obtained by the traditional design method. The results show that the proposed design method is reliable.
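As a small concrete anchor for the variable-stiffness (varying-compliance) excitation mentioned above, the fundamental rate at which rolling elements pass a fixed outer race follows from standard bearing kinematics; the sketch below uses approximate 6205-type geometry purely as an example.

```python
import numpy as np

def ball_pass_freq_outer(n_balls, shaft_hz, d_ball, d_pitch, contact_deg=0.0):
    """Ball-pass frequency over a fixed outer race (BPFO), the fundamental
    frequency of the varying-compliance stiffness excitation."""
    return 0.5 * n_balls * shaft_hz * (1.0 - (d_ball / d_pitch)
                                       * np.cos(np.radians(contact_deg)))

# Example: approximate 6205-type geometry at 30 Hz shaft speed.
print(ball_pass_freq_outer(9, 30.0, d_ball=7.94, d_pitch=39.04))  # ~107.5 Hz
```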
A modified VMAT adaptive radiotherapy for nasopharyngeal cancer patients based on CT-CT image fusion
- …