56 research outputs found
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
Existing fine-tuning methods either tune all parameters of the pre-trained
model (full fine-tuning), which is not efficient, or only tune the last linear
layer (linear probing), which suffers a significant accuracy drop compared to
full fine-tuning. In this paper, we propose a new parameter-efficient
fine-tuning method termed SSF: one only needs to Scale and Shift the deep
Features extracted by a pre-trained model to catch up with the performance
of full fine-tuning. SSF also surprisingly
outperforms other parameter-efficient fine-tuning approaches even with a
smaller number of tunable parameters. Furthermore, different from some existing
parameter-efficient fine-tuning methods (e.g., Adapter or VPT) that introduce
the extra parameters and computational cost in the training and inference
stages, SSF only adds learnable parameters during the training stage, and these
additional parameters can be merged into the original pre-trained model weights
via re-parameterization in the inference phase. With the proposed SSF, our
model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) relative
improvements in Top-1 accuracy on FGVC and VTAB-1k, respectively, compared
to full fine-tuning, while fine-tuning only about 0.3M parameters. We also
conduct extensive experiments on various model families (CNNs, Transformers,
and MLPs) and datasets. Results on 26 image classification datasets in total
and 3 robustness & out-of-distribution datasets show the effectiveness of SSF.
Code is available at https://github.com/dongzelian/SSF.
Comment: Accepted by NeurIPS202
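As a minimal sketch of the re-parameterization idea described in this abstract, assume the scale and shift are applied per output channel right after a frozen linear layer. The merged weights then follow from simple algebra: gamma * (W x + b) + beta = (diag(gamma) W) x + (gamma * b + beta). The function names and toy values below are illustrative assumptions and are not taken from the SSF codebase.

```python
# Toy illustration of scale-and-shift tuning on top of a frozen linear layer.
# All names (linear, ssf, merge_ssf) are hypothetical, chosen for this sketch.

def linear(W, b, x):
    """Frozen pre-trained layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def ssf(y, gamma, beta):
    """Learnable per-channel scale (gamma) and shift (beta) applied to features."""
    return [g * yi + s for g, yi, s in zip(gamma, y, beta)]

def merge_ssf(W, b, gamma, beta):
    """Fold gamma and beta into the layer: W' = diag(gamma) W, b' = gamma * b + beta."""
    W_merged = [[g * w for w in row] for g, row in zip(gamma, W)]
    b_merged = [g * bi + s for g, bi, s in zip(gamma, b, beta)]
    return W_merged, b_merged

W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]
gamma, beta = [2.0, 0.5], [0.3, 0.0]
x = [1.0, 3.0]

train_out = ssf(linear(W, b, x), gamma, beta)         # training-time path
infer_out = linear(*merge_ssf(W, b, gamma, beta), x)  # merged inference path
```

Both paths give the same output up to floating-point rounding, which is why, under this assumption, the extra parameters add no cost at inference time.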
Magnetic interactions and possible structural distortion in kagome FeGe from first-principles study and symmetry analysis
Based on density functional theory and symmetry analysis, we present a
comprehensive investigation of electronic structure, magnetic properties and
possible structural distortion of magnetic kagome metal FeGe. We estimate the
magnetic parameters including Heisenberg and Dzyaloshinskii-Moriya (DM)
interactions, and find that the ferromagnetic nearest-neighbor interaction
dominates over the others, while the magnetic interactions between nearest
kagome layers favor antiferromagnetic coupling. The Néel temperature and
Curie-Weiss temperature are successfully reproduced, and the
calculated magnetic anisotropy energy is also consistent with the
experiment. However, these reasonable Heisenberg interactions and magnetic
anisotropy cannot explain the double cone magnetic transition, and the DM
interactions, which even exist in the centrosymmetric materials, can result in
this small magnetic cone angle. Unfortunately, due to the crystal symmetry of
the high-temperature structure, the net contribution of DM interactions to
double cone magnetic structure is absent. Based on the experimental supercell,
we thus explore the subgroups of the parent phase. Group
theoretical analysis reveals that there are 68 different distortions, and only
four of them (space group or ) lack inversion and mirror
symmetry and can thus explain the low-temperature magnetic structure. Furthermore,
we suggest that these four proposed CDW phases can be identified by using Raman
spectroscopy. Since DM interactions are very sensitive to small atomic
displacements and symmetry restrictions, we believe that symmetry analysis is
an effective method to reveal the interplay of delicate structural distortions
and complex magnetic configurations
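To illustrate why a DM term can produce a small canting angle where Heisenberg exchange alone cannot, here is a toy two-spin sketch. It assumes the common convention E = -J S_i · S_j + D · (S_i × S_j) for unit spins; sign conventions differ between papers, and the values of J and D are arbitrary placeholders, not the FeGe parameters computed in this work.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pair_energy(J, D, Si, Sj):
    """Heisenberg plus DM energy of one spin pair (assumed sign convention)."""
    return -J * dot(Si, Sj) + dot(D, cross(Si, Sj))

def spin(theta):
    """Unit spin in the xz-plane, tilted by theta from the z axis."""
    return (math.sin(theta), 0.0, math.cos(theta))

J = 1.0                # ferromagnetic exchange (placeholder value)
D = (0.0, 0.1, 0.0)    # DM vector along y (placeholder value)

collinear = pair_energy(J, D, spin(0.0), spin(0.0))
theta_opt = -math.atan2(D[1], J)   # angle that minimizes -J cos(t) + D_y sin(t)
canted = pair_energy(J, D, spin(0.0), spin(theta_opt))
```

In this toy model the DM term vanishes for collinear spins and lowers the energy to -sqrt(J^2 + D_y^2) at a small relative angle. If symmetry forces the net DM contribution to zero, this energy gain disappears, which is the situation the abstract describes for the high-temperature structure.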
Priority-Centric Human Motion Generation in Discrete Latent Space
Text-to-motion generation is a formidable task, aiming to produce human
motions that align with the input text while also adhering to human
capabilities and physical laws. While there have been advancements in diffusion
models, their application in discrete spaces remains underexplored. Current
methods often overlook the varying significance of different motions, treating
them uniformly. It is essential to recognize that not all motions hold the same
relevance to a particular textual description. Some motions, being more salient
and informative, should be given precedence during generation. In response, we
introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which
utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion
representation, incorporating a global self-attention mechanism and a
regularization term to counteract code collapse. We also present a motion
discrete diffusion model that employs an innovative noise schedule, determined
by the significance of each motion token within the entire motion sequence.
This approach retains the most salient motions during the reverse diffusion
process, leading to more semantically rich and varied motions. Additionally, we
formulate two strategies to gauge the importance of motion tokens, drawing from
both textual and visual indicators. Comprehensive experiments on the HumanML3D
and KIT-ML datasets confirm that our model surpasses existing techniques in
fidelity and diversity, particularly for intricate textual descriptions.
Comment: Accepted by ICCV202
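The idea of a noise schedule driven by token importance can be sketched as follows. This is a deliberately simplified, deterministic stand-in that assumes a mask-style corruption in which the least important tokens are corrupted first, so the most salient tokens survive longest in the forward process and are the first to be recovered in the reverse process. The `MASK` id, the `corrupt` function, and the importance scores are hypothetical and do not reproduce the schedule used in M2DM.

```python
MASK = -1  # hypothetical id standing in for a corrupted (masked) token

def corrupt(tokens, importance, t, T):
    """Mask a fraction t/T of the tokens, starting with the least important."""
    n_mask = (len(tokens) * t) // T
    # Token positions sorted from least to most important.
    order = sorted(range(len(tokens)), key=lambda i: importance[i])
    masked = set(order[:n_mask])
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]

tokens = [12, 7, 33, 5]               # discrete motion codes (toy values)
importance = [0.9, 0.1, 0.5, 0.3]     # per-token significance (toy values)

halfway = corrupt(tokens, importance, t=2, T=4)
```

At the halfway step only the two least significant tokens (positions 1 and 3) are masked, while the tokens with scores 0.9 and 0.5 are still intact.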
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Adapter-style efficient transfer learning (ETL) has shown excellent
performance in the tuning of vision-language models (VLMs) under the low-data
regime, where only a few additional parameters are introduced to excavate the
task-specific knowledge based on the general and powerful representation of
VLMs. However, most adapter-style works face two limitations: (i) modeling
task-specific knowledge with a single modality only; and (ii) overlooking the
exploitation of the inter-class relationships in downstream tasks, thereby
leading to sub-optimal solutions. To mitigate that, we propose an effective
adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual
adapter by explicitly modeling the dual-modality structure knowledge (i.e., the
correlation of different semantics/classes in textual and visual modalities)
with a dual knowledge graph. In particular, the dual knowledge graph is
established with two sub-graphs, i.e., a textual knowledge sub-graph, and a
visual knowledge sub-graph, where the nodes and edges represent the
semantics/classes and their correlations in two modalities, respectively. This
enables the textual feature of each prompt to leverage the task-specific
structure knowledge from both textual and visual modalities, yielding a more
effective classifier for downstream tasks. Extensive experimental results on 11
benchmark datasets reveal that our GraphAdapter significantly outperforms
previous adapter-based methods. The code will be released at
https://github.com/lixinustc/GraphAdapter
Comment: Accepted by NeurIPS 2023. The manuscript will be further revised
based on the review
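One way to picture how a prompt's textual feature could draw on both sub-graphs is a similarity-weighted aggregation followed by a residual mix. The sketch below is only an assumption-laden illustration: the softmax-over-cosine weighting, the equal averaging of the two messages, the mixing coefficient `alpha`, and all function names are hypothetical and are not taken from the GraphAdapter code.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def aggregate(query, nodes):
    """Average of node features, weighted by softmax over cosine similarity."""
    weights = [math.exp(cosine(query, n)) for n in nodes]
    total = sum(weights)
    return [sum(w * n[d] for w, n in zip(weights, nodes)) / total
            for d in range(len(query))]

def adapt_text_feature(text_feat, text_nodes, visual_nodes, alpha=0.7):
    """Mix the original feature with messages from both sub-graphs."""
    t_msg = aggregate(text_feat, text_nodes)     # textual knowledge sub-graph
    v_msg = aggregate(text_feat, visual_nodes)   # visual knowledge sub-graph
    return [alpha * f + (1 - alpha) * 0.5 * (t + v)
            for f, t, v in zip(text_feat, t_msg, v_msg)]

text_nodes = [[1.0, 0.0], [0.0, 1.0]]      # one node per class (toy features)
visual_nodes = [[0.9, 0.1], [0.2, 0.8]]
prompt_feat = [1.0, 0.0]

adapted = adapt_text_feature(prompt_feat, text_nodes, visual_nodes)
```

With `alpha=1.0` the original feature is returned unchanged. Smaller values let inter-class structure from both modalities shift the classifier weights.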
TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration
We propose a novel task for generating 3D dance movements that simultaneously
incorporate both text and music modalities. Unlike existing works that generate
dance movements using a single modality such as music, our goal is to produce
richer dance movements guided by the instructive information provided by the
text. However, the lack of paired motion data with both music and text
modalities limits the ability to generate dance movements that integrate both.
To alleviate this challenge, we propose to utilize a 3D human motion VQ-VAE to
project the motions of the two datasets into a latent space consisting of
quantized vectors, which effectively mix the motion tokens from the two
datasets with different distributions for training. Additionally, we propose a
cross-modal transformer to integrate text instructions into motion generation
architecture for generating 3D dance movements without degrading the
performance of music-conditioned dance generation. To better evaluate the
quality of the generated motion, we introduce two novel metrics, namely Motion
Prediction Distance (MPD) and Freezing Score, to measure the coherence and
freezing percentage of the generated motion. Extensive experiments show that
our approach can generate realistic and coherent dance movements conditioned on
both text and music while maintaining comparable performance with the two
single modalities. Code will be available at:
https://garfield-kh.github.io/TM2D/
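The quantization step that lets motions from two datasets share one token vocabulary can be sketched with a nearest-codebook lookup, the standard VQ-VAE assignment rule. The codebook and latent vectors below are toy values, and `quantize` is a hypothetical helper. A real model would learn the codebook jointly with an encoder and decoder.

```python
def quantize(vec, codebook):
    """Return (index, entry) of the codebook vector closest to vec."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))
    return idx, codebook[idx]

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # shared quantized vectors

music_motion = [[0.9, 0.1], [0.1, 0.8]]    # latents from a music-dance dataset
text_motion = [[0.05, -0.1], [1.2, 0.2]]   # latents from a text-motion dataset

# Both datasets are mapped into the same discrete token space.
tokens = [quantize(v, codebook)[0] for v in music_motion + text_motion]
```

Because both sources are expressed with the same token ids, sequences from either dataset can be mixed in a single training set for the downstream generator.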
Dataset Quantization
State-of-the-art deep neural networks are trained with large amounts
(millions or even billions) of data. The expensive computation and memory costs
make it difficult to train them on limited hardware resources, especially for
recent popular large language models (LLMs) and computer vision (CV) models.
Dataset distillation methods have thus been developed, aiming to
reduce the number of training samples by synthesizing small-scale datasets via
gradient matching. However, as the gradient calculation is coupled with the
specific network architecture, the synthesized dataset is biased and performs
poorly when used for training unseen architectures. To address these
limitations, we present dataset quantization (DQ), a new framework to compress
large-scale datasets into small subsets which can be used for training any
neural network architectures. Extensive experiments demonstrate that DQ is able
to generate condensed small datasets for training unseen network architectures
with state-of-the-art compression ratios for lossless model training. To the
best of our knowledge, DQ is the first method that can successfully distill
large-scale datasets such as ImageNet-1k with a state-of-the-art compression
ratio. Notably, with 60% data from ImageNet and 20% data from Alpaca's
instruction tuning data, the models can be trained with negligible or no
performance drop for both vision tasks (including classification, semantic
segmentation, and object detection) as well as language tasks (including
instruction tuning tasks such as BBH and DROP).
Comment: 9 page
Revisiting Event-based Video Frame Interpolation
Dynamic vision sensors or event cameras provide rich complementary
information for video frame interpolation. Existing state-of-the-art methods
follow the paradigm of combining both synthesis-based and warping networks.
However, few of those methods fully respect the intrinsic characteristics of
event streams. Given that event cameras only encode intensity changes and
polarity rather than color intensities, estimating optical flow from events is
arguably more difficult than from RGB information. We therefore propose to
incorporate RGB information in an event-guided optical flow refinement
strategy. Moreover, in light of the quasi-continuous nature of the time signals
provided by event cameras, we propose a divide-and-conquer strategy in which
event-based intermediate frame synthesis happens incrementally in multiple
simplified stages rather than in a single, long stage. Extensive experiments on
both synthetic and real-world datasets show that these modifications lead to
more reliable and realistic intermediate frame results than previous video
frame interpolation methods. Our findings underline that a careful
consideration of event characteristics such as high temporal density and
elevated noise benefits interpolation accuracy.
Comment: Accepted by IROS2023 Project Site:
https://jiabenchen.github.io/revisit_even
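The divide-and-conquer idea relies on the event stream being dense in time, so it can be cut into short windows that are handled one stage at a time. The sketch below shows only that partitioning step, assuming events are `(t, x, y, polarity)` tuples with timestamps inside the interval between the two key frames. The function name and the equal-width windows are assumptions, and the synthesis network applied per stage is not modeled here.

```python
def split_events(events, t_start, t_end, n_stages):
    """Partition (t, x, y, polarity) events into n_stages equal time windows."""
    width = (t_end - t_start) / n_stages
    stages = [[] for _ in range(n_stages)]
    for ev in events:
        # Clamp so that an event exactly at t_end falls in the last window.
        k = min(int((ev[0] - t_start) / width), n_stages - 1)
        stages[k].append(ev)
    return stages

events = [(0.05, 3, 4, 1), (0.30, 3, 5, -1), (0.55, 2, 5, 1), (0.95, 2, 6, -1)]
stages = split_events(events, t_start=0.0, t_end=1.0, n_stages=4)
```

Each stage then only has to account for the intensity changes within its own short window, rather than for the whole interval in a single long step.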
Immune Infiltration in Atherosclerosis is Mediated by Cuproptosis-Associated Ferroptosis Genes
Aims: In this study, we aimed to identify cuproptosis-associated ferroptosis genes in the atherosclerosis microarray of the Gene Expression Omnibus (GEO) database and to explore hub gene-mediated immune infiltration in atherosclerosis.
Background: Immune infiltration plays a crucial role in atherosclerosis development. Ferroptosis is a mode of cell death caused by the iron-dependent accumulation of lipid peroxides. Cuproptosis is a recently discovered type of programmed cell death. No previous studies have examined the mechanism of cuproptosis-associated ferroptosis gene regulation in immune infiltration in atherosclerosis.
Methods: We searched the qualified atherosclerosis gene microarray in the GEO database, integrated it with ferroptosis and cuproptosis genes, and calculated the correlation coefficients. We then obtained the cuproptosis-associated ferroptosis gene matrix and screened differentially expressed genes. Subsequently, we performed Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses and protein–protein interaction network analysis of differentially expressed genes. We also screened hub genes according to the Matthews correlation coefficient (MCC) algorithm. We conducted enrichment analysis of hub genes to explore their functions and predict related microRNAs (P<0.05). We also used the single-sample gene set enrichment analysis (ssGSEA) algorithm to analyze the relationships between hub genes and immune infiltration, and used immune-associated hub genes to construct a risk model. Finally, we used the drug prediction results and molecular docking technology to explore potential therapeutic drugs targeting the hub genes.
Results: Seventy-eight cuproptosis-associated ferroptosis genes were found to be involved in the cellular response to oxidative and chemical stress, and to be enriched in multiple pathways, including ferroptosis, glutathione metabolism, and atherosclerosis. Ten hub genes were identified with the MCC algorithm; according to the ssGSEA algorithm, these genes were closely associated with immune infiltration, thus indicating that cuproptosis-associated ferroptosis genes may participate in atherosclerosis by mediating immune infiltration. The receiver operating characteristic curve indicated that the model had a good ability to predict atherosclerosis risk. The results of drug prediction (adjusted P<0.001) and molecular docking showed that glutathione may be a potential therapeutic drug that targets the hub genes.
Conclusion: Cuproptosis-associated ferroptosis genes are associated with immune infiltration in atherosclerosis
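The correlation step in the Methods can be pictured as a simple filter: keep each ferroptosis gene whose expression profile correlates with at least one cuproptosis gene. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.5. The abstract states neither the correlation measure nor the threshold, and the gene labels and expression values are placeholders, not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def associated_genes(ferro, cupro, threshold=0.5):
    """Ferroptosis genes correlated (|r| > threshold) with any cuproptosis gene."""
    return [name for name, expr in ferro.items()
            if any(abs(pearson(expr, c)) > threshold for c in cupro.values())]

# Placeholder expression values across four samples (not real data).
cupro = {"cupro_1": [1.0, 2.0, 3.0, 4.0]}
ferro = {"ferro_a": [2.1, 3.9, 6.2, 8.0],
         "ferro_b": [1.0, -1.0, -1.0, 1.0]}

selected = associated_genes(ferro, cupro)
```

Here `ferro_a` tracks `cupro_1` closely and is kept, while `ferro_b` is uncorrelated with it and is dropped. The retained genes would form the matrix passed on to differential-expression screening.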
Water-Borne Perovskite Quantum Dot-Loaded, Polystyrene Latex Ink
Highly lipophilic nanocrystals (NCs) of cesium lead halides were successfully embedded in polystyrene (PS) particles by deliberately controlling the swelling of the PS particles in mixtures of good and bad organic solvents. The resulting composite particles were readily transferred into water via simple stepwise solvent exchange, which yielded water-borne perovskite NC-based inks with outstanding structural and chemical stability in aqueous media. Only a minimal change in the photoluminescence (PL) of the NCs loaded in the PS particles was visible after 1 month of incubation of the composite particles in water over a broad pH range from 1 to 14, a stability that would otherwise be hard to achieve. Loading into the PS particles also made the NCs highly stable against polar organic solvents (such as ethanol), intense light irradiation, and heat. The NC PL intensity changed only slightly after the composite particles were heated at 75°C and irradiated with strong blue light (@365 nm) for 1 h. Furthermore, the PS matrices could effectively inhibit the exchange of halide anions between two differently sized perovskite NCs loaded therein, thereby offering a considerable technical advantage in the application of multiple perovskite NCs for multicolor display in the future