56 research outputs found

    Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

    Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers a significant accuracy drop compared to full fine-tuning. In this paper, we propose a new parameter-efficient fine-tuning method termed SSF, meaning that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance of full fine-tuning. In this way, SSF also surprisingly outperforms other parameter-efficient fine-tuning approaches even with a smaller number of tunable parameters. Furthermore, unlike some existing parameter-efficient fine-tuning methods (e.g., Adapter or VPT) that introduce extra parameters and computational cost in both the training and inference stages, SSF only adds learnable parameters during the training stage, and these additional parameters can be merged into the original pre-trained model weights via re-parameterization in the inference phase. With the proposed SSF, our model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) relative performance improvement on FGVC and VTAB-1k in terms of Top-1 accuracy compared to full fine-tuning, while fine-tuning only about 0.3M parameters. We also conduct extensive experiments on various model families (CNNs, Transformers, and MLPs) and datasets. Results on 26 image classification datasets in total and 3 robustness & out-of-distribution datasets show the effectiveness of SSF. Code is available at https://github.com/dongzelian/SSF. Comment: Accepted by NeurIPS202
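
The re-parameterization step described in this abstract can be illustrated with a small NumPy sketch. It is not taken from the authors' repository: it assumes the scale-and-shift is applied channel-wise to the output of a single linear layer, and the names `ssf`, `merge_ssf_into_linear`, `gamma`, and `beta` are hypothetical.

```python
import numpy as np

def ssf(features, gamma, beta):
    """Scale and shift features channel-wise: gamma * features + beta."""
    return features * gamma + beta

def merge_ssf_into_linear(weight, bias, gamma, beta):
    """Fold a scale/shift that follows y = x @ weight.T + bias into the layer."""
    merged_weight = weight * gamma[:, None]  # scale each output row of the weight
    merged_bias = bias * gamma + beta        # scale the bias, then add the shift
    return merged_weight, merged_bias

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # batch of 4 inputs with 8 features (toy sizes)
weight = rng.normal(size=(16, 8))  # stands in for a frozen pre-trained weight
bias = rng.normal(size=16)
gamma = rng.normal(size=16)        # stands in for a learned scale
beta = rng.normal(size=16)         # stands in for a learned shift

# Training-time path: frozen linear layer followed by scale and shift.
train_out = ssf(x @ weight.T + bias, gamma, beta)

# Inference-time path: one linear layer with merged parameters, no extra ops.
merged_weight, merged_bias = merge_ssf_into_linear(weight, bias, gamma, beta)
infer_out = x @ merged_weight.T + merged_bias

print(np.allclose(train_out, infer_out))  # True
```

Because the scale and shift are themselves linear, folding them into the preceding weight and bias leaves the output unchanged, which is why no extra parameters or computation remain at inference under this assumption.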

    Magnetic interactions and possible structural distortion in kagome FeGe from first-principles study and symmetry analysis

    Based on density functional theory and symmetry analysis, we present a comprehensive investigation of the electronic structure, magnetic properties and possible structural distortion of the magnetic kagome metal FeGe. We estimate the magnetic parameters, including Heisenberg and Dzyaloshinskii-Moriya (DM) interactions, and find that the ferromagnetic nearest-neighbor $J_{1}$ dominates over the others, while the magnetic interactions between nearest kagome layers favor antiferromagnetic coupling. The Néel temperature $T_{N}$ and Curie-Weiss temperature $\theta_{CW}$ are successfully reproduced, and the calculated magnetic anisotropy energy is also consistent with experiment. However, these reasonable Heisenberg interactions and magnetic anisotropy cannot explain the double cone magnetic transition, and the DM interactions, which exist even in centrosymmetric materials, can result in this small magnetic cone angle. Unfortunately, due to the crystal symmetry of the high-temperature structure, the net contribution of DM interactions to the double cone magnetic structure is absent. Based on the experimental $2\times 2\times 2$ supercell, we thus explore the subgroups of the parent phase. Group theoretical analysis reveals that there are 68 different distortions, and only four of them (space group $P622$ or $P6_{3}22$), which lack inversion and mirror symmetry, can explain the low-temperature magnetic structure. Furthermore, we suggest that these four proposed CDW phases can be identified by using Raman spectroscopy. Since DM interactions are very sensitive to small atomic displacements and symmetry restrictions, we believe that symmetry analysis is an effective method to reveal the interplay of delicate structural distortions and complex magnetic configurations
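
For orientation, the kind of spin model this abstract refers to is commonly written as below. This is a generic textbook form, not the paper's own Hamiltonian: the sign convention (positive $J_{ij}$ is ferromagnetic) and the uniaxial single-ion anisotropy term with constant $K$ are assumptions made here for illustration.

```latex
\[
H = -\sum_{i<j} J_{ij}\,\mathbf{S}_i \cdot \mathbf{S}_j
    + \sum_{i<j} \mathbf{D}_{ij} \cdot \left(\mathbf{S}_i \times \mathbf{S}_j\right)
    - K \sum_i \left(S_i^{z}\right)^{2}
\]
```

The first term is the Heisenberg exchange, the second the DM interaction, and the third the magnetic anisotropy. The DM vectors $\mathbf{D}_{ij}$ are constrained bond by bond by the crystal symmetry, which is why lowering the symmetry through a structural distortion can change their net contribution.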

    Priority-Centric Human Motion Generation in Discrete Latent Space

    Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is essential to recognize that not all motions hold the same relevance to a particular textual description. Some motions, being more salient and informative, should be given precedence during generation. In response, we introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion representation, incorporating a global self-attention mechanism and a regularization term to counteract code collapse. We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token within the entire motion sequence. This approach retains the most salient motions during the reverse diffusion process, leading to more semantically rich and varied motions. Additionally, we formulate two strategies to gauge the importance of motion tokens, drawing from both textual and visual indicators. Comprehensive experiments on the HumanML3D and KIT-ML datasets confirm that our model surpasses existing techniques in fidelity and diversity, particularly for intricate textual descriptions. Comment: Accepted by ICCV202

    GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

    Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs) under the low-data regime, where only a few additional parameters are introduced to excavate the task-specific knowledge based on the general and powerful representation of VLMs. However, most adapter-style works face two limitations: (i) modeling task-specific knowledge with a single modality only; and (ii) overlooking the exploitation of the inter-class relationships in downstream tasks, thereby leading to sub-optimal solutions. To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph. In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively. This enables the textual feature of each prompt to leverage the task-specific structure knowledge from both textual and visual modalities, yielding a more effective classifier for downstream tasks. Extensive experimental results on 11 benchmark datasets reveal that our GraphAdapter significantly outperforms previous adapter-based methods. The code will be released at https://github.com/lixinustc/GraphAdapter. Comment: Accepted by NeurIPS 2023. The manuscript will be further revised based on the review

    TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

    We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Unlike existing works that generate dance movements using a single modality such as music, our goal is to produce richer dance movements guided by the instructive information provided by the text. However, the lack of paired motion data with both music and text modalities limits the ability to generate dance movements that integrate both. To alleviate this challenge, we propose to utilize a 3D human motion VQ-VAE to project the motions of the two datasets into a latent space consisting of quantized vectors, which effectively mix the motion tokens from the two datasets with different distributions for training. Additionally, we propose a cross-modal transformer to integrate text instructions into motion generation architecture for generating 3D dance movements without degrading the performance of music-conditioned dance generation. To better evaluate the quality of the generated motion, we introduce two novel metrics, namely Motion Prediction Distance (MPD) and Freezing Score, to measure the coherence and freezing percentage of the generated motion. Extensive experiments show that our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities. Code will be available at: https://garfield-kh.github.io/TM2D/
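
The idea of projecting motions into "a latent space consisting of quantized vectors" can be sketched with the nearest-neighbour codebook lookup used in a generic VQ-VAE. This is a toy illustration rather than the authors' model: the codebook values, the two-dimensional latents, and the function name `quantize` are all hypothetical.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2 distance)."""
    # Pairwise differences, shape (num_latents, num_codes, dim).
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)  # one discrete token per latent vector
    return indices, codebook[indices]

# Toy codebook with three 2-D entries and three encoder outputs to quantize.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])

tokens, quantized = quantize(latents, codebook)
print(tokens)  # [0 1 2]
```

Once motions from different datasets are expressed as token indices into one shared codebook, they can be mixed in a single training set, which is the property the abstract relies on.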

    Dataset Quantization

    State-of-the-art deep neural networks are trained with large amounts (millions or even billions) of data. The expensive computation and memory costs make it difficult to train them on limited hardware resources, especially for recent popular large language models (LLM) and computer vision models (CV). Recent popular dataset distillation methods are thus developed, aiming to reduce the number of training samples via synthesizing small-scale datasets via gradient matching. However, as the gradient calculation is coupled with the specific network architecture, the synthesized dataset is biased and performs poorly when used for training unseen architectures. To address these limitations, we present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets which can be used for training any neural network architectures. Extensive experiments demonstrate that DQ is able to generate condensed small datasets for training unseen network architectures with state-of-the-art compression ratios for lossless model training. To the best of our knowledge, DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio. Notably, with 60% data from ImageNet and 20% data from Alpaca's instruction tuning data, the models can be trained with negligible or no performance drop for both vision tasks (including classification, semantic segmentation, and object detection) as well as language tasks (including instruction tuning tasks such as BBH and DROP). Comment: 9 page

    Revisiting Event-based Video Frame Interpolation

    Dynamic vision sensors or event cameras provide rich complementary information for video frame interpolation. Existing state-of-the-art methods follow the paradigm of combining both synthesis-based and warping networks. However, few of those methods fully respect the intrinsic characteristics of event streams. Given that event cameras only encode intensity changes and polarity rather than color intensities, estimating optical flow from events is arguably more difficult than from RGB information. We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy. Moreover, in light of the quasi-continuous nature of the time signals provided by event cameras, we propose a divide-and-conquer strategy in which event-based intermediate frame synthesis happens incrementally in multiple simplified stages rather than in a single, long stage. Extensive experiments on both synthetic and real-world datasets show that these modifications lead to more reliable and realistic intermediate frame results than previous video frame interpolation methods. Our findings underline that a careful consideration of event characteristics such as high temporal density and elevated noise benefits interpolation accuracy. Comment: Accepted by IROS2023 Project Site: https://jiabenchen.github.io/revisit_even

    Immune Infiltration in Atherosclerosis is Mediated by Cuproptosis-Associated Ferroptosis Genes

    Aims: In this study, we aimed to identify cuproptosis-associated ferroptosis genes in the atherosclerosis microarray of the Gene Expression Omnibus (GEO) database and to explore hub gene-mediated immune infiltration in atherosclerosis. Background: Immune infiltration plays a crucial role in atherosclerosis development. Ferroptosis is a mode of cell death caused by the iron-dependent accumulation of lipid peroxides. Cuproptosis is a recently discovered type of programmed cell death. No previous studies have examined the mechanism of cuproptosis-associated ferroptosis gene regulation in immune infiltration in atherosclerosis. Methods: We searched the qualified atherosclerosis gene microarray in the GEO database, integrated it with ferroptosis and cuproptosis genes, and calculated the correlation coefficients. We then obtained the cuproptosis-associated ferroptosis gene matrix and screened differentially expressed genes. Subsequently, we performed Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses and protein–protein interaction network analysis of differentially expressed genes. We also screened hub genes according to the Matthews correlation coefficient (MCC) algorithm. We conducted enrichment analysis of hub genes to explore their functions and predict related microRNAs (P<0.05). We also used the single-sample gene set enrichment analysis (ssGSEA) algorithm to analyze the relationships between hub genes and immune infiltration, and used immune-associated hub genes to construct a risk model. Finally, we used the drug prediction results and molecular docking technology to explore potential therapeutic drugs targeting the hub genes. Results: Seventy-eight cuproptosis-associated ferroptosis genes were found to be involved in the cellular response to oxidative and chemical stress, and to be enriched in multiple pathways, including ferroptosis, glutathione metabolism, and atherosclerosis. Ten hub genes were identified with the MCC algorithm; according to the ssGSEA algorithm, these genes were closely associated with immune infiltration, thus indicating that cuproptosis-associated ferroptosis genes may participate in atherosclerosis by mediating immune infiltration. The receiver operating characteristic curve indicated that the model had a good ability to predict atherosclerosis risk. The results of drug prediction (adjusted P<0.001) and molecular docking showed that glutathione may be a potential therapeutic drug that targets the hub genes. Conclusion: Cuproptosis-associated ferroptosis genes are associated with immune infiltration in atherosclerosis

    Water-Borne Perovskite Quantum Dot-Loaded, Polystyrene Latex Ink

    Highly lipophilic nanocrystals (NCs) of cesium lead halides were successfully embedded in polystyrene (PS) particles by deliberately controlling the swelling of the PS particles in the mixtures of good and bad organic solvents. The resulting composite particles were readily transferred into water via simple stepwise solvent exchange, which yielded water-borne perovskite NC-based inks with outstanding structural and chemical stability in aqueous media. Minimal change in the photoluminescence (PL) of the NCs loaded in the PS particles was visible after 1 month of incubation of the composite particles in water in a broad pH range from 1 to 14, which could otherwise be hardly realized. Loading into the PS particles also made the NCs highly stable against polar organic solvents, such as ethanol, intense light irradiation, and heat. The NC PL intensity slightly changed after the composite particles were heated at 75°C and under irradiation of strong blue light (@365 nm) for 1 h. Furthermore, the PS matrices could effectively inhibit the exchange of halide anions between two differently sized perovskite NCs loaded therein, thereby offering a considerable technical advantage in the application of multiple perovskite NCs for multicolor display in the future