Diffusion Model as Representation Learner
Diffusion Probabilistic Models (DPMs) have recently demonstrated impressive
results on various generative tasks. Despite this promise, however, the
representations learned by pre-trained DPMs are not yet fully understood.
In this paper, we conduct an in-depth investigation of the representation power
of DPMs, and propose a novel knowledge transfer method that leverages the
knowledge acquired by generative DPMs for recognition tasks. Our study begins
by examining the feature space of DPMs, revealing that DPMs are inherently
denoising autoencoders that balance representation learning against
regularizing model capacity. Building on this observation, we introduce a
novel knowledge transfer paradigm named RepFusion. Our paradigm extracts representations at
different time steps from off-the-shelf DPMs and dynamically employs them as
supervision for student networks, with the optimal time step determined
through reinforcement learning. We evaluate our approach on several image
classification, semantic segmentation, and landmark detection benchmarks, and
demonstrate that it outperforms state-of-the-art methods. Our results uncover
the potential of DPMs as a powerful tool for representation learning and
provide insights into the usefulness of generative models beyond sample
generation. The code is available at
\url{https://github.com/Adamdad/Repfusion}.
Comment: Accepted by ICCV 2023
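As a concrete (if simplified) illustration of the idea, the sketch below distills DPM features at a policy-chosen time step into a student, with a REINFORCE-style update selecting the step. The `return_features` flag, the `policy` network, and the single combined loss are our assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn.functional as F

def repfusion_step(dpm_unet, student, policy, x, alphas_cumprod):
    """One distillation step: noise x to a policy-chosen timestep t,
    read the frozen DPM's intermediate features at (x_t, t), and
    regress the student's features (on the clean image) onto them.
    Assumes the student's feature shape matches the teacher's,
    e.g. via a projection head."""
    B = x.size(0)
    logits = policy(x)                                  # (B, T) over timesteps
    dist = torch.distributions.Categorical(logits=logits)
    t = dist.sample()                                   # (B,) sampled timesteps
    a = alphas_cumprod[t].view(B, 1, 1, 1)              # forward-diffusion schedule
    x_t = a.sqrt() * x + (1 - a).sqrt() * torch.randn_like(x)
    with torch.no_grad():                               # teacher stays frozen
        teacher = dpm_unet(x_t, t, return_features=True)  # hypothetical flag
    feat = student(x)                                   # student sees clean x
    per_sample = F.mse_loss(feat, teacher, reduction="none").flatten(1).mean(1)
    distill = per_sample.mean()
    # REINFORCE: timesteps whose features the student matches well get reward.
    policy_loss = -(dist.log_prob(t) * (-per_sample.detach())).mean()
    return distill + policy_loss
```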
Relation Rectification in Diffusion Model
Despite their exceptional generative abilities, large text-to-image diffusion
models, much like skilled but careless artists, often struggle with accurately
depicting visual relationships between objects. This issue, as we uncover
through careful analysis, arises from a misaligned text encoder that struggles
to interpret specific relationships and differentiate the logical order of
associated objects. To resolve this, we introduce a novel task termed Relation
Rectification, aiming to refine the model to accurately represent a given
relationship it initially fails to generate. To address this task, we propose
a solution built on a Heterogeneous Graph Convolutional Network (HGCN),
which models the directional relationships between relation terms and
corresponding objects within the input prompts. Specifically, we optimize the
HGCN on a pair of prompts with identical relational words but reversed object
orders, supplemented by a few reference images. The lightweight HGCN adjusts
the text embeddings generated by the text encoder, ensuring the accurate
reflection of the textual relation in the embedding space. Crucially, our
method retains the parameters of the text encoder and diffusion model,
preserving the model's robust performance on unrelated descriptions. We
validate our approach on a newly curated dataset of diverse relational data,
demonstrating both quantitative and qualitative enhancements in generating
images with precise visual relations. Project page:
https://wuyinwei-hah.github.io/rrnet.github.io/
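To make the embedding-adjustment idea concrete, here is a toy sketch of a direction-aware module; the three-node graph, the layer names, and the additive-offset design are our assumptions rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn

class RelationAdjuster(nn.Module):
    """Directed message passing over a tiny subject -> relation -> object
    graph, producing an additive offset for the relation-token embedding.
    Because the two edge directions use different weights, swapping the
    subject and object changes the output, encoding the logical order."""
    def __init__(self, dim):
        super().__init__()
        self.fwd = nn.Linear(dim, dim)   # subject -> relation edge
        self.bwd = nn.Linear(dim, dim)   # object  -> relation edge
        self.out = nn.Linear(dim, dim)

    def forward(self, subj, rel, obj):
        msg = torch.relu(self.fwd(subj) + self.bwd(obj))
        return rel + self.out(msg)       # adjusted relation-token embedding
```

Optimizing such a module on a prompt pair with the same relation word but swapped object order pushes the two adjusted embeddings apart, while the text encoder and diffusion model stay frozen.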
Neural Point Process for Learning Spatiotemporal Event Dynamics
Learning the dynamics of spatiotemporal events is a fundamental problem.
Neural point processes enhance the expressivity of point process models with
deep neural networks. However, most existing methods only consider temporal
dynamics without spatial modeling. We propose the Deep Spatiotemporal Point
Process (DeepSTPP), a deep dynamics model that integrates spatiotemporal point
processes. Our method is flexible, efficient, and can accurately forecast
irregularly sampled events over space and time. The key construction of our
approach is the nonparametric space-time intensity function, governed by a
latent process. The intensity function enjoys closed-form integration for the
density. The latent process captures the uncertainty of the event sequence. We
use amortized variational inference to infer the latent process with deep
networks. Using synthetic datasets, we validate that our model can accurately learn
the true intensity function. On real-world benchmark datasets, our model
demonstrates superior performance over state-of-the-art baselines. Our code and
data can be found at https://github.com/Rose-STL-Lab/DeepSTPP.
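The closed-form integration can be sketched with a simple kernel parameterization (an assumed form for illustration; in the paper the kernel parameters would be decoded from the latent process, which we omit here):

```python
import math
import torch

def intensity(s, t, ev_s, ev_t, w, sigma, beta):
    """lambda(s, t): each past event i contributes a Gaussian spatial
    kernel around its location ev_s[i] and an exponentially decaying
    temporal kernel after its time ev_t[i]."""
    dt = t - ev_t                                     # (N,) time since events
    mask = (dt > 0).float()                           # only past events count
    temporal = beta * torch.exp(-beta * dt) * mask
    d2 = ((s - ev_s) ** 2).sum(-1)                    # (N,) squared distances
    spatial = torch.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return (w * temporal * spatial).sum()

def spatial_integral(t, ev_t, w, beta):
    """Integral of lambda(s, t) over R^2 in closed form: each Gaussian
    spatial kernel integrates to one, leaving only the temporal part.
    This is what makes the likelihood tractable without numerical
    quadrature over space."""
    dt = t - ev_t
    mask = (dt > 0).float()
    return (w * beta * torch.exp(-beta * dt) * mask).sum()
```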
EQ-Net: Elastic Quantization Neural Networks
Current model quantization methods have shown their promising capability in
reducing storage space and computation complexity. However, due to the
diversity of quantization forms supported by different hardware, one limitation
of existing solutions is that they usually require repeated optimization for
different scenarios. How to construct a model with flexible quantization forms
has been less studied. In this paper, we explore a one-shot network
quantization regime, named Elastic Quantization Neural Networks (EQ-Net), which
aims to train a robust weight-sharing quantization supernet. First of all, we
propose an elastic quantization space (including elastic bit-width,
granularity, and symmetry) to adapt to various mainstream quantization forms.
Secondly, we propose the Weight Distribution Regularization Loss (WDR-Loss) and
the Group Progressive Guidance Loss (GPG-Loss) to bridge the distribution
inconsistency of weights and output logits across the elastic quantization
space. Lastly, we incorporate genetic algorithms and the proposed Conditional
Quantization-Aware Accuracy Predictor (CQAP) as an estimator to quickly search
for mixed-precision quantized neural networks within the supernet. Extensive experiments
demonstrate that our EQ-Net is close to or even better than its static
counterparts as well as state-of-the-art robust bit-width methods. Code is
available at
\href{https://github.com/xuke225/EQ-Net.git}{https://github.com/xuke225/EQ-Net}.
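A minimal sketch of what an elastic quantization space looks like in code is below, assuming a uniform fake quantizer whose bit-width, granularity, and symmetry are sampled per training step; this is a simplified reading for illustration, not the released EQ-Net code:

```python
import random
import torch

def fake_quant(w, bits=8, per_channel=False, symmetric=True):
    """Uniform fake quantization with an elastic form: bit-width,
    granularity, and symmetry are all configurable at call time."""
    if per_channel:                              # one scale per output channel
        flat = w.flatten(1)
        shape = (-1,) + (1,) * (w.dim() - 1)
        lo = flat.min(dim=1).values.view(shape)
        hi = flat.max(dim=1).values.view(shape)
    else:                                        # one scale for the whole tensor
        lo, hi = w.min(), w.max()
    if symmetric:
        qmax = 2 ** (bits - 1) - 1
        scale = (torch.maximum(lo.abs(), hi.abs()) / qmax).clamp_min(1e-8)
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    scale = ((hi - lo) / (2 ** bits - 1)).clamp_min(1e-8)
    zp = torch.round(-lo / scale)                # zero point (asymmetric case)
    q = torch.clamp(torch.round(w / scale) + zp, 0, 2 ** bits - 1)
    return (q - zp) * scale

# Supernet training samples a random configuration each step, so the
# shared weights stay robust across the whole elastic space.
w = torch.randn(64, 32)
w_q = fake_quant(w,
                 bits=random.choice([2, 4, 8]),
                 per_channel=random.choice([True, False]),
                 symmetric=random.choice([True, False]))
```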
Towards Personalized Federated Learning via Heterogeneous Model Reassembly
This paper focuses on addressing the practical yet challenging problem of
model heterogeneity in federated learning, where clients possess models with
different network structures. To tackle this problem, we propose a novel
framework called pFedHR, which leverages heterogeneous model reassembly to
achieve personalized federated learning. In particular, we approach the problem
of heterogeneous model personalization as a model-matching optimization task on
the server side. Moreover, pFedHR automatically and dynamically generates
informative and diverse personalized candidates with minimal human
intervention. Furthermore, our proposed heterogeneous model reassembly
technique mitigates the adverse impact introduced by using public data with
different distributions from the client data to a certain extent. Experimental
results demonstrate that pFedHR outperforms baselines on three datasets under
both IID and Non-IID settings. Additionally, pFedHR effectively reduces the
adverse impact of using different public data and dynamically generates diverse
personalized models in an automated manner.
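To illustrate the reassembly idea, here is a toy server-side sketch; the greedy, activation-similarity matching and the `candidate_pool` structure are our simplifications of pFedHR's model-matching optimization, not the paper's algorithm:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reassemble(target_layers, candidate_pool, public_x):
    """Greedily stitch a personalized model: at each position, keep the
    candidate layer whose activations on public data are most similar to
    the target client's own layer, provided the interfaces match."""
    stitched, h_t, h_c = [], public_x, public_x
    for pos, t_layer in enumerate(target_layers):
        h_t = t_layer(h_t)                       # target client's own path
        best, best_sim = t_layer, -1.0
        for c_layer in candidate_pool[pos]:      # layers from other clients
            try:
                out = c_layer(h_c)
            except RuntimeError:                 # incompatible interface: skip
                continue
            if out.shape != h_t.shape:
                continue
            sim = torch.cosine_similarity(out.flatten(1), h_t.flatten(1)).mean()
            if sim > best_sim:
                best, best_sim = c_layer, sim.item()
        stitched.append(best)
        h_c = best(h_c)                          # stitched model's path
    return nn.Sequential(*stitched)
```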