232 research outputs found
Human aging, DNA methylation, and telomere length: Investigating indices of biological aging
Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation (WSSS) employing weak forms of labels
has been actively studied to alleviate the annotation cost of acquiring
pixel-level labels. However, classifiers trained on biased datasets tend to
exploit shortcut features and make predictions based on spurious correlations
between certain backgrounds and objects, leading to poor generalization
performance. In this paper, we propose shortcut mitigating augmentation (SMA)
for WSSS, which generates synthetic representations of object-background
combinations not seen in the training data to reduce the use of shortcut
features. Our approach disentangles the object-relevant and background
features. We then shuffle and combine the disentangled representations to
create synthetic features of diverse object-background combinations.
The SMA-trained classifier depends less on context and focuses more on the target object when making predictions. In addition, we analyze the classifier's reliance on shortcuts after applying our augmentation, using an attribution-method-based metric. The proposed method achieves improved semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.
Comment: Accepted to WACV 202
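Below is a minimal sketch of the shuffle-and-combine step described above, assuming the disentanglement already yields separate object and background feature tensors. The tensor names, shapes, and the additive fusion are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def shuffle_and_combine(obj_feats: torch.Tensor, bg_feats: torch.Tensor) -> torch.Tensor:
    """Pair each object feature with a background feature from another sample.

    obj_feats, bg_feats: (batch, channels) tensors produced by a (hypothetical)
    disentangling encoder. Returns synthetic features representing
    object-background combinations not present in the original batch.
    """
    perm = torch.randperm(bg_feats.size(0))   # shuffle backgrounds across the batch
    synthetic = obj_feats + bg_feats[perm]    # recombine (additive fusion is an assumption)
    return synthetic

# Usage: the synthetic features would be fed to the classifier alongside the
# originals so predictions cannot rely on spurious object-background correlations.
obj = torch.randn(8, 256)
bg = torch.randn(8, 256)
aug = shuffle_and_combine(obj, bg)  # (8, 256) synthetic object-background features
```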
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
There are two de facto standard architectures in recent computer vision:
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Strong
inductive biases of convolutions help the model learn sample-efficiently, but
such strong biases also limit the upper bound of CNNs when sufficient data are
available. On the contrary, ViT is inferior to CNNs for small data but superior
for sufficient data. Recent approaches attempt to combine the strengths of
these two architectures. However, by comparing the accuracy of various models on ImageNet subsets sampled at different ratios, we show that these approaches overlook that the optimal inductive bias also changes with the target data scale. In addition, through Fourier analysis of feature maps, which reveals how a model's response patterns vary with signal frequency, we observe which inductive bias is advantageous for each data scale. The more convolution-like inductive bias a model includes, the smaller the data scale at which the ViT-like model outperforms ResNet.
To obtain a model with flexible inductive bias on the data scale, we show
reparameterization can interpolate inductive bias between convolution and
self-attention. By adjusting the number of epochs the model stays in the
convolution, we show that reparameterization from convolution to self-attention
interpolates the Fourier analysis pattern between CNNs and ViTs. Adapting these
findings, we propose Progressive Reparameterization Scheduling (PRS), in which
reparameterization adjusts the required amount of convolution-like or
self-attention-like inductive bias per layer. For small-scale datasets, our PRS performs the reparameterization from convolution to self-attention linearly faster in later-stage layers. PRS outperforms previous studies on small-scale datasets, e.g., CIFAR-100.
Comment: Accepted at VIPriors ECCVW 2022, camera-ready version
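As an illustration of the scheduling idea, the following sketch assigns each layer an epoch at which it is reparameterized from convolution to self-attention, with deeper layers switching earlier. The linear rule, the `earliest_frac` parameter, and the `reparameterize_to_attention` method are hypothetical and only approximate the behavior described in the abstract.

```python
def switch_epochs(num_layers: int, total_epochs: int, earliest_frac: float = 0.25) -> list:
    """Epoch at which each layer switches from convolution to self-attention.
    Deeper layers switch earlier, spending fewer epochs with a convolution-like bias."""
    epochs = []
    for layer in range(num_layers):
        depth_frac = layer / max(num_layers - 1, 1)       # 0 for the first layer, 1 for the last
        frac = 1.0 - (1.0 - earliest_frac) * depth_frac   # linearly earlier for deeper layers
        epochs.append(int(frac * total_epochs))
    return epochs

def apply_schedule(blocks, epoch: int, schedule) -> None:
    # `blocks` are hypothetical reparameterizable layers exposing
    # `is_attention` and `reparameterize_to_attention()`.
    for block, switch_at in zip(blocks, schedule):
        if epoch >= switch_at and not block.is_attention:
            block.reparameterize_to_attention()

# [100, 75, 50, 25]: the first layer never switches, the last switches at epoch 25.
print(switch_epochs(num_layers=4, total_epochs=100))
```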
Addressing Negative Transfer in Diffusion Models
Diffusion-based generative models have achieved remarkable success in various
domains. They train a model on denoising tasks that encompass different noise
levels simultaneously, representing a form of multi-task learning (MTL).
However, analyzing and improving diffusion models from an MTL perspective
remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of negative transfer, which results in the performance degradation of certain tasks due to conflicts between tasks. In this paper, we
aim to analyze diffusion training from an MTL standpoint, presenting two key
observations: (1) the task affinity between denoising tasks diminishes as the gap between noise levels widens, and (2) negative
transfer can arise even in the context of diffusion training. Building upon
these observations, our objective is to enhance diffusion training by
mitigating negative transfer. To achieve this, we propose leveraging existing
MTL methods, but the huge number of denoising tasks makes it computationally expensive to calculate the necessary per-task loss or gradient.
To address this challenge, we propose clustering the denoising tasks into small
task clusters and applying MTL methods to them. Specifically, based on the observation that task affinity diminishes as the noise-level gap widens, we employ interval clustering to enforce temporal proximity
among denoising tasks within clusters. We show that interval clustering can be
solved with dynamic programming and utilize signal-to-noise ratio, timestep,
and task affinity for clustering objectives. Through this, our approach
addresses the issue of negative transfer in diffusion models by allowing for
efficient computation of MTL methods. We validate the proposed clustering and
its integration with MTL methods through various experiments, demonstrating
improved sample quality of diffusion models.
Comment: 22 pages, 12 figures, under review
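The interval-clustering step lends itself to a compact dynamic-programming sketch. The version below partitions timesteps into contiguous intervals by minimizing within-interval variance of a per-timestep quantity (a log-SNR proxy here); the exact clustering objectives used in the paper (signal-to-noise ratio, timestep, task affinity) may differ, so treat this as an illustrative assumption.

```python
import numpy as np

def interval_clustering(values: np.ndarray, k: int):
    """Split timesteps 0..T-1 into k contiguous intervals minimizing the total
    within-interval variance of `values`, via dynamic programming."""
    T = len(values)
    pre = np.concatenate([[0.0], np.cumsum(values)])
    pre2 = np.concatenate([[0.0], np.cumsum(values ** 2)])

    def cost(i, j):  # sum of squared deviations of values[i:j], computed in O(1)
        n = j - i
        s = pre[j] - pre[i]
        return (pre2[j] - pre2[i]) - s * s / n

    dp = np.full((k + 1, T + 1), np.inf)   # dp[c, t]: best cost for the first t steps using c intervals
    cut = np.zeros((k + 1, T + 1), dtype=int)
    dp[0, 0] = 0.0
    for c in range(1, k + 1):
        for t in range(c, T + 1):
            for s in range(c - 1, t):       # last interval is [s, t)
                cand = dp[c - 1, s] + cost(s, t)
                if cand < dp[c, t]:
                    dp[c, t], cut[c, t] = cand, s

    bounds, t = [], T                       # recover interval boundaries
    for c in range(k, 0, -1):
        s = cut[c, t]
        bounds.append((int(s), int(t)))
        t = s
    return bounds[::-1]

# Example: group 1000 timesteps into 4 clusters using a monotone log-SNR proxy.
log_snr = np.linspace(6.0, -6.0, 1000)
print(interval_clustering(log_snr, k=4))    # contiguous timestep intervals
```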
Towards Practical Plug-and-Play Diffusion Models
Diffusion-based generative models have achieved remarkable success in image
generation. Their guidance formulation allows an external model to
plug-and-play control the generation process for various tasks without
fine-tuning the diffusion model. However, the direct use of publicly available
off-the-shelf models for guidance fails due to their poor performance on noisy
inputs. For this reason, the existing practice is to fine-tune the guidance models on labeled data corrupted with noise. In this paper, we argue that this practice has limitations in two aspects: (1) performing well on inputs with widely varying noise levels is too hard for a single model; (2) collecting labeled
datasets hinders scaling up for various tasks. To tackle the limitations, we
propose a novel strategy that leverages multiple experts where each expert is
specialized in a particular noise range and guides the reverse process at its
corresponding timesteps. However, since managing multiple networks and utilizing labeled data is infeasible in practice, we present a practical guidance framework
termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient
fine-tuning and data-free knowledge transfer. We exhaustively conduct ImageNet
class conditional generation experiments to show that our method can
successfully guide diffusion with small trainable parameters and no labeled
data. Finally, we show that image classifiers, depth estimators, and semantic
segmentation models can guide publicly available GLIDE through our framework in
a plug-and-play manner.
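To make the multi-expert idea concrete, here is a minimal sketch of routing the reverse process to a noise-range-specialized guidance expert and computing a classifier-guidance-style gradient from it. The routing rule, model names, and guidance scale are assumptions for illustration; the paper's actual framework additionally relies on parameter-efficient fine-tuning and data-free knowledge transfer to obtain the experts.

```python
import torch

def pick_expert(t: int, experts: list, num_timesteps: int):
    """Route timestep t to the expert specialized for its noise range."""
    idx = min(t * len(experts) // num_timesteps, len(experts) - 1)
    return experts[idx]

def guided_gradient(x_t: torch.Tensor, t: int, y: torch.Tensor,
                    experts: list, num_timesteps: int, scale: float = 1.0) -> torch.Tensor:
    """Classifier-guidance-style gradient from the timestep-appropriate expert.

    x_t: noisy sample at timestep t; y: target class indices (batch,).
    Each expert is assumed to be a classifier returning (batch, classes) logits.
    """
    expert = pick_expert(t, experts, num_timesteps)
    x_t = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(expert(x_t), dim=-1)
    selected = log_probs.gather(1, y.unsqueeze(1)).sum()
    grad = torch.autograd.grad(selected, x_t)[0]
    return scale * grad  # added to the diffusion model's predicted mean/score
```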
Maternal and paternal anxiety during pregnancy: Comparing the effects on behavioral problems in offspring
Multi-Architecture Multi-Expert Diffusion Models
Diffusion models have achieved impressive results in generating diverse and
realistic data by employing multi-step denoising processes. However, the need
for accommodating significant variations in input noise at each time-step has
led to diffusion models requiring a large number of parameters for their
denoisers. We have observed that diffusion models effectively act as filters for different frequency ranges at each time-step's noise level. While some previous
works have introduced multi-expert strategies, assigning denoisers to different
noise intervals, they overlook the importance of specialized operations for
high and low frequencies. For instance, self-attention operations are effective
at handling low-frequency components (low-pass filters), while convolutions
excel at capturing high-frequency features (high-pass filters). In other words,
existing diffusion models employ denoisers with the same architecture, without considering the optimal operations for each time-step's noise level. To address this
limitation, we propose a novel approach called Multi-architecturE Multi-Expert
(MEME), which consists of multiple experts with specialized architectures
tailored to the operations required at each time-step interval. Through
extensive experiments, we demonstrate that MEME outperforms large competitors
in terms of both generation performance and computational efficiency.
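As a rough illustration of assigning architectures to timestep intervals, the sketch below gives noisier intervals (dominated by low-frequency content) a higher fraction of self-attention blocks and cleaner intervals more convolution, consistent with the framing above. The interval boundaries, ratios, and `ExpertSpec` structure are illustrative assumptions, not the paper's configuration.

```python
from dataclasses import dataclass

@dataclass
class ExpertSpec:
    t_range: tuple          # timestep interval [lo, hi) handled by this expert
    attention_ratio: float  # fraction of blocks using self-attention vs. convolution

def build_expert_specs(num_timesteps: int, num_experts: int) -> list:
    specs = []
    for i in range(num_experts):
        lo = i * num_timesteps // num_experts
        hi = (i + 1) * num_timesteps // num_experts
        # Higher timesteps = more noise = lower-frequency content = more attention.
        attention_ratio = (i + 1) / num_experts
        specs.append(ExpertSpec((lo, hi), attention_ratio))
    return specs

for spec in build_expert_specs(num_timesteps=1000, num_experts=4):
    print(spec)  # e.g. the noisiest interval (750, 1000) gets attention_ratio=1.0
```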
Blood-based epigenetic estimators of chronological age in human adults using DNA methylation data from the Illumina MethylationEPIC array
