21 research outputs found
DREAM: Efficient Dataset Distillation by Representative Matching
Dataset distillation aims to synthesize small datasets with little
information loss from original large-scale ones for reducing storage and
training costs. Recent state-of-the-art methods mainly constrain the sample
synthesis process by matching synthetic images and the original ones regarding
gradients, embedding distributions, or training trajectories. Although there
are various matching objectives, currently the strategy for selecting original
images is limited to naive random sampling.
We argue that random sampling overlooks the evenness of the selected sample
distribution, which may result in noisy or biased matching targets.
Besides, the sample diversity is also not constrained by random sampling.
These factors together lead to optimization instability in the distilling
process and degrade the training efficiency. Accordingly, we propose a novel
matching strategy named Dataset distillation by REpresentAtive Matching
(DREAM), in which only representative original images are selected for
matching. DREAM can be easily plugged into popular dataset distillation
frameworks and reduces the number of distillation iterations by more than
8 times without a performance drop. Given
sufficient training time, DREAM further provides significant improvements and
achieves state-of-the-art performance.
Comment: Efficient matching for dataset distillation
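The abstract above does not say how "representative" original images are chosen, so the following is only a minimal sketch under a stated assumption: that representatives are the samples closest to cluster centers in some feature space. The function names `kmeans` and `select_representatives`, the evenly spaced initialization, and the use of plain Lloyd's k-means are all hypothetical choices for illustration, not the authors' implementation.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length float tuples.

    Assumption: centers are initialized from evenly spaced samples so the
    sketch is deterministic; a real pipeline would likely use a library.
    """
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (Euclidean distance).
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Recompute each center as its cluster mean; keep the old center
        # if a cluster happens to be empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers

def select_representatives(features, k):
    """Return indices of k distinct samples, one nearest to each center.

    Hypothetical stand-in for representative selection; assumes k <= len(features).
    """
    centers = kmeans(features, k)
    chosen = []
    for c in centers:
        order = sorted(range(len(features)),
                       key=lambda i: math.dist(features[i], c))
        chosen.append(next(i for i in order if i not in chosen))
    return chosen
```

In contrast to uniform random sampling, a selection like this covers each cluster once, which is one way to read the abstract's concern about the evenness and diversity of the matching targets.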
Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification
Unsupervised domain adaptive person re-identification (Re-ID) methods
alleviate the burden of data annotation through generating pseudo supervision
messages. However, real-world Re-ID systems, with continuously accumulating
data streams, simultaneously demand more robust adaptation and anti-forgetting
capabilities. Methods based on image rehearsal address the forgetting issue
with limited extra storage but carry the risk of privacy leakage. In this work,
we propose a Color Prompting (CoP) method for data-free continual unsupervised
domain adaptive person Re-ID. Specifically, we employ a lightweight prompter
network to fit the color distribution of the current task together with Re-ID
training. Then for the incoming new tasks, the learned color distribution
serves as color style transfer guidance to transfer the images into past
styles. CoP achieves accurate color style recovery for past tasks with adequate
data diversity, leading to superior anti-forgetting effects compared with image
rehearsal methods. Moreover, CoP demonstrates strong generalization performance
for fast adaptation into new domains, given only a small amount of unlabeled
images. Extensive experiments demonstrate that after the continual training
pipeline the proposed CoP achieves 6.7% and 8.1% average rank-1 improvements
over the replay method on seen and unseen domains, respectively. The source
code for this work is publicly available at
https://github.com/vimar-gu/ColorPromptReID
DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching
Dataset distillation plays a crucial role in creating compact datasets with
similar training performance compared with original large-scale ones. This is
essential for addressing the challenges of data storage and training costs.
Prevalent methods facilitate knowledge transfer by matching the gradients,
embedding distributions, or training trajectories of synthetic images with
those of the sampled original images. Although there are various matching
objectives, currently the strategy for selecting original images is limited to
naive random sampling. We argue that random sampling overlooks the evenness of
the selected sample distribution, which may result in noisy or biased matching
targets. Besides, the sample diversity is also not constrained by random
sampling. Additionally, current methods predominantly focus on
single-dimensional matching, where information is not fully utilized. To
address these challenges, we propose a novel matching strategy called Dataset
Distillation by Bidirectional REpresentAtive Matching (DREAM+), which selects
representative original images for bidirectional matching. DREAM+ is applicable
to a variety of mainstream dataset distillation frameworks and significantly
reduces the number of distillation iterations by more than 15 times without
affecting performance. Given sufficient training time, DREAM+ can further
improve the performance and achieve state-of-the-art results. We have released
the code at github.com/NUS-HPC-AI-Lab/DREAM+.
Comment: This is an extension of the ICCV conference version
Dataset Quantization
State-of-the-art deep neural networks are trained with large amounts
(millions or even billions) of data. The expensive computation and memory costs
make it difficult to train them on limited hardware resources, especially for
recent popular large language models (LLMs) and computer vision (CV) models.
Popular dataset distillation methods have thus been developed, aiming to
reduce the number of training samples by synthesizing small-scale datasets via
gradient matching. However, as the gradient calculation is coupled with the
specific network architecture, the synthesized dataset is biased and performs
poorly when used for training unseen architectures. To address these
limitations, we present dataset quantization (DQ), a new framework to compress
large-scale datasets into small subsets which can be used for training any
neural network architecture. Extensive experiments demonstrate that DQ is able
to generate condensed small datasets for training unseen network architectures
with state-of-the-art compression ratios for lossless model training. To the
best of our knowledge, DQ is the first method that can successfully distill
large-scale datasets such as ImageNet-1k with a state-of-the-art compression
ratio. Notably, with 60% data from ImageNet and 20% data from Alpaca's
instruction tuning data, the models can be trained with negligible or no
performance drop for both vision tasks (including classification, semantic
segmentation, and object detection) as well as language tasks (including
instruction tuning tasks such as BBH and DROP).
Comment: 9 pages
Dynamic Gradient Reactivation for Backward Compatible Person Re-identification
We study the backward compatible problem for person re-identification
(Re-ID), which aims to constrain the features of an updated new model to be
comparable with the existing features from the old model in galleries. Most of
the existing works adopt distillation-based methods, which focus on pushing new
features to imitate the distribution of the old ones. However, the
distillation-based methods are intrinsically sub-optimal since they force the
new feature space to imitate the inferior old feature space. To address this
issue, we propose the Ranking-based Backward Compatible Learning (RBCL), which
directly optimizes the ranking metric between new features and old features.
Different from previous methods, RBCL only pushes the new features to find
best-ranking positions in the old feature space instead of strict alignment,
and is in line with the ultimate goal of backward retrieval. However, the sharp
sigmoid function used to make the ranking metric differentiable also incurs a
vanishing-gradient issue, which halts the ranking refinement during the later
period of training. To address this issue, we propose the Dynamic Gradient
Reactivation (DGR), which can reactivate the suppressed gradients by adding a
dynamically computed constant during the forward step. To further help target the
best-ranking positions, we include the Neighbor Context Agents (NCAs) to
approximate the entire old feature space during training. Unlike previous works
which only test on the in-domain settings, we make the first attempt to
introduce the cross-domain settings (including both supervised and
unsupervised), which are more meaningful and difficult. The experimental
results show that the proposed RBCL outperforms previous
state-of-the-art methods by large margins on all five settings.
Comment: Submitted to Pattern Recognition on Dec 06, 2021. Under Review
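The vanishing-gradient issue mentioned in the abstract above can be illustrated with a small sketch. It assumes a common sigmoid relaxation of the rank (one plus the sum of sigmoids of similarity differences, with a temperature `tau`); the exact formulation used by RBCL is not given in the abstract, and the names `soft_rank` and `soft_rank_grad` are hypothetical. The sketch does not implement DGR itself, only the effect it is meant to counter.

```python
import math

def sigmoid(x, tau):
    """Logistic function with temperature tau; a smaller tau gives a sharper step."""
    z = x / tau
    # Numerically stable evaluation for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_rank(pos_sim, neg_sims, tau):
    """Sigmoid-relaxed rank of the positive item (assumed formulation).

    Each negative with higher similarity than the positive contributes
    roughly 1, so the result approximates the hard rank as tau shrinks.
    """
    return 1.0 + sum(sigmoid(n - pos_sim, tau) for n in neg_sims)

def soft_rank_grad(pos_sim, neg_sims, tau):
    """Analytic derivative of soft_rank with respect to pos_sim."""
    grad = 0.0
    for n in neg_sims:
        s = sigmoid(n - pos_sim, tau)
        # d/d(pos_sim) of sigmoid((n - pos_sim) / tau) = -s * (1 - s) / tau
        grad -= s * (1.0 - s) / tau
    return grad

# Hypothetical similarities: one negative ranks above the positive.
pos, negs = 0.2, [0.5, 0.0]
sharp_grad = soft_rank_grad(pos, negs, tau=0.01)   # saturated sigmoids
smooth_grad = soft_rank_grad(pos, negs, tau=0.1)   # softer relaxation
```

With the sharp temperature the relaxed rank is very close to the hard rank of 2, but the sigmoids are saturated and `sharp_grad` is nearly zero, while the softer temperature yields a usable gradient at the cost of a looser rank approximation.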
The Effects of Warming-Shifted Plant Phenology on Ecosystem Carbon Exchange Are Regulated by Precipitation in a Semi-Arid Grassland
BACKGROUND: The longer growing season under climate warming has served as a crucial mechanism for the enhancement of terrestrial carbon (C) sink over the past decades. A better understanding of this mechanism is critical for projection of changes in C cycling of terrestrial ecosystems. METHODOLOGY/PRINCIPAL FINDINGS: A 4-year field experiment with day and night warming was conducted to examine the responses of plant phenology and their influences on plant coverage and ecosystem C cycling in a temperate steppe in northern China. Greater phenological responses were observed under night than day warming. Both day and night warming prolonged the growing season by advancing phenology of early-blooming species but without changing that of late-blooming species. However, no warming response of vegetation coverage was found for any of the eight species. The variances in species-level coverage and ecosystem C fluxes under different treatments were positively dependent upon the accumulated precipitation within phenological duration but not the length of phenological duration. CONCLUSIONS/SIGNIFICANCE: These plants' phenology is more sensitive to night than day warming, and the warming effects on ecosystem C exchange via shifting plant phenology could be mediated by precipitation patterns in semi-arid grasslands