Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Although the performance of person Re-Identification (ReID) has been
significantly boosted, many challenging issues in real scenarios have not been
fully investigated, e.g., the complex scenes and lighting variations, viewpoint
and pose changes, and the large number of identities in a camera network. To
facilitate the research towards conquering those issues, this paper contributes
a new dataset called MSMT17 with many important features, e.g., 1) the raw
videos are taken by a 15-camera network deployed in both indoor and outdoor
scenes, 2) the videos cover a long period of time and present complex lighting
variations, and 3) it contains currently the largest number of annotated
identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe
that a domain gap commonly exists between datasets, which causes a severe
performance drop when training and testing on different datasets. As a result,
available training data cannot be effectively leveraged for new testing
domains. To relieve the expensive cost of annotating new training
samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to
bridge the domain gap. Comprehensive experiments show that the domain gap can
be substantially narrowed down by PTGAN.
Comment: 10 pages, 9 figures; accepted in CVPR 2018
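To make the person-transfer idea concrete, here is a minimal sketch of a CycleGAN-style transfer objective with an added foreground identity-preservation term, which is one common way to formulate this kind of cross-domain person transfer. The module definitions, mask source, and loss weights below are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only: a CycleGAN-style transfer loss plus a foreground
# identity term, assuming person masks are available (e.g., from a segmentation
# network). Modules and loss weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def person_transfer_loss(G_ab, G_ba, D_b, x_a, mask_a,
                         lambda_cyc=10.0, lambda_id=5.0):
    """x_a: source-domain person crops (N,3,H,W); mask_a: foreground masks in [0,1]."""
    fake_b = G_ab(x_a)      # render the source person in the target domain's style
    rec_a = G_ba(fake_b)    # cycle back to the source domain

    # least-squares adversarial loss against the target-domain discriminator
    adv = F.mse_loss(D_b(fake_b), torch.ones_like(D_b(fake_b)))
    # cycle consistency preserves the overall image content
    cyc = F.l1_loss(rec_a, x_a)
    # identity preservation: the foreground person should remain unchanged
    idt = F.l1_loss(fake_b * mask_a, x_a * mask_a)
    return adv + lambda_cyc * cyc + lambda_id * idt

# toy usage with stand-in single-layer "networks"
G_ab, G_ba = nn.Conv2d(3, 3, 3, padding=1), nn.Conv2d(3, 3, 3, padding=1)
D_b = nn.Conv2d(3, 1, 3, padding=1)
x, m = torch.rand(2, 3, 128, 64), torch.ones(2, 1, 128, 64)
print(person_transfer_loss(G_ab, G_ba, D_b, x, m).item())
```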
Degeneration-Tuning: Using Scrambled Grid to Shield Unwanted Concepts from Stable Diffusion
Owing to the unrestricted nature of the content in the training data, large
text-to-image diffusion models, such as Stable Diffusion (SD), are capable of
generating images with potentially copyrighted or dangerous content based on
the corresponding textual concepts. These concepts include specific
intellectual property (IP), human faces, and various artistic styles. However, Negative
Prompt, a widely used method for content removal, frequently fails to conceal
this content due to inherent limitations in its inference logic. In this work,
we propose a novel strategy named Degeneration-Tuning (DT) to shield
contents of unwanted concepts from SD weights. By utilizing Scrambled Grid to
reconstruct the correlation between undesired concepts and their corresponding
image domain, we guide SD to generate meaningless content when such textual
concepts are provided as input. As this adaptation occurs at the level of the
model's weights, the SD, after DT, can be grafted onto other conditional
diffusion frameworks like ControlNet to shield unwanted concepts. In addition
to qualitatively showcasing the effectiveness of our DT method in protecting
various types of concepts, a quantitative comparison of SD before and after
DT indicates that the DT method does not significantly impact the generative
quality of other content. The FID and IS scores of the model on COCO-30K
exhibit only minor changes after DT, shifting from 12.61 and 39.20 to 13.04 and
38.25, respectively, which clearly outperforms previous methods.
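As a rough illustration of the Scrambled Grid idea, the snippet below splits an image into an n x n grid and randomly permutes the patches, which is one plausible way to break the correlation between an unwanted concept and coherent image content during fine-tuning. The grid size and how the scrambled images enter the DT objective are assumptions, not the paper's exact procedure.

```python
# Hedged sketch of a grid-scrambling operation: permute the n x n patches of an
# image so the unwanted concept's text no longer maps to coherent image content.
import torch

def scramble_grid(img: torch.Tensor, n: int = 4, generator=None) -> torch.Tensor:
    """img: (C, H, W) with H and W divisible by n."""
    c, h, w = img.shape
    ph, pw = h // n, w // n
    # (C, n, ph, n, pw) -> (n*n, C, ph, pw): one entry per grid cell
    patches = (img.reshape(c, n, ph, n, pw)
                  .permute(1, 3, 0, 2, 4)
                  .reshape(n * n, c, ph, pw))
    perm = torch.randperm(n * n, generator=generator)
    patches = patches[perm]
    # reassemble the scrambled cells back into an image
    return (patches.reshape(n, n, c, ph, pw)
                   .permute(2, 0, 3, 1, 4)
                   .reshape(c, h, w))

scrambled = scramble_grid(torch.rand(3, 64, 64), n=4)  # toy usage
```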
Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio
The automatic design of computationally efficient neural networks has received
much attention in recent years. Existing approaches either utilize network
pruning or leverage neural architecture search methods. This paper
presents a new framework named network adjustment, which considers network
accuracy as a function of FLOPs, so that under each network configuration, one
can estimate the FLOPs utilization ratio (FUR) for each layer and use it to
determine whether to increase or decrease the number of channels on the layer.
Note that FUR, like the gradient of a non-linear function, is accurate only in
a small neighborhood of the current network. Hence, we design an iterative
mechanism so that the initial network undergoes a number of steps, each of
which has a small 'adjusting rate' to control the changes to the network. The
computational overhead of the entire search process is reasonable, i.e.,
comparable to that of re-training the final model from scratch. Experiments on
standard image classification datasets and a wide range of base networks
demonstrate the effectiveness of our approach, which consistently outperforms
the pruning counterpart. The code is available at
https://github.com/danczs/NetworkAdjustment
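A minimal sketch of the iterative adjustment loop described above: per-layer FLOPs utilization ratios decide which layers gain channels and which lose them at a small adjusting rate. The estimate_fur hook and the simple above/below-mean rule are placeholder assumptions; only the overall loop structure follows the abstract.

```python
# Hedged sketch of FUR-guided channel adjustment. `estimate_fur` stands in for
# whatever measurement (accuracy gained per unit of FLOPs, per layer) the real
# framework computes after briefly training the current configuration.
from typing import Callable, List

def adjust_channels(channels: List[int],
                    estimate_fur: Callable[[List[int]], List[float]],
                    steps: int = 10,
                    rate: float = 0.1) -> List[int]:
    for _ in range(steps):
        fur = estimate_fur(channels)            # one utilization ratio per layer
        mean_fur = sum(fur) / len(fur)
        new_channels = []
        for c, f in zip(channels, fur):
            delta = max(1, int(rate * c))       # small 'adjusting rate' per step
            # layers that use FLOPs efficiently gain channels, others shrink
            new_channels.append(c + delta if f > mean_fur else max(1, c - delta))
        channels = new_channels
    return channels

# toy usage: pretend narrower layers are more FLOPs-efficient
print(adjust_channels([32, 64, 128], lambda ch: [1.0 / c for c in ch], steps=3))
```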
Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks
Neural architecture search has attracted wide attention in both academia and
industry. To accelerate it, researchers proposed weight-sharing methods which
first train a super-network to reuse computation among different operators,
from which exponentially many sub-networks can be sampled and efficiently
evaluated. These methods enjoy great advantages in terms of computational
costs, but the accuracy of a sampled sub-network is not guaranteed to be
estimated precisely unless it is trained individually. This paper attributes
such inaccuracy to the inevitable mismatch between assembled network layers,
which adds a random error term to each estimation. We alleviate this issue
by training a graph convolutional network to fit the performance of sampled
sub-networks so that the impact of random errors becomes minimal. With this
strategy, we achieve a higher rank correlation coefficient in the selected set
of candidates, which consequently leads to better performance of the final
architecture. In addition, our approach also enjoys the flexibility of being
used under different hardware constraints, since the graph convolutional
network provides an efficient lookup table of the performance of
architectures in the entire search space.
Comment: Accepted to AAAI 2021
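The sketch below shows the general idea of fitting a graph convolutional regressor to sub-network performance: each architecture is encoded as a (normalized adjacency, node features) pair and mapped to a predicted accuracy that can re-rank sampled candidates. The encoding, network width, and training target here are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch: a tiny GCN-style regressor over an architecture graph.
import torch
import torch.nn as nn

class GCNPredictor(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, adj: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) normalized adjacency with self-loops; feats: (N, in_dim)
        h = torch.relu(adj @ self.w1(feats))
        h = torch.relu(adj @ self.w2(h))
        return self.out(h.mean(dim=0))          # graph-level accuracy estimate

# toy usage: a random row-normalized adjacency stands in for a real cell encoding
adj = torch.softmax(torch.rand(7, 7), dim=-1)
feats = torch.rand(7, 16)                       # 16-d operator embeddings per node
model = GCNPredictor(in_dim=16)
pred = model(adj, feats)
loss = nn.functional.mse_loss(pred, torch.tensor([0.93]))  # fit to measured accuracy
loss.backward()
```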
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
Prompt tuning, a recently emerging paradigm, enables powerful vision-language
pre-trained models to adapt to downstream tasks in a parameter- and
data-efficient way by learning "soft prompts" that condition the frozen
pre-trained models. Though effective, prompt tuning is particularly problematic
in the few-shot scenario, where its performance is sensitive to the
initialization and finding a good initialization requires a time-consuming
process, thus restricting the fast adaptation ability of the pre-trained
models. In addition, prompt tuning can undermine the generalizability of the
pre-trained models, because the learnable prompt tokens easily overfit to the
limited training samples. To address these
issues, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM)
framework that jointly meta-learns an efficient soft prompt initialization for
better adaptation and a lightweight gradient regulating function for strong
cross-domain generalizability in a meta-learning paradigm using only the
unlabeled image-text pre-training data. Rather than designing a specific prompt
tuning method, our GRAM can be easily incorporated into various prompt tuning
methods in a model-agnostic way, and comprehensive experiments show that GRAM
brings about consistent improvement for them in several settings (e.g.,
few-shot learning, cross-domain generalization, cross-dataset generalization,
etc.) over 11 datasets. Further, experiments show that GRAM enables the
orthogonal methods of textual and visual prompt tuning to work in a
mutually enhancing way, offering better generalizability than the uni-modal
prompt tuning methods.
Comment: Accepted by ICCV 2023
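To make the two meta-learned components concrete, here is a highly simplified sketch: an inner step adapts the soft prompt to a sampled task, a learned transform regulates that inner gradient, and an outer step updates both the prompt initialization and the regulator. The toy task_loss, tensor shapes, and learning rates are assumptions; the real objective would involve a frozen CLIP-style model and image-text pre-training data.

```python
# Hedged sketch of meta-learning a prompt initialization with a learned
# gradient-regulating transform. Everything task-specific is a placeholder.
import torch
import torch.nn as nn

prompt_init = nn.Parameter(torch.randn(16, 512) * 0.02)   # 16 soft prompt tokens
regulator = nn.Linear(512, 512)                           # learned gradient transform
meta_opt = torch.optim.Adam([prompt_init, *regulator.parameters()], lr=1e-3)

def task_loss(prompt, batch):                              # placeholder objective
    return ((prompt.mean(0) - batch) ** 2).mean()

for step in range(50):
    support, query = torch.randn(512), torch.randn(512)    # a simulated task
    # inner step: task-specific gradient on the support split
    inner_grad, = torch.autograd.grad(task_loss(prompt_init, support),
                                      prompt_init, create_graph=True)
    adapted = prompt_init - 0.1 * regulator(inner_grad)    # regulated inner update
    meta_loss = task_loss(adapted, query)                  # evaluate on the query split
    meta_opt.zero_grad()
    meta_loss.backward()                                   # updates init and regulator
    meta_opt.step()
```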