DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Text-guided image editing faces significant challenges in training and inference flexibility. Much of the literature collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and inefficient. To avoid such data collection, subsequent approaches leverage pre-trained vision-language models, but they are limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. To address these issues, we investigate and identify a specific space, referred to as CLIP DeltaSpace, in which the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their corresponding text descriptions. Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase. This design endows DeltaEdit with two advantages: (1) text-free training; (2) generalization to various text prompts for zero-shot inference. Extensive experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including both GAN and diffusion models, in achieving flexible text-guided image editing. Code is available at https://github.com/Yueming6568/DeltaEdit.

Comment: 17 pages. arXiv admin note: text overlap with arXiv:2303.0628
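The core property of DeltaSpace, that visual and textual feature differences are semantically aligned, amounts to a cosine similarity between the two deltas. The sketch below is a minimal illustration with toy placeholder vectors standing in for real CLIP embeddings; loading an actual CLIP model is outside its scope.

```python
import math

def _normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def delta_alignment(img_feat_a, img_feat_b, txt_feat_a, txt_feat_b):
    """Cosine similarity between the visual delta (image B - image A) and
    the textual delta (target text - source text); values near 1 indicate
    the semantic alignment that DeltaSpace relies on."""
    d_img = _normalize([b - a for a, b in zip(img_feat_a, img_feat_b)])
    d_txt = _normalize([b - a for a, b in zip(txt_feat_a, txt_feat_b)])
    return sum(x * y for x, y in zip(d_img, d_txt))

# Toy features: both deltas point in the same direction, so alignment is 1.0.
score = delta_alignment([0.0, 0.0], [1.0, 1.0], [0.2, 0.2], [0.7, 0.7])
```

With real CLIP features, scores below 1 would be expected; the claim is only that image-pair deltas and caption-pair deltas for the same semantic change land close in direction.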
DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
Text-driven image manipulation remains challenging in terms of training and inference flexibility. Conditional generative models depend heavily on expensive annotated training data. Meanwhile, recent frameworks that leverage pre-trained vision-language models are limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. In this work, we propose a novel framework named \textit{DeltaEdit} to address these problems. Our key idea is to investigate and identify a space, namely the delta image-and-text space, in which the distribution of CLIP visual feature differences of two images is well aligned with that of CLIP textual embedding differences of source and target texts. Based on this CLIP delta space, the DeltaEdit network is designed to map CLIP visual feature differences to the editing directions of StyleGAN in the training phase. Then, in the inference phase, DeltaEdit predicts StyleGAN's editing directions from the differences of the CLIP textual features. In this way, DeltaEdit is trained in a text-free manner. Once trained, it generalizes well to various text prompts for zero-shot inference without bells and whistles. Code is available at https://github.com/Yueming6568/DeltaEdit.

Comment: Accepted by CVPR 2023.
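DeltaEdit's text-free recipe — fit a mapper on visual deltas at training time, then feed it textual deltas at inference time — can be sketched with a toy linear mapper in place of the paper's actual network. The dimensions, training pairs, and plain SGD updates below are illustrative assumptions, not the published architecture.

```python
def train_mapper(pairs, dim, epochs=200, lr=0.1):
    """Fit a linear map W so that W @ visual_delta approximates the target
    latent direction. Only image features are used, so training is text-free."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for delta_v, target in pairs:
            pred = [sum(W[i][j] * delta_v[j] for j in range(dim)) for i in range(dim)]
            for i in range(dim):
                err = pred[i] - target[i]
                for j in range(dim):
                    W[i][j] -= lr * err * delta_v[j]
    return W

def predict_direction(W, delta_t):
    """Inference: apply the same map to a CLIP *textual* delta."""
    dim = len(W)
    return [sum(W[i][j] * delta_t[j] for j in range(dim)) for i in range(dim)]

# Toy training pairs: visual deltas and the latent directions they should map to.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
W = train_mapper(pairs, dim=2)
# Zero-shot inference: reuse W on a textual delta never seen during training.
direction = predict_direction(W, [0.0, 1.0])
```

The swap only works because the visual and textual deltas live in the aligned CLIP delta space the abstract describes; the mapper itself never distinguishes the two modalities.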
Earning Extra Performance from Restrictive Feedbacks
Many machine learning applications encounter a situation where model providers are required to further refine a previously trained model to satisfy the specific needs of local users. This problem reduces to the standard model-tuning paradigm if the target data can be fed to the model. However, it is rather difficult in a wide range of practical cases where the target data is not shared with model providers, while some evaluations of the model are commonly accessible. In this paper, we formally set up a challenge named \emph{Earning eXtra PerformancE from restriCTive feEDbacks} (EXPECTED) to describe this form of model-tuning problem. Concretely, EXPECTED allows a model provider to access the operational performance of the candidate model multiple times via feedback from a local user (or a group of users). The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing this feedback. Unlike existing model-tuning methods, where the target data is always available for calculating model gradients, model providers in EXPECTED only see feedback that can be as simple as scalars, such as inference accuracy or usage rate. To enable tuning in this restrictive setting, we propose to characterize the geometry of the model performance with respect to the model parameters by exploring the parameters' distribution. In particular, for deep models whose parameters are distributed across multiple layers, a more query-efficient algorithm is further tailored that conducts layerwise tuning, paying more attention to those layers that pay off better. Our theoretical analyses justify the proposed algorithms in terms of both efficacy and efficiency. Extensive experiments on different applications demonstrate that our work forges a sound solution to the EXPECTED problem.

Comment: Accepted by IEEE TPAMI in April 202
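The restrictive setting — tuning from scalar feedback only, with no gradients and no target data — can be illustrated with a simple perturb-and-keep search over the parameters. This is a generic derivative-free baseline under assumed names (`tune_from_feedback`, a toy `report` oracle), not the distribution-based or layerwise algorithm proposed in the paper.

```python
import random

def tune_from_feedback(params, feedback, queries=500, sigma=0.1, seed=0):
    """Perturb the parameters with Gaussian noise and keep a candidate only
    when the scalar feedback (e.g. reported accuracy) improves; the provider
    never sees target data or gradients, only the returned scalar."""
    rng = random.Random(seed)
    best = list(params)
    best_score = feedback(best)
    for _ in range(queries):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        score = feedback(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy "user feedback": higher is better, with its peak at parameters (1, -2).
report = lambda p: -(p[0] - 1.0) ** 2 - (p[1] + 2.0) ** 2
tuned, final_score = tune_from_feedback([0.0, 0.0], report)
```

Each call to `feedback` corresponds to one round-trip to the local user, which is why query efficiency (e.g. the paper's layerwise focus) matters so much in this setting.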
Efficient and Robust Black-box Integral-approximation and Optimization
University of Technology Sydney, Faculty of Engineering and Information Technology.

Black-box optimization and black-box integral approximation are important techniques for machine learning, industrial design, and simulation in science. This thesis investigates black-box integral approximation and black-box optimization by considering the close relationship between them. For integral approximation, we develop a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. Furthermore, we investigate structured point sets for integral approximation on the hyper-sphere. Our structured point sets can serve as a good initialization for black-box optimization. Moreover, we propose stochastic black-box optimization with implicit natural gradients; our method is very simple and has only the step-size hyper-parameter. Furthermore, we develop a batch Bayesian optimization algorithm from the perspective of frequentist kernel methods, which is powerful for low-dimensional black-box optimization problems. We further apply our structured integral-approximation techniques to kernel approximation. In addition, we develop a structured approximation for robust deep neural network architectures, which results in an elegant and simple architecture that preserves optimization properties. Moreover, we develop an adaptive loss as a tighter upper-bound approximation of the expected 0-1 risk that is robust and trainable with SGD.
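The rank-1 lattice idea the thesis builds on fits in a few lines: a single generating vector z defines all n points, and an equal-weight average over them approximates an integral on the unit cube. The Fibonacci-style generating vector below is a standard textbook choice, not the thesis's group-theoretic construction.

```python
def rank1_lattice(n, z):
    """All n points of a rank-1 lattice: x_i = (i * z mod n) / n, i = 0..n-1."""
    return [[(i * zj) % n / n for zj in z] for i in range(n)]

def qmc_estimate(f, points):
    """Equal-weight quasi-Monte Carlo estimate of the integral over [0,1]^d."""
    return sum(f(p) for p in points) / len(points)

# 2-D Fibonacci lattice: n = 144 points, generating vector (1, 89).
points = rank1_lattice(144, (1, 89))
estimate = qmc_estimate(lambda x: x[0] * x[1], points)  # exact value: 1/4
```

For smooth integrands, such lattices converge much faster than plain Monte Carlo at the same point count, which is also why regular, well-spread point sets make good initializations for black-box optimization.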
A cone order sequence based multi-objective evolutionary algorithm
A cone-order-sequence-based MOEA (CS-MOEA) is proposed to deal with multi-objective optimization problems. Instead of using only Pareto dominance, it constructs a sequence of cone orders to balance search diversity and convergence. By gradually increasing the open angle of the cone order, it approximates the Pareto cone. A simple formula for judging θ-cone dominance is derived, which is easy to compute. Moreover, an energy model is introduced for the selection of individuals to maintain population diversity. Experiments on more than 10 problems (i.e., the ZDT and DTLZ benchmark problem sets) demonstrate that the proposed method is competitive with Stable Matching MOEA/D (STM-MOEA/D) and MOEA/D-DE.
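The paper's exact θ-cone dominance formula is not reproduced in the abstract, but a common angle-based variant can be sketched: solution a dominates b when the objective difference fb − fa lies within half-angle θ of the all-ones direction (minimisation assumed). The formulation below is an illustrative assumption, not necessarily the derivation used in CS-MOEA.

```python
import math

def theta_cone_dominates(fa, fb, theta):
    """True if solution a (objective vector fa) theta-cone-dominates b:
    the difference fb - fa lies inside a circular cone of half-angle theta
    around the all-ones direction (minimisation assumed)."""
    d = [b - a for a, b in zip(fa, fb)]
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        return False  # identical objective vectors: no dominance
    # Cosine of the angle between d and the normalised all-ones vector.
    cos_angle = sum(d) / (norm * math.sqrt(len(d)))
    return cos_angle >= math.cos(theta)

# Narrow cone (22.5 degrees): only near-uniform improvements dominate.
uniform_gain = theta_cone_dominates([0.0, 0.0], [1.0, 1.0], math.pi / 8)  # True
single_gain = theta_cone_dominates([0.0, 0.0], [1.0, 0.0], math.pi / 8)   # False
```

Widening θ enlarges the set of dominated points, which is how a growing cone sequence can trade convergence pressure against diversity before settling near Pareto dominance.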