WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors. In doing so, it addresses some of the limitations of state-of-the-art works, namely, a) that they discover directions that are independent of the latent code, i.e., linear paths, and b) that their evaluation relies either on visual inspection or on laborious human labeling. More specifically, we propose to learn non-linear warpings of the latent space, each one parametrized by a set of RBF-based latent space warping functions, where each warping gives rise to a family of non-linear paths via the gradient of the function. Building on the work of Voynov and Babenko, which discovers linear paths, we optimize the trainable parameters of the set of RBFs so that images generated by codes along different paths are easily distinguishable by a discriminator network. This leads to easily distinguishable image transformations, such as pose and facial expression changes in facial images. We show that linear paths can be derived as a special case of our method, and demonstrate experimentally, both qualitatively and quantitatively, that non-linear paths in the latent space lead to steeper, more disentangled and more interpretable changes in the image space than state-of-the-art methods. We make the code and the pretrained models publicly available at: https://github.com/chi0tzp/WarpedGANSpace
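As a rough illustration of the idea, the following sketch (not the released implementation; the module name, sizes and training loop are assumptions) shows how a set of RBF-based warping functions can be parametrized in PyTorch and how a non-linear path step is obtained from the gradient of a warping function.

```python
# Minimal sketch, assuming one RBF-based warping function per interpretable path,
# with the path direction at a latent code z taken as the (normalised) gradient
# of the warping, as described in the abstract above.
import torch
import torch.nn as nn

class RBFWarping(nn.Module):
    """f_k(z) = sum_i alpha_i * exp(-gamma_i * ||z - c_i||^2) for each path k."""
    def __init__(self, num_paths: int, num_centers: int, latent_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_paths, num_centers, latent_dim))
        self.alphas = nn.Parameter(torch.randn(num_paths, num_centers))
        self.log_gammas = nn.Parameter(torch.zeros(num_paths, num_centers))

    def forward(self, z: torch.Tensor, k: int) -> torch.Tensor:
        # z: (batch, latent_dim) -> scalar warping value per sample for path k
        d2 = ((z.unsqueeze(1) - self.centers[k].unsqueeze(0)) ** 2).sum(-1)  # (batch, centers)
        return (self.alphas[k] * torch.exp(-self.log_gammas[k].exp() * d2)).sum(-1)

    def shift(self, z: torch.Tensor, k: int, eps: float) -> torch.Tensor:
        # Non-linear step along path k: move z by eps along the gradient of f_k at z.
        z = z.clone().requires_grad_(True)
        (grad,) = torch.autograd.grad(self.forward(z, k).sum(), z)
        return z.detach() + eps * grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)

# Usage: shifted codes would be fed to the frozen GAN, with a discriminator
# (not shown) trained to recover (k, eps) from image pairs, as in Voynov & Babenko.
warp = RBFWarping(num_paths=32, num_centers=8, latent_dim=512)
z = torch.randn(4, 512)
z_shifted = warp.shift(z, k=3, eps=0.5)
```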
Phenex: Ontological Annotation of Phenotypic Diversity
Phenex is a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic variation using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Despite the centrality of the phenotype to so much of biology, traditions for communicating information about phenotypes are idiosyncratic to different disciplines. Phenotypes seem to elude standardized descriptions due to the variety of traits that compose them and the difficulty of capturing the complex forms and subtle differences among organisms that we can readily observe. Consequently, phenotypes are refractory to attempts at data integration that would allow computational analyses across studies and study systems. Phenex addresses this problem by allowing scientists to employ standard ontologies and syntax to link computable phenotype annotations to evolutionary character matrices, as well as to link taxa and specimens to ontological identifiers. Ontologies have become a foundational technology for establishing shared semantics, and, more generally, for capturing and computing with biological knowledge.
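For readers unfamiliar with Entity-Quality annotations, the following is a purely illustrative sketch of how such an annotation could be represented as a simple data structure; the identifiers are placeholders rather than real ontology term IDs, and these classes are not part of Phenex itself.

```python
# Illustrative sketch only: a minimal representation of an Entity-Quality (EQ)
# phenotype annotation of the kind Phenex manages. All identifiers below are
# placeholders, not verified ontology term IDs.
from dataclasses import dataclass

@dataclass
class OntologyTerm:
    identifier: str   # e.g. a term ID from an anatomy ontology, PATO, or a taxonomy
    label: str

@dataclass
class EQAnnotation:
    entity: OntologyTerm    # anatomical entity being described
    quality: OntologyTerm   # phenotypic quality (PATO-style term)
    taxon: OntologyTerm     # taxon in which the character state was observed

# Example: recording that a fin structure is absent in some taxon.
annotation = EQAnnotation(
    entity=OntologyTerm("ANATOMY:0000001", "dorsal fin"),     # placeholder ID
    quality=OntologyTerm("QUALITY:0000001", "absent"),        # placeholder ID
    taxon=OntologyTerm("TAXON:0000001", "example species"),   # placeholder ID
)
```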
FS-DETR: Few-shot detection transformer with prompting and without re-training
This paper is on Few-Shot Object Detection (FSOD), where, given a few templates (examples) depicting a novel class (not seen during training), the goal is to detect all of its occurrences within a set of images. From a practical perspective, an FSOD system must fulfil the following desiderata: (a) it must be usable as is, without requiring any fine-tuning at test time, (b) it must be able to process an arbitrary number of novel objects concurrently while supporting an arbitrary number of examples from each class, and (c) it must achieve accuracy comparable to a closed system. Towards satisfying (a)-(c), in this work we make the following contributions: we introduce, for the first time, a simple yet powerful few-shot detection transformer (FS-DETR) based on visual prompting that addresses both desiderata (a) and (b). Our system builds upon the DETR framework, extending it based on two key ideas: (1) feed the provided visual templates of the novel classes as visual prompts during test time, and (2) "stamp" these prompts with pseudo-class embeddings (akin to soft prompting), which are then predicted at the output of the decoder. Importantly, we show that our system is not only more flexible than existing methods, but also takes a step towards satisfying desideratum (c). Specifically, it is significantly more accurate than all methods that do not require fine-tuning, and it even matches and outperforms the current state-of-the-art fine-tuning-based methods on the most well-established benchmarks (PASCAL VOC & MSCOCO).
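The prompting mechanism in idea (2) can be sketched roughly as follows; this is not the authors' code, and the module names, feature shapes and the use of a plain TransformerDecoder are simplifying assumptions.

```python
# Hedged sketch of visual prompting in a DETR-style decoder: template features
# for novel classes are "stamped" with learnable pseudo-class embeddings and fed
# alongside the object queries; the classification head predicts which
# pseudo-class (prompt slot) each query matches.
import torch
import torch.nn as nn

class PromptedDecoder(nn.Module):
    def __init__(self, d_model=256, num_queries=100, max_prompts=10):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        # One pseudo-class embedding per prompt slot; class identity is carried
        # by the slot index rather than by any class name seen during training.
        self.pseudo_class_emb = nn.Parameter(torch.randn(max_prompts, d_model))
        self.class_head = nn.Linear(d_model, max_prompts + 1)  # +1 for background
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, image_memory, template_feats):
        # image_memory: (B, HW, d_model) encoder features of the test image
        # template_feats: (B, P, d_model) pooled features of the P visual templates
        B, P, _ = template_feats.shape
        prompts = template_feats + self.pseudo_class_emb[:P].unsqueeze(0)  # "stamping"
        tgt = torch.cat([prompts, self.queries.unsqueeze(0).expand(B, -1, -1)], dim=1)
        hs = self.decoder(tgt, image_memory)
        query_out = hs[:, P:]                      # keep only the object queries
        return self.class_head(query_out), self.box_head(query_out).sigmoid()

# Usage with random tensors standing in for real backbone/encoder features:
dec = PromptedDecoder()
logits, boxes = dec(torch.randn(2, 400, 256), torch.randn(2, 3, 256))
```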
ReGen: A good Generative zero-shot video classifier should be Rewarded
This paper sets out to solve the following problem: how can we turn a generative video captioning model into an open-world video/action classification model? Video captioning models can naturally produce open-ended, free-form descriptions of a given video, which, however, might not be discriminative enough for video/action recognition. Unfortunately, when fine-tuned to auto-regress the class names directly, video captioning models overfit the base classes, losing their open-world zero-shot capabilities. To alleviate base class overfitting, in this work we propose to use reinforcement learning to enforce the output of the video captioning model to be more class-level discriminative. Specifically, we propose ReGen, a novel reinforcement learning-based framework with a three-fold reward function: (1) a class-level discrimination reward that enforces the generated caption to be correctly classified into the corresponding action class, (2) a CLIP reward that encourages the generated caption to remain descriptive of the input video (i.e. video-specific), and (3) a grammar reward that preserves the grammatical correctness of the caption. We show that ReGen can train a model to produce captions that are discriminative, video-specific and grammatically correct. Importantly, when evaluated on standard benchmarks for zero- and few-shot action classification, ReGen significantly outperforms the previous state-of-the-art.
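A minimal sketch of how the three rewards could be combined with a policy-gradient update is shown below; the scorer functions are hypothetical stand-ins for the class discriminator, the CLIP video-text scorer and the grammar checker, and the equal weighting is an assumption.

```python
# Hedged sketch of a three-part reward and a REINFORCE-style surrogate loss.
# classify_caption, clip_similarity and grammar_score are hypothetical callables
# supplied by the caller; they stand in for the actual reward models.
import torch

def regen_style_reward(caption, video, target_class,
                       classify_caption, clip_similarity, grammar_score,
                       w_cls=1.0, w_clip=1.0, w_gram=1.0):
    # (1) class-level discrimination: probability that the caption is classified
    #     into the correct action class
    r_cls = classify_caption(caption)[target_class]
    # (2) CLIP reward: similarity between the generated caption and the input video
    r_clip = clip_similarity(caption, video)
    # (3) grammar reward: grammatical acceptability of the caption
    r_gram = grammar_score(caption)
    return w_cls * r_cls + w_clip * r_clip + w_gram * r_gram

def reinforce_loss(log_probs, reward, baseline=0.0):
    # log_probs: (T,) log-probabilities of the sampled caption tokens.
    # Standard policy-gradient surrogate: minimising this maximises expected reward.
    return -(reward - baseline) * log_probs.sum()

# Dummy scorers just to show the call pattern:
reward = regen_style_reward(
    caption="a person is climbing a rope", video=None, target_class=0,
    classify_caption=lambda c: torch.tensor([0.7, 0.3]),
    clip_similarity=lambda c, v: torch.tensor(0.5),
    grammar_score=lambda c: torch.tensor(1.0),
)
loss = reinforce_loss(torch.randn(12).log_softmax(dim=0), reward)
```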
HyperReenact: one-shot reenactment via jointly learning to refine and retarget faces
In this paper, we present our method for neural face reenactment, called HyperReenact, which aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet they either produce reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or require expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator: we first invert the real images into its latent space and then use a hypernetwork to perform (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, thereby eliminating the dependence on external editing methods that typically produce artifacts. Our method operates in the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images and exhibiting remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact
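The hypernetwork component can be sketched roughly as follows; this is not the released code, and the toy stand-in generator, the choice of targeted layers and the feature dimensions are assumptions made purely for illustration.

```python
# Hedged sketch: given features of the inverted source frame and of the driving
# (target) frame, a small hypernetwork predicts additive offsets for selected
# generator weights, covering both identity refinement and pose retargeting.
import math
import torch
import torch.nn as nn

class WeightOffsetHyperNet(nn.Module):
    def __init__(self, feat_dim, target_shapes):
        super().__init__()
        self.target_shapes = dict(target_shapes)
        # One linear head per targeted generator parameter; '.' is not allowed
        # in ModuleDict keys, so it is replaced for the internal lookup.
        self.heads = nn.ModuleDict({
            name.replace('.', '_'): nn.Linear(2 * feat_dim, math.prod(shape))
            for name, shape in self.target_shapes.items()
        })

    def forward(self, src_feat, drv_feat):
        # src_feat / drv_feat: (B, feat_dim) source-identity and driving-pose features
        h = torch.cat([src_feat, drv_feat], dim=-1)
        return {name: self.heads[name.replace('.', '_')](h).view(-1, *shape)
                for name, shape in self.target_shapes.items()}

def apply_offsets(generator, offsets):
    # Add the predicted offsets to the frozen generator weights (batch size 1 here).
    with torch.no_grad():
        for name, param in generator.named_parameters():
            if name in offsets:
                param.add_(offsets[name][0])

# Toy stand-in for a generator layer whose weights get refined; real StyleGAN2
# layers are far larger, so practical versions would predict compact offsets.
gen = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1))
shapes = {'0.weight': gen[0].weight.shape}
hyper = WeightOffsetHyperNet(feat_dim=32, target_shapes=shapes)
offsets = hyper(torch.randn(1, 32), torch.randn(1, 32))
apply_offsets(gen, offsets)
```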
- …