Estimating continuous affect with label uncertainty
Continuous affect estimation is a problem with inherent uncertainty and subjectivity in the labels that accompany data samples: typically, datasets obtain ground-truth labels from the average of multiple annotations or from self-reporting. In this work, we propose a method for uncertainty-aware continuous affect estimation that explicitly models the uncertainty of the ground-truth label as a univariate Gaussian with mean equal to the ground-truth label and unknown variance. For each sample, the proposed neural network estimates not only the value of the target label (valence and arousal in our case), but also its variance. The network is trained with a loss defined as the KL divergence between the estimate (valence/arousal) and the Gaussian around the ground truth. We show that, in two affect recognition problems with real data, the estimated variances correlate with measures of uncertainty/error in the labels extracted by considering multiple annotations of the data.
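The KL term in a loss of this kind has a closed form for univariate Gaussians. The sketch below (in numpy; the function name and toy values are ours, not the paper's) shows that closed form, which could serve as such a loss between a predicted Gaussian and a Gaussian placed around the ground-truth label:

```python
import numpy as np

def gaussian_kl(mu_pred, var_pred, mu_gt, var_gt):
    """Closed-form KL( N(mu_pred, var_pred) || N(mu_gt, var_gt) )
    for univariate Gaussians."""
    return (0.5 * np.log(var_gt / var_pred)
            + (var_pred + (mu_pred - mu_gt) ** 2) / (2.0 * var_gt)
            - 0.5)

# Example: predicted valence 0.2 with small variance, label 0.0 with
# an assumed label variance of 0.1 -- the loss penalizes both the mean
# offset and the mismatch between the two variances.
loss = gaussian_kl(0.2, 0.05, 0.0, 0.1)
```

The divergence is zero only when the predicted Gaussian coincides with the one around the label, so minimizing it pulls both the predicted mean and the predicted variance toward the label's.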
Linear Maximum Margin Classifier for Learning from Uncertain Data
In this paper, we propose a maximum margin classifier that deals with uncertainty in data input. More specifically, we reformulate the SVM framework such that each training example can be modeled by a multi-dimensional Gaussian distribution described by its mean vector and its covariance matrix, the latter modeling the uncertainty. We address the classification problem and define a cost function that is the expected value of the classical SVM cost when data samples are drawn from the multi-dimensional Gaussian distributions that form the set of training examples. Our formulation approximates the classical SVM formulation when the training examples are isotropic Gaussians with variance tending to zero. We arrive at a convex optimization problem, which we solve efficiently in the primal form using a stochastic gradient descent approach. The resulting classifier, which we name SVM with Gaussian Sample Uncertainty (SVM-GSU), is tested on synthetic data and five publicly available and popular datasets, namely MNIST, WDBC, DEAP, TV News Channel Commercial Detection, and TRECVID MED. Experimental results verify the effectiveness of the proposed method.
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence. (c) 2017 IEEE. DOI: 10.1109/TPAMI.2017.2772235. Author's accepted version; the final publication is available at http://ieeexplore.ieee.org/document/8103808
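As an illustration of the idea, the toy sketch below approximates the expected hinge loss by Monte Carlo sampling from each training example's Gaussian and minimizes it with stochastic gradient descent. This is not the paper's closed-form derivation of the expected cost; the data, hyperparameters, and names are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data: each training example is a Gaussian N(mean, cov), not a point.
means = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
covs = np.array([np.eye(2) * 0.1] * 4)        # per-example covariance (the uncertainty)
labels = np.array([1, 1, -1, -1])

w, b = np.zeros(2), 0.0
lr, lam, n_draws = 0.05, 0.01, 8

for epoch in range(200):
    for mu, cov, y in zip(means, covs, labels):
        # Monte Carlo estimate of the expected hinge subgradient for this example.
        xs = rng.multivariate_normal(mu, cov, size=n_draws)
        margins = y * (xs @ w + b)
        active = xs[margins < 1.0]
        grad_w = lam * w - y * active.sum(axis=0) / n_draws
        grad_b = -y * (margins < 1.0).sum() / n_draws
        w -= lr * grad_w
        b -= lr * grad_b

preds = np.sign(means @ w + b)     # classify each example by its mean vector
```

With isotropic covariances shrinking toward zero, the sampled points collapse onto the means and this reduces to ordinary primal SVM training, mirroring the limiting behavior stated in the abstract.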
DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment.
Video-driven neural face reenactment aims to synthesize realistic facial images that successfully preserve the identity and appearance of a source face, while transferring the target head pose and facial expressions. Existing GAN-based methods suffer from either distortions and visual artifacts or poor reconstruction quality, i.e., the background and several important appearance details, such as hair style/color, glasses and accessories, are not faithfully reconstructed. Recent advances in Diffusion Probabilistic Models (DPMs) enable the generation of high-quality realistic images. Motivated by this, in this paper we present DiffusionAct, a novel method that leverages the photo-realistic image generation of diffusion models to perform neural face reenactment. Specifically, we propose to control the semantic space of a Diffusion Autoencoder (DiffAE), in order to edit the facial pose of the input images, defined as the head pose orientation and the facial expressions. Our method allows one-shot, self, and cross-subject reenactment, without requiring subject-specific fine-tuning. We compare against state-of-the-art GAN-, StyleGAN2-, and diffusion-based methods, showing better or on-par reenactment performance.
Bilinear Models of Parts and Appearances in Generative Adversarial Networks.
Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs. However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. These factors are obtained by applying a semi-nonnegative tensor factorization on the feature maps, which in turn enables context-aware local image editing with pixel-level control. In addition, we show that the discovered appearance factors correspond to saliency maps that localize concepts of interest, without using any labels. Experiments on a wide range of GAN architectures and datasets show that, in comparison to the state of the art, our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control.
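The factorization step can be illustrated in miniature. The sketch below applies semi-nonnegative matrix factorization (the matrix analogue of the tensor factorization used above) to a flattened stand-in feature map, using the multiplicative updates of Ding, Li, and Jordan; all names and shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def semi_nmf(X, k, n_iter=200, eps=1e-9):
    """Semi-NMF  X ~ F @ G.T  with G >= 0, via the multiplicative
    updates of Ding, Li, and Jordan. F is unconstrained ("appearances"),
    G is nonnegative ("spatial parts")."""
    pos = lambda A: (np.abs(A) + A) / 2.0
    neg = lambda A: (np.abs(A) - A) / 2.0
    n, m = X.shape
    G = rng.random((m, k)) + 0.1                 # nonnegative parts factor
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)      # least-squares appearance factor
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))
    return F, G

# A stand-in "feature map": 16 channels over an 8x8 grid, flattened to (C, H*W).
X = rng.standard_normal((16, 64))
F, G = semi_nmf(X, k=4)
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
```

Each column of G can be reshaped back to the 8x8 grid and read as a (nonnegative) spatial map of one discovered part, which is what enables localized, pixel-level editing.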
A deep generic to specific recognition model for group membership analysis using non-verbal cues
Automatic understanding and analysis of groups has attracted increasing attention in the vision and multimedia communities in recent years. However, little attention has been paid to the automatic analysis of non-verbal behaviors and how it can be utilized for the analysis of group membership, i.e., recognizing which group each individual is part of. This paper presents a novel Support Vector Machine (SVM) based Deep Specific Recognition Model (DeepSRM) that is learned based on a generic recognition model. The generic recognition model refers to the model trained with data across different conditions, i.e., when people are watching movies of different types. Although the generic recognition model can provide a baseline for the recognition model trained for each specific condition, the different behaviors people exhibit in different conditions limit its recognition performance. Therefore, a specific recognition model is proposed for each condition separately and built on top of the generic recognition model. We conduct a set of experiments using a database collected to study group analysis, in which each group (i.e., four participants together) watched a number of long movie segments. The proposed deep specific recognition model (44%) outperforms the generic recognition model (26%). The recognition of group membership also indicates that the non-verbal behaviors of individuals within a group share commonalities.
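The generic-to-specific idea can be sketched in miniature: train one classifier across all conditions, then build a per-condition classifier on top of it by feeding it the generic model's decision value as an extra input. The least-squares linear classifiers and the two synthetic "conditions" below are our own illustrative stand-ins for the paper's SVM-based deep model:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linear(X, y):
    """Least-squares linear classifier: weights for sign([X, 1] @ w)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)

def make_condition(shift, n=200):
    """Synthetic 'condition' whose class boundary depends on the condition."""
    X = rng.standard_normal((n, 2))
    y = np.sign(X[:, 0] + shift * X[:, 1])
    return X, y

(X1, y1), (X2, y2) = make_condition(1.0), make_condition(-1.0)

# Generic model: one classifier trained across all conditions.
w_gen = fit_linear(np.vstack([X1, X2]), np.concatenate([y1, y2]))

# Specific model for condition 1: built on top of the generic model by
# stacking its decision value as an extra input feature.
score1 = np.hstack([X1, np.ones((len(X1), 1))]) @ w_gen
w_spec = fit_linear(np.hstack([X1, score1[:, None]]), y1)

acc_generic = (predict(X1, w_gen) == y1).mean()
acc_specific = (predict(np.hstack([X1, score1[:, None]]), w_spec) == y1).mean()
```

Because the boundary differs per condition, the generic model provides only a baseline, and the condition-specific model trained on top of it recovers the condition's own boundary, mirroring the 44% vs. 26% gap reported above in spirit.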
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors. In doing so, it addresses some of the limitations of the state-of-the-art works, namely, a) that they discover directions that are independent of the latent code, i.e., paths that are linear, and b) that their evaluation relies either on visual inspection or on laborious human labeling. More specifically, we propose to learn non-linear warpings on the latent space, each one parametrized by a set of RBF-based latent space warping functions, where each warping gives rise to a family of non-linear paths via the gradient of the function. Building on the work of Voynov and Babenko, which discovers linear paths, we optimize the trainable parameters of the set of RBFs such that images generated by codes along different paths are easily distinguishable by a discriminator network. This leads to easily distinguishable image transformations, such as pose and facial expressions in facial images. We show that linear paths can be derived as a special case of our method, and show experimentally that non-linear paths in the latent space lead to steeper, more disentangled and interpretable changes in the image space than state-of-the-art methods, both qualitatively and quantitatively. We make the code and the pretrained models publicly available at: https://github.com/chi0tzp/WarpedGANSpace
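The core mechanism, tracing a latent path along the gradient of an RBF-based warping function, can be sketched as follows. The centers, weights, and step sizes below are illustrative stand-ins for the trainable parameters described above:

```python
import numpy as np

rng = np.random.default_rng(0)

dim, n_centers, gamma = 8, 4, 0.5
centers = rng.standard_normal((n_centers, dim))   # RBF support points (trainable in the paper)
alphas = rng.standard_normal(n_centers)           # RBF weights (trainable in the paper)

def warp_grad(z):
    """Gradient of the warping f(z) = sum_i alphas[i] * exp(-gamma * ||z - centers[i]||^2)."""
    diffs = z - centers                               # (n_centers, dim)
    rbf = np.exp(-gamma * (diffs ** 2).sum(axis=1))   # (n_centers,)
    return (-2.0 * gamma * alphas * rbf) @ diffs      # (dim,)

def follow_path(z0, step=0.1, n_steps=20):
    """Trace a non-linear latent path by stepping along the normalized gradient of f."""
    z, path = z0.copy(), [z0.copy()]
    for _ in range(n_steps):
        g = warp_grad(z)
        z = z + step * g / (np.linalg.norm(g) + 1e-12)
        path.append(z.copy())
    return np.array(path)

path = follow_path(rng.standard_normal(dim))
steps = np.diff(path, axis=0)   # step directions change along the path: it is non-linear
```

Because the gradient field varies with z, the step direction depends on the current latent code; a constant (z-independent) direction would recover the linear paths of prior work as a special case.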
HyperReenact: one-shot reenactment via jointly learning to refine and retarget faces
In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet produce reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or require expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator: we first invert the real images into its latent space and then use a hypernetwork to perform (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, eliminating in this way the dependence on external editing methods that typically produce artifacts. Our method operates in the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images and exhibiting remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
This work addresses the problem of anonymizing the identity of faces in a dataset of images, such that the privacy of those depicted is not violated, while at the same time the dataset remains useful for downstream tasks, such as training machine learning models. To the best of our knowledge, we are the first to explicitly address this issue and deal with two major drawbacks of the existing state-of-the-art approaches, namely that they (i) require the costly training of additional, purpose-trained neural networks, and/or (ii) fail to retain the facial attributes of the original images in the anonymized counterparts, the preservation of which is of paramount importance for their use in downstream tasks. We accordingly present a task-agnostic anonymization procedure that directly optimizes the images' latent representation in the latent space of a pretrained GAN. By optimizing the latent codes directly, we ensure both that the identity is a desired distance away from the original (with an identity obfuscation loss), whilst preserving the facial attributes (using a novel feature-matching loss in FaRL's [48] deep feature space). We demonstrate through a series of both qualitative and quantitative experiments that our method is capable of anonymizing the identity of the images whilst, crucially, better preserving the facial attributes. We make the code and the pretrained models publicly available at: https://github.com/chi0tzp/FALCO
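The latent-code optimization can be sketched in miniature, with random linear maps standing in for the identity and attribute (FaRL-like) feature extractors; the margin, weights, and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16
W_id = rng.standard_normal((8, d)) / np.sqrt(d)     # stand-in identity encoder
W_attr = rng.standard_normal((8, d)) / np.sqrt(d)   # stand-in attribute encoder

z0 = rng.standard_normal(d)                 # latent code of the original image
z = z0 + 0.01 * rng.standard_normal(d)      # start near the original
margin, lam, lr = 2.0, 0.5, 0.05

for _ in range(500):
    id_diff = W_id @ (z - z0)
    attr_diff = W_attr @ (z - z0)
    dist = np.linalg.norm(id_diff)
    # Identity obfuscation: hinge pushing the identity at least `margin` away.
    grad = -(W_id.T @ id_diff) / (dist + 1e-12) if dist < margin else np.zeros(d)
    # Feature matching: keep the attribute features close to the original's.
    grad = grad + lam * 2.0 * (W_attr.T @ attr_diff)
    z -= lr * grad

id_dist = np.linalg.norm(W_id @ (z - z0))
attr_dist = np.linalg.norm(W_attr @ (z - z0))
```

The two terms pull in different directions: the hinge moves z until the identity embedding is pushed out toward the margin, while the quadratic penalty steers that motion into directions that leave the attribute features largely unchanged.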