4,420 research outputs found
Expressive visual text to speech and expression adaptation using deep neural networks
In this paper, we present an expressive visual text to speech system (VTTS) based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus. Furthermore, we present a method of adapting a previously trained DNN to include a new expression using a small amount of training data. Experiments show that the proposed DNN-based VTTS is preferred by 57.9% over the baseline hidden Markov model based VTTS which uses cluster adaptive training
Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. As a potentially crucial technique for the
development of the next generation of emotional AI systems, we herein provide a
comprehensive overview of the application of adversarial training to affective
computing and sentiment analysis. Various representative adversarial training
algorithms are explained and discussed accordingly, aimed at tackling diverse
challenges associated with emotional AI systems. Further, we highlight a range
of potential future research directions. We expect that this overview will help
facilitate the development of adversarial training for affective computing and
sentiment analysis in both the academic and industrial communities
Domain transfer for deep natural language generation from abstract meaning representations
Stochastic natural language generation systems that are trained from labelled datasets are often domainspecific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoderdecoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This is based on objective metrics such as BLEU and semantic error rate and a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
Audio-driven talking-head synthesis is a popular research topic for virtual
human-related applications. However, the inflexibility and inefficiency of
existing methods, which necessitate expensive end-to-end training to transfer
emotions from guidance videos to talking-head predictions, are significant
limitations. In this work, we propose the Emotional Adaptation for Audio-driven
Talking-head (EAT) method, which transforms emotion-agnostic talking-head
models into emotion-controllable ones in a cost-effective and efficient manner
through parameter-efficient adaptations. Our approach utilizes a pretrained
emotion-agnostic talking-head transformer and introduces three lightweight
adaptations (the Deep Emotional Prompts, Emotional Deformation Network, and
Emotional Adaptation Module) from different perspectives to enable precise and
realistic emotion controls. Our experiments demonstrate that our approach
achieves state-of-the-art performance on widely-used benchmarks, including LRW
and MEAD. Additionally, our parameter-efficient adaptations exhibit remarkable
generalization ability, even in scenarios where emotional training videos are
scarce or nonexistent. Project website: https://yuangan.github.io/eat/Comment: Accepted to ICCV 2023. Project page: https://yuangan.github.io/eat
- …