Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression, and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
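The core retrieval step can be illustrated with a toy sketch: an edited transcript is mapped to visemes, and matching frame runs are looked up in the annotated corpus. The phoneme-to-viseme table, the corpus, and `find_segment` are all hypothetical stand-ins for the paper's actual annotation data and optimization strategy, which this greedy search only approximates.

```python
# Hypothetical phoneme-to-viseme table; the groupings are illustrative,
# not the paper's actual mapping.
PHONEME_TO_VISEME = {
    "p": "BMP", "b": "BMP", "m": "BMP",
    "f": "FV", "v": "FV",
    "aa": "AA", "iy": "EE", "uw": "OO",
}

# Toy per-frame annotations of the input video: (frame_index, viseme).
corpus = [(i, v) for i, v in enumerate(
    ["BMP", "AA", "FV", "EE", "BMP", "OO", "AA", "FV"])]

def find_segment(target_visemes, corpus):
    """Return the first run of corpus frames whose viseme labels match
    the target subsequence (a greedy stand-in for the paper's
    optimization over candidate segments)."""
    labels = [v for _, v in corpus]
    n = len(target_visemes)
    for start in range(len(labels) - n + 1):
        if labels[start:start + n] == target_visemes:
            return [f for f, _ in corpus[start:start + n]]
    return None

# An edited word -> phonemes -> visemes -> matching corpus frames.
target = [PHONEME_TO_VISEME[p] for p in ["m", "aa"]]
frames = find_segment(target, corpus)
```

In the actual system, the selected frames' annotated parameters would then be stitched and passed to the parametric face model and the neural renderer; here the lookup alone conveys the transcript-driven retrieval idea.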
Multi-expert learning of adaptive legged locomotion
Achieving versatile robot locomotion requires motor skills which can adapt to
previously unseen situations. We propose a Multi-Expert Learning Architecture
(MELA) that learns to generate adaptive skills from a group of representative
expert skills. During training, MELA is first initialised by a distinct set of
pre-trained experts, each in a separate deep neural network (DNN). Then by
learning the combination of these DNNs using a Gating Neural Network (GNN),
MELA can acquire more specialised experts and transitional skills across
various locomotion modes. During runtime, MELA constantly blends multiple DNNs
and dynamically synthesises a new DNN to produce adaptive behaviours in
response to changing situations. This approach leverages the advantages of
trained expert skills and the fast online synthesis of adaptive policies to
generate responsive motor skills during changing tasks. Using a unified
MELA framework, we demonstrated successful multi-skill locomotion on a real
quadruped robot that performed coherent trotting, steering, and fall recovery
autonomously, and showed the merit of multi-expert learning in generating
behaviours which can adapt to unseen scenarios.
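The blending mechanism described above can be sketched in a few lines: a gating network maps the robot state to a convex weight per expert, and the experts' outputs are mixed by those weights. All dimensions and the tiny untrained MLPs below are illustrative assumptions; MELA's real networks and training procedure are far more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, NUM_EXPERTS, HIDDEN = 12, 8, 4, 16

def mlp_params(in_dim, out_dim):
    # One-hidden-layer MLP weights (random, i.e. untrained, for illustration).
    return (rng.standard_normal((in_dim, HIDDEN)) * 0.1,
            rng.standard_normal((HIDDEN, out_dim)) * 0.1)

def mlp_forward(params, x):
    w1, w2 = params
    return np.tanh(x @ w1) @ w2

# A set of pre-trained expert policies, each its own network.
experts = [mlp_params(STATE_DIM, ACTION_DIM) for _ in range(NUM_EXPERTS)]
# Gating network: maps the robot state to one weight per expert.
gate = mlp_params(STATE_DIM, NUM_EXPERTS)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mela_action(state):
    # Blend expert outputs with state-dependent gating weights,
    # effectively synthesising a new policy at every timestep.
    w = softmax(mlp_forward(gate, state))
    actions = np.stack([mlp_forward(p, state) for p in experts])
    return w @ actions, w

state = rng.standard_normal(STATE_DIM)
action, weights = mela_action(state)
```

Because the gating weights are recomputed from the state each step, a fall-detection state can smoothly shift mass from the trotting expert to the recovery expert without switching policies discretely.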
Application of deep learning for livestock behaviour recognition: a systematic literature review
Livestock health and welfare monitoring is a tedious and labour-intensive task previously performed manually by humans. However, with recent technological advancements, the livestock industry has adopted the latest AI and computer vision-based techniques empowered by deep learning (DL) models that, at the core, act as decision-making tools. These models have previously been used to address several issues, including individual animal identification, tracking animal movement, body part recognition, and species classification. However, over the past decade, there has been a growing interest in using these models to examine the relationship between livestock behaviour and associated health problems. Several DL-based methodologies have been developed for livestock behaviour recognition, necessitating a survey and synthesis of the state of the art. Previous review studies were conducted in a very generic manner and did not focus on a specific problem such as behaviour recognition. To the best of our knowledge, there is currently no review study that focuses on the use of DL specifically for livestock behaviour recognition. As a result, this systematic literature review (SLR) was carried out. The review was performed by initially searching several popular electronic databases, resulting in 1101 publications. After further assessment against the defined selection criteria, 126 publications were shortlisted. These publications were filtered using quality criteria, resulting in the selection of 44 high-quality primary studies, which were analysed to extract the data needed to answer the defined research questions. According to the results, DL solved 13 behaviour recognition problems involving 44 different behaviour classes. 23 DL models and 24 networks were employed, with CNN, Faster R-CNN, YOLOv5, and YOLOv4 being the most common models, and VGG16, CSPDarknet53, GoogLeNet, ResNet101, and ResNet50 being the most popular networks.
Ten different metrics were utilised for performance evaluation, with precision and accuracy being the most commonly used. Occlusion and adhesion, data imbalance, and the complex livestock environment were the most prominent challenges reported by the primary studies. Finally, potential solutions and research directions were discussed in this SLR to aid in developing autonomous livestock behaviour recognition systems.
Motion In-Betweening with Phase Manifolds
This paper introduces a novel data-driven motion in-betweening system to
reach target poses of characters by making use of phase variables learned by a
Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network
model, in which the phases cluster movements in both space and time with
different expert weights. Each generated set of weights then produces a
sequence of poses in an autoregressive manner between the current and target
state of the character. In addition, a learned bi-directional control scheme
is implemented to satisfy constraints such as poses manually modified by
animators or end effectors that the animation must reach. The results
demonstrate that using
phases for motion in-betweening tasks sharpens the interpolated movements and
stabilizes the learning process; it can also synthesize more challenging
movements beyond locomotion behaviors. Additionally, style control is enabled
between given target keyframes. Our proposed framework competes with popular
state-of-the-art methods for motion in-betweening in terms of motion quality
and generalization, especially in the presence of long transition durations.
Our framework contributes to faster prototyping workflows for creating animated
character sequences, which is of enormous interest for the game and film
industry.
Comment: 17 pages, 11 figures, conference
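The phase-driven mixture-of-experts idea can be sketched as follows: 2D phase features (sine/cosine on the phase manifold) determine convex blending coefficients, the blended expert weights form the network used at that frame, and the next pose is predicted autoregressively from the current pose and the target. The dimensions, the random untrained weights, and the residual-update form are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
POSE_DIM, NUM_EXPERTS = 6, 3

# Each expert is one full set of network weights; blending them yields
# the network actually applied at this frame (random = untrained here).
expert_weights = rng.standard_normal(
    (NUM_EXPERTS, POSE_DIM * 2 + 2, POSE_DIM)) * 0.05
# Fixed gating matrix: phase features -> expert logits.
gating = rng.standard_normal((3, NUM_EXPERTS))

def blend_coefficients(phase):
    # 2D phase features on the manifold -> convex expert weights.
    feats = np.array([np.sin(phase), np.cos(phase), 1.0])
    z = feats @ gating
    e = np.exp(z - z.max())
    return e / e.sum()

def step(pose, target, phase):
    # Blend the expert weight sets, then predict the next pose
    # autoregressively from current pose, target, and phase features.
    w = blend_coefficients(phase)
    net = np.tensordot(w, expert_weights, axes=1)  # blended weights
    x = np.concatenate([pose, target, [np.sin(phase), np.cos(phase)]])
    return pose + np.tanh(x @ net)  # small residual pose update

pose, target = np.zeros(POSE_DIM), np.ones(POSE_DIM)
for t in range(10):
    pose = step(pose, target, phase=2 * np.pi * t / 10)
```

The point of the sketch is the data flow: because the blending coefficients move periodically with the phase, the in-between poses inherit the timing structure of the learned phase manifold rather than being a straight interpolation.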
CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks
The unprecedented increase in the usage of computer vision technology in
society goes hand in hand with an increased concern in data privacy. In many
real-world scenarios like people tracking or action recognition, it is
important to be able to process the data while carefully protecting people's
identity. We propose and develop CIAGAN, a model for image
and video anonymization based on conditional generative adversarial networks.
Our model is able to remove the identifying characteristics of faces and bodies
while producing high-quality images and videos that can be used for any
computer vision task, such as detection or tracking. Unlike previous methods,
we have full control over the de-identification (anonymization) procedure,
ensuring both anonymization as well as diversity. We compare our method to
several baselines and achieve state-of-the-art results.
Comment: CVPR 2020
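The conditioning idea behind controllable anonymization can be shown with a minimal sketch: the generator sees only a pose/expression representation (e.g. landmarks) plus a one-hot target identity, never the original face pixels, so the source identity cannot leak through while the identity label gives explicit control over the output. The tiny untrained two-layer generator and all dimensions below are illustrative assumptions, not CIAGAN's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
LANDMARK_DIM, ID_COUNT, IMG_DIM, HIDDEN = 10, 5, 16, 32

# Illustrative untrained generator weights.
w1 = rng.standard_normal((LANDMARK_DIM + ID_COUNT, HIDDEN)) * 0.1
w2 = rng.standard_normal((HIDDEN, IMG_DIM)) * 0.1

def generate(landmarks, target_id):
    # Condition on landmarks (pose/expression) + a one-hot target
    # identity; the original face is not an input at all.
    onehot = np.zeros(ID_COUNT)
    onehot[target_id] = 1.0
    x = np.concatenate([landmarks, onehot])
    return np.tanh(np.tanh(x @ w1) @ w2)

landmarks = rng.standard_normal(LANDMARK_DIM)
# Same pose, two different target identities -> two different outputs,
# which is what gives the method both anonymization and diversity.
face_a = generate(landmarks, target_id=0)
face_b = generate(landmarks, target_id=3)
```

In the full model an adversarial discriminator pushes these outputs toward photorealism; the sketch only demonstrates why varying the identity condition while fixing the landmarks yields diverse, de-identified faces with consistent pose.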