
    Text-based Editing of Talking-head Video

    Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript; an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full-sentence synthesis.
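    The abstract describes an optimization that picks corpus segments matching the edited transcript's phonemes while avoiding visual jump cuts. Below is a minimal Viterbi-style sketch of that idea, not the paper's actual objective: the per-frame phoneme labels and the simple `jump_penalty` for non-contiguous frames are illustrative assumptions.

```python
# Hypothetical sketch: choose one corpus frame per target phoneme,
# preferring contiguous runs of frames (a proxy for "no jump cuts").
# Assumes every target phoneme occurs somewhere in the corpus.
from collections import defaultdict

def select_frames(corpus_phonemes, target_phonemes, jump_penalty=1.0):
    """corpus_phonemes: one phoneme label per corpus frame.
    target_phonemes: phoneme sequence of the edited transcript.
    Returns one corpus frame index per target phoneme."""
    # Index corpus frames by phoneme label.
    candidates = defaultdict(list)
    for frame, ph in enumerate(corpus_phonemes):
        candidates[ph].append(frame)

    # Viterbi pass: transition cost penalises non-contiguous frames.
    prev_costs = {f: 0.0 for f in candidates[target_phonemes[0]]}
    back = []
    for ph in target_phonemes[1:]:
        costs, pointers = {}, {}
        for f in candidates[ph]:
            best_prev = min(
                prev_costs,
                key=lambda p: prev_costs[p] + (0.0 if f == p + 1 else jump_penalty),
            )
            costs[f] = prev_costs[best_prev] + (0.0 if f == best_prev + 1 else jump_penalty)
            pointers[f] = best_prev
        back.append(pointers)
        prev_costs = costs

    # Backtrack the cheapest path.
    frame = min(prev_costs, key=prev_costs.get)
    path = [frame]
    for pointers in reversed(back):
        frame = pointers[frame]
        path.append(frame)
    return path[::-1]
```

    The real system optimizes over multi-frame segments with richer stitching costs; the point here is only the structure of the search.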

    Multi-expert learning of adaptive legged locomotion

    Achieving versatile robot locomotion requires motor skills that can adapt to previously unseen situations. We propose a Multi-Expert Learning Architecture (MELA) that learns to generate adaptive skills from a group of representative expert skills. During training, MELA is first initialised with a distinct set of pre-trained experts, each in a separate deep neural network (DNN). Then, by learning the combination of these DNNs using a Gating Neural Network (GNN), MELA acquires more specialised experts and transitional skills across various locomotion modes. During runtime, MELA constantly blends multiple DNNs and dynamically synthesises a new DNN to produce adaptive behaviours in response to changing situations. This approach leverages the advantages of trained expert skills and the fast online synthesis of adaptive policies to generate responsive motor skills as tasks change. Using a unified MELA framework, we demonstrated successful multi-skill locomotion on a real quadruped robot that autonomously performed coherent trotting, steering, and fall recovery, and showed the merit of multi-expert learning in generating behaviours that adapt to unseen scenarios.
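    The core mechanism, a gating network whose softmax coefficients blend expert networks into a single policy, can be sketched compactly. The PyTorch sketch below blends expert parameters once per batch under assumed layer sizes; it illustrates the general gated weight-blending idea, not MELA's exact architecture or training procedure.

```python
# Minimal sketch of gated expert-weight blending: a gating network
# outputs softmax coefficients that form a per-parameter convex
# combination of identically-shaped expert networks. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(obs_dim=60, act_dim=12, hidden=256):
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, act_dim),
    )

class GatedExpertBlend(nn.Module):
    def __init__(self, num_experts=8, obs_dim=60, act_dim=12):
        super().__init__()
        self.experts = nn.ModuleList(
            [make_expert(obs_dim, act_dim) for _ in range(num_experts)]
        )
        self.gate = nn.Sequential(          # gating neural network (GNN)
            nn.Linear(obs_dim, 128), nn.ELU(),
            nn.Linear(128, num_experts),
        )
        self.template = make_expert(obs_dim, act_dim)  # shape donor for the blend

    def forward(self, obs):
        # obs: (batch, obs_dim). One blend per batch for brevity.
        w = F.softmax(self.gate(obs.mean(dim=0)), dim=-1)
        # Synthesise a new DNN: per-parameter convex combination of experts.
        blended = {
            name: sum(w[i] * dict(e.named_parameters())[name]
                      for i, e in enumerate(self.experts))
            for name, _ in self.template.named_parameters()
        }
        return torch.func.functional_call(self.template, blended, (obs,))
```

    Blending weights (rather than expert outputs) yields a single coherent network at runtime, which matches the abstract's description of dynamically synthesising a new DNN.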

    Application of deep learning for livestock behaviour recognition: a systematic literature review.

    Livestock health and welfare monitoring is a tedious and labour-intensive task previously performed manually by humans. However, with recent technological advancements, the livestock industry has adopted the latest AI and computer vision-based techniques empowered by deep learning (DL) models that, at their core, act as decision-making tools. These models have previously been used to address several issues, including individual animal identification, tracking animal movement, body part recognition, and species classification. Over the past decade, there has also been growing interest in using these models to examine the relationship between livestock behaviour and associated health problems. Several DL-based methodologies have been developed for livestock behaviour recognition, necessitating a survey and synthesis of the state of the art. Previous review studies were conducted in a generic manner and did not focus on a specific problem such as behaviour recognition. To the best of our knowledge, no review study currently focuses on the use of DL specifically for livestock behaviour recognition; this systematic literature review (SLR) was therefore carried out. The review began with a search of several popular electronic databases, yielding 1101 publications. After assessment against the defined selection criteria, 126 publications were shortlisted. These were then filtered using quality criteria, resulting in 44 high-quality primary studies, which were analysed to extract the data needed to answer the defined research questions. According to the results, DL solved 13 behaviour recognition problems involving 44 different behaviour classes. In total, 23 DL models and 24 networks were employed, with CNN, Faster R-CNN, YOLOv5, and YOLOv4 being the most common models, and VGG16, CSPDarknet53, GoogLeNet, ResNet101, and ResNet50 being the most popular networks. Ten different metrics were utilised for performance evaluation, with precision and accuracy being the most commonly used. Occlusion and adhesion, data imbalance, and the complex livestock environment were the most prominent challenges reported by the primary studies. Finally, potential solutions and research directions are discussed in this SLR to aid in developing autonomous livestock behaviour recognition systems.

    Motion In-Betweening with Phase Manifolds

    This paper introduces a novel data-driven motion in-betweening system that reaches target poses of characters by making use of phase variables learned by a Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network model in which the phases cluster movements in both space and time with different expert weights. Each generated set of weights then produces a sequence of poses in an autoregressive manner between the current and target state of the character. In addition, a learned bi-directional control scheme is implemented to satisfy poses that are manually modified by animators, as well as constraints in which certain end effectors must be reached by the animation. The results demonstrate that using phases for motion in-betweening tasks sharpens the interpolated movements and stabilizes the learning process. Moreover, using phases also makes it possible to synthesize more challenging movements beyond locomotion behaviors. Additionally, style control is enabled between given target keyframes. Our proposed framework can compete with popular state-of-the-art methods for motion in-betweening in terms of motion quality and generalization, especially in the presence of long transition durations. Our framework contributes to faster prototyping workflows for creating animated character sequences, which is of great interest to the game and film industry.
    Comment: 17 pages, 11 figures, conference
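    The phase manifold rests on summarising each latent curve of the Periodic Autoencoder by an amplitude, frequency, offset, and a 2D (sin, cos) phase. The sketch below approximates that parameterisation with an FFT over a latent window; the paper computes the phase with a learned layer, so the dominant-bin phase used here, along with the window length and frame rate, are assumptions.

```python
# Approximate periodic parameterisation of latent curves: each channel
# is summarised by amplitude, frequency, offset, and a point on the
# unit circle (the phase manifold coordinate).
import torch

def periodic_params(latent, fps=60):
    """latent: (channels, frames) latent curves over a time window.
    Returns per-channel amplitude, frequency, offset, and (sin, cos) phase."""
    c, n = latent.shape
    spectrum = torch.fft.rfft(latent, dim=-1)
    freqs = torch.fft.rfftfreq(n, d=1.0 / fps)

    offset = spectrum[:, 0].real / n               # DC component (mean)
    spectrum = spectrum[:, 1:]                     # drop the DC bin
    power = spectrum.abs() ** 2
    amp = 2.0 * power.sum(dim=-1).sqrt() / n       # amplitude, up to normalisation
    # Dominant frequency as a power-weighted average (differentiable).
    freq = (power * freqs[1:]).sum(dim=-1) / power.sum(dim=-1)

    # Phase from the strongest bin, mapped onto the unit circle.
    k = power.argmax(dim=-1)
    dom = spectrum[torch.arange(c), k]
    phase = torch.atan2(dom.imag, dom.real)
    return amp, freq, offset, torch.stack([phase.sin(), phase.cos()], dim=-1)
```

    Representing the phase as (sin, cos) rather than a raw angle avoids the wrap-around discontinuity at 2π, which is what makes interpolation along the manifold smooth.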

    CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

    The unprecedented increase in the use of computer vision technology in society goes hand in hand with increased concern about data privacy. In many real-world scenarios, such as people tracking or action recognition, it is important to be able to process the data while taking care to protect people's identity. We propose and develop CIAGAN, a model for image and video anonymization based on conditional generative adversarial networks. Our model removes the identifying characteristics of faces and bodies while producing high-quality images and videos that can be used for any computer vision task, such as detection or tracking. Unlike previous methods, we have full control over the de-identification (anonymization) procedure, ensuring both anonymization and diversity. We compare our method to several baselines and achieve state-of-the-art results.
    Comment: CVPR 2020
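    Conditioning a generator on a target identity while preserving pose can be illustrated with a small sketch. The PyTorch model below, with assumed layer sizes and input channels (a masked face plus a landmark map), shows the general identity-embedding conditioning; it is not CIAGAN's published architecture.

```python
# Minimal sketch of identity-conditioned generation: the generator sees
# pose cues (masked image + landmarks) and an identity embedding that
# selects which surrogate identity the output face should carry.
import torch
import torch.nn as nn

class IdentityConditionedGenerator(nn.Module):
    def __init__(self, num_identities=1000, id_dim=64):
        super().__init__()
        self.id_embed = nn.Embedding(num_identities, id_dim)
        # Encoder over masked RGB (3 channels) + landmark map (1 channel).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Decoder consumes features concatenated with the broadcast id code.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128 + id_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, masked_rgb, landmarks, target_id):
        x = self.encoder(torch.cat([masked_rgb, landmarks], dim=1))
        code = self.id_embed(target_id)                       # (B, id_dim)
        code = code[:, :, None, None].expand(-1, -1, *x.shape[2:])
        return self.decoder(torch.cat([x, code], dim=1))

# Anonymise by sampling a different surrogate identity per input face:
# g = IdentityConditionedGenerator()
# fake = g(masked, lmk, torch.randint(0, 1000, (masked.shape[0],)))
```

    Because pose and expression enter through the image branch while identity enters only through the embedding, swapping the identity label changes who the face looks like without breaking downstream detection or tracking.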