5,381 research outputs found

    Learning from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets Inverse Reinforcement Learning

    Full text link
    This paper presents a novel Subject-dependent Deep Aging Path (SDAP), which inherits the merits of both Generative Probabilistic Modeling and Inverse Reinforcement Learning to model the facial structures and the longitudinal face aging process of a given subject. The proposed SDAP is optimized using tractable log-likelihood objective functions with Convolutional Neural Network (CNN)-based deep feature extraction. Instead of applying a fixed aging development path to all input faces and subjects, SDAP provides the most appropriate aging development path for each individual subject by optimizing the reward aging formulation. Unlike previous methods that can take only one image as input, SDAP further allows multiple images as inputs, i.e. all available images of a subject at the same or different ages, to produce the optimal aging path for that subject. Finally, SDAP enables efficient synthesis of in-the-wild aging faces. The proposed model is evaluated on both face aging synthesis and cross-age face verification. The experimental results consistently show that SDAP achieves state-of-the-art performance on numerous face aging databases, i.e. FG-NET, MORPH, AginG Faces in the Wild (AGFW), and the Cross-Age Celebrity Dataset (CACD). Furthermore, we also evaluate SDAP on the large-scale MegaFace challenge to demonstrate the advantages of the proposed solution.
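
    To make the idea of a subject-dependent aging path concrete, here is a minimal, hypothetical sketch of one ingredient the abstract describes: stepping through aging transitions by picking, at each step, the one that maximizes a learned reward for a given subject's deep features. The names (reward_fn, candidate_deltas) and the greedy loop are my own illustration, not the authors' SDAP implementation.

    # Hypothetical sketch: greedy search for a subject-dependent aging path.
    # reward_fn is an assumed learned reward over (features, aging step);
    # candidate_deltas are assumed feature-space aging offsets.
    import numpy as np

    def choose_aging_path(subject_feats, candidate_deltas, reward_fn, horizon=5):
        """At each age step, pick the candidate aging delta with the highest
        learned reward for this subject, then apply it to the features."""
        path, feats = [], np.asarray(subject_feats, dtype=np.float32).copy()
        for _ in range(horizon):
            best = max(candidate_deltas, key=lambda d: reward_fn(feats, d))
            path.append(best)
            feats = feats + best
        return path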

    Data Augmentation for Leaf Segmentation and Counting Tasks in Rosette Plants

    Full text link
    Deep learning techniques involving image processing and data analysis are constantly evolving, and many domains adapt them for object segmentation, instantiation, and classification. Recently, agricultural industries have adopted these techniques to bring automation to farmers around the globe. One analysis procedure required for automatic visual inspection in this domain is leaf counting and segmentation. Collecting labeled data from field crops and greenhouses is a complicated task due to the large variety of crops, growth seasons, climate changes, phenotype diversity, and more, especially when specific learning tasks require a large amount of labeled data for training. Data augmentation for training deep neural networks is well established; examples include data synthesis using generative semi-synthetic models and applying various kinds of transformations. In this paper we propose a method that preserves the geometric structure of the data objects, thus keeping the physical appearance of the dataset as close as possible to imaged plants in real agricultural scenes. The proposed method provides state-of-the-art results when applied to the standard benchmark in the field, namely, the ongoing Leaf Segmentation Challenge hosted by Computer Vision Problems in Plant Phenotyping.
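
    As a generic illustration of structure-preserving augmentation (not the paper's actual pipeline), the sketch below applies the same rigid transform to a rosette plant image and to every per-leaf mask, so leaf shapes and their spatial arrangement stay physically plausible while still producing new training samples. The function name and arguments are assumptions for illustration.

    # Illustrative only: geometry-preserving augmentation for a rosette image
    # and its per-leaf masks (same rigid transform applied to all of them).
    import numpy as np

    def augment_plant(image, leaf_masks, quarter_turns=1, mirror=False):
        """image: HxWx3 array; leaf_masks: list of HxW binary arrays."""
        def transform(a):
            a = np.rot90(a, k=quarter_turns, axes=(0, 1))
            return np.flip(a, axis=1) if mirror else a
        return transform(image), [transform(m) for m in leaf_masks]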

    Generative Adversarial Network in Medical Imaging: A Review

    Full text link
    Generative adversarial networks have gained a lot of attention in the computer vision community due to their capability of data generation without explicitly modelling the probability density function. The adversarial loss brought by the discriminator provides a clever way of incorporating unlabeled samples into training and imposing higher-order consistency. This has proven to be useful in many cases, such as domain adaptation, data augmentation, and image-to-image translation. These properties have attracted researchers in the medical imaging community, and we have seen rapid adoption in many traditional and novel applications, such as image reconstruction, segmentation, detection, classification, and cross-modality synthesis. Based on our observations, this trend will continue, and we therefore conducted a review of recent advances in medical imaging using the adversarial training scheme with the hope of benefiting researchers interested in this technique. Comment: 24 pages; v4; added missing references from before Jan 1st 2019; accepted to MedI
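
    For readers new to the adversarial training scheme this review surveys, the sketch below shows a generic, non-saturating GAN update in PyTorch. It is not tied to any medical-imaging model; D and G are assumed to be user-defined torch.nn.Module networks and opt_d/opt_g their optimizers.

    # Generic GAN training step (illustration of the adversarial loss only).
    import torch
    import torch.nn.functional as F

    def gan_step(D, G, real, opt_d, opt_g, z_dim=128):
        z = torch.randn(real.size(0), z_dim, device=real.device)
        fake = G(z)

        # Discriminator update: push real scores toward 1, fake scores toward 0.
        d_real, d_fake = D(real), D(fake.detach())
        d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator update (non-saturating): make D score fakes as real.
        g_fake = D(fake)
        g_loss = F.binary_cross_entropy_with_logits(g_fake, torch.ones_like(g_fake))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()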

    Formal methods and software engineering for DL. Security, safety and productivity for DL systems development

    Full text link
    Deep Learning (DL) techniques are now widespread and being integrated into many important systems. Their classification and recognition abilities ensure their relevance for multiple application domains. As machine-learning techniques that rely on training rather than explicit algorithm programming, they offer a high degree of productivity. But they can be vulnerable to attacks, and the verification of their correctness is only just emerging as a scientific and engineering possibility. This paper is a major update of a previously published survey, attempting to cover all recent publications in this area. It also covers an even more recent trend, namely the design of domain-specific languages for producing and training neural nets. Comment: Submitted to IEEE-CCECE201

    Iterative Text-based Editing of Talking-heads Using Neural Retargeting

    Full text link
    We present a text-based tool for editing talking-head video that enables an iterative editing workflow. On each iteration, users can edit the wording of the speech, further refine mouth motions if necessary to reduce artifacts, and manipulate non-verbal aspects of the performance by inserting mouth gestures (e.g. a smile) or changing the overall performance style (e.g. energetic, mumble). Our tool requires only 2-3 minutes of target actor video and synthesizes the video for each iteration in about 40 seconds, allowing users to quickly explore many editing possibilities as they iterate. Our approach is based on two key ideas. (1) We develop a fast phoneme search algorithm that can quickly identify phoneme-level subsequences of the source repository video that best match a desired edit; this enables our fast iteration loop. (2) We leverage a large repository of video of a source actor and develop a new self-supervised neural retargeting technique for transferring the mouth motions of the source actor to the target actor. This allows us to work with relatively short target actor videos, making our approach applicable in many real-world editing scenarios. Finally, our refinement and performance controls give users the ability to further fine-tune the synthesized results. Comment: Project website is https://davidyao.me/projects/text2vi
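
    As a rough illustration of the first idea (quickly finding repository subsequences that match an edit), the toy sketch below scores fixed-length windows of repository phonemes against the edited phoneme sequence with a simple mismatch count. The authors' actual search algorithm is more sophisticated; the function and its arguments are stand-ins for illustration only.

    # Toy phoneme-window matcher (not the paper's algorithm).
    def best_matching_span(repo_phonemes, edit_phonemes):
        """Return (start, end) of the repository window with the fewest
        phoneme mismatches against the edited phoneme sequence."""
        n, m = len(repo_phonemes), len(edit_phonemes)
        best_cost, best_span = float("inf"), (0, 0)
        for start in range(max(1, n - m + 1)):
            window = repo_phonemes[start:start + m]
            cost = sum(a != b for a, b in zip(window, edit_phonemes))
            if cost < best_cost:
                best_cost, best_span = cost, (start, start + m)
        return best_span

    For example, best_matching_span(["HH", "AH", "L", "OW", "W", "ER"], ["L", "OW"]) returns (2, 4).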

    High-level Synthesis

    Full text link
    Hardware synthesis is a general term used to refer to the processes involved in automatically generating a hardware design from its specification. High-level synthesis (HLS) can be defined as the translation from a behavioral description of the intended hardware circuit into a structural description, similar to the compilation of programming languages (such as C and Pascal) into assembly language. The chained synthesis tasks at each level of the design process include system synthesis, register-transfer synthesis, logic synthesis, and circuit synthesis. The development of hardware solutions for complex applications is no longer a complicated task thanks to the emergence of various HLS tools. Many areas of application have benefited from the modern advances in hardware design, such as the automotive and aerospace industries, computer graphics, signal and image processing, security, and complex simulations like molecular modeling and DNA matching. The field of HLS is continuing its rapid growth to facilitate the creation of hardware and to further blur the border separating the processes of designing hardware and software. Comment: 19 Pages, 16 Figures. arXiv admin note: text overlap with arXiv:1905.02075, arXiv:1905.0207

    Context-Aware System Synthesis, Task Assignment, and Routing

    Full text link
    The design and organization of complex robotic systems traditionally requires laborious trial-and-error processes to ensure both hardware and software components are correctly connected with the resources necessary for computation. This paper presents a novel generalization of the quadratic assignment and routing problem, introducing formalisms for selecting components and interconnections to synthesize a complete system capable of providing some user-defined functionality. By introducing mission context, functional requirements, and modularity directly into the assignment problem, we derive a solution where components are automatically selected and then organized into an optimal hardware and software interconnection structure, all while respecting restrictions on component viability and required functionality. The ability to generate complete functional systems directly from individual components reduces manual design effort by allowing for a guided exploration of the design space. Additionally, our formulation increases resiliency by quantifying resource margins and enabling adaptation of the system structure in response to changing environments or hardware and software failures. The proposed formulation is cast as an integer linear program which is provably NP-hard. Two case studies are developed and analyzed to highlight the expressiveness and complexity of problems that can be addressed by this approach: the first explores the iterative development of a ground-based search-and-rescue robot in a variety of mission contexts, while the second explores the large-scale, complex design of a humanoid disaster robot for the DARPA Robotics Challenge. Numerical simulations quantify real-world performance and demonstrate tractable time complexity for the scale of problems encountered in many modern robotic systems. Comment: 17 pages, 10 figures, Submitted to Transactions in Robotic
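
    To give a flavor of the kind of formulation being generalized here, a much-simplified assignment-style integer linear program (my notation, not the paper's exact model) that picks one component c per functional role r subject to resource budgets could be written as:

    \begin{align*}
      \min_{x}\; & \sum_{r}\sum_{c} w_{rc}\, x_{rc}
          && \text{total component and interconnection cost}\\
      \text{s.t.}\; & \sum_{c} x_{rc} = 1 \quad \forall r
          && \text{every role is filled by exactly one component}\\
      & \sum_{r} u_{rc}\, x_{rc} \le U_{c} \quad \forall c
          && \text{each component's resource budget is respected}\\
      & x_{rc} \in \{0,1\}
    \end{align*}

    The paper's formulation additionally encodes mission context, functional requirements, modularity, and routing between components, which is what makes it a generalization of the quadratic assignment and routing problem.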

    Text-based Editing of Talking-head Video

    Full text link
    Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression, and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis. Comment: A version with higher resolution images can be downloaded from the authors' website
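
    The stages the abstract walks through (annotate, select segments, stitch parameters, render with a parametric face model, then neural rendering) can be summarized as a high-level sketch in which every stage is a caller-supplied function. All names below are placeholders for illustration, not the authors' code.

    # Placeholder pipeline sketch of the text-based editing workflow described above.
    def edit_talking_head(video_frames, transcript_edit, annotate, select_segments,
                          blend_parameters, render_parametric_face, neural_renderer):
        per_frame = annotate(video_frames)                      # phonemes, visemes, 3D pose, reflectance, ...
        segments = select_segments(per_frame, transcript_edit)  # optimization over the input corpus
        params = blend_parameters(segments)                     # seamless stitching of annotated parameters
        intermediate = render_parametric_face(params)           # lower face via the parametric face model
        return neural_renderer(intermediate)                    # recurrent network -> photorealistic video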