5,381 research outputs found
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
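The segment-selection step described in the abstract can be illustrated with a toy sketch: given a corpus annotated with phonemes and an edited target phoneme sequence, cover the target with contiguous corpus subsequences. The greedy longest-match strategy below is a hypothetical simplification; the paper uses a global optimization over segment boundaries that also scores visual smoothness.

```python
def select_segments(corpus, target):
    """Greedily cover the target phoneme sequence with the longest
    contiguous subsequences found in the corpus. Returns a list of
    (start_in_corpus, length) pairs. Illustrative only: the real
    method optimizes over all segmentations, not greedily."""
    segments = []
    i = 0
    while i < len(target):
        best = None  # (start_in_corpus, length) of longest match at i
        for s in range(len(corpus)):
            length = 0
            while (s + length < len(corpus) and i + length < len(target)
                   and corpus[s + length] == target[i + length]):
                length += 1
            if length and (best is None or length > best[1]):
                best = (s, length)
        if best is None:
            raise ValueError(f"phoneme {target[i]!r} not found in corpus")
        segments.append(best)
        i += best[1]
    return segments
```

For example, covering the edited phoneme string "BAKAH" from a corpus "AHBAKT" yields two segments: one of length 3 starting at corpus index 2, then one of length 2 starting at index 0.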
Learning from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets Inverse Reinforcement Learning
This paper presents a novel Subject-dependent Deep Aging Path (SDAP), which
inherits the merits of both Generative Probabilistic Modeling and Inverse
Reinforcement Learning to model the facial structures and the longitudinal face
aging process of a given subject. The proposed SDAP is optimized using
tractable log-likelihood objective functions with Convolutional Neural Networks
(CNNs) based deep feature extraction. Instead of applying a fixed aging
development path for all input faces and subjects, SDAP is able to provide the
most appropriate aging development path for each individual subject by
optimizing the reward-based aging formulation. Unlike previous methods that can take only one
image as the input, SDAP further allows multiple images as inputs, i.e. all
information of a subject at either the same or different ages, to produce the
optimal aging path for the given subject. Finally, SDAP allows efficiently
synthesizing in-the-wild aging faces. The proposed model is evaluated on
both face aging synthesis and cross-age face verification tasks. The
experimental results consistently show SDAP achieves the state-of-the-art
performance on numerous face aging databases, i.e. FG-NET, MORPH, AginG Faces
in the Wild (AGFW), and Cross-Age Celebrity Dataset (CACD). Furthermore, we
also evaluate the performance of SDAP on the large-scale MegaFace challenge to
demonstrate the advantages of the proposed solution.
Data Augmentation for Leaf Segmentation and Counting Tasks in Rosette Plants
Deep learning techniques involving image processing and data analysis are
constantly evolving. Many domains adapt these techniques for object
segmentation, instantiation and classification. Recently, agricultural
industries adopted those techniques in order to bring automation to farmers
around the globe. One analysis procedure required for automatic visual
inspection in this domain is leaf count and segmentation. Collecting labeled
data from field crops and greenhouses is a complicated task due to the large
variety of crops, growth seasons, climate changes, phenotype diversity, and
more, especially when specific learning tasks require a large amount of labeled
data for training. Data augmentation for training deep neural networks is well
established, examples include data synthesis, using generative semi-synthetic
models, and applying various kinds of transformations. In this paper we propose
a method that preserves the geometric structure of the data objects, thus
keeping the physical appearance of the data-set as close as possible to imaged
plants in real agricultural scenes. The proposed method provides
state-of-the-art results when applied to the standard benchmark in the field,
namely, the ongoing Leaf Segmentation Challenge hosted by Computer Vision
Problems in Plant Phenotyping.
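The structure-preserving augmentation idea can be sketched with rigid transforms: rotating the whole rosette about the plant center and optionally mirroring it keeps inter-leaf geometry intact, unlike arbitrary warps. The snippet below is a minimal stand-in for the paper's method, assuming leaf annotations are given as (x, y) leaf-tip coordinates relative to the plant center.

```python
import math
import random

def augment_rosette(leaf_points, n_variants=4, seed=0):
    """Generate augmented copies of a rosette annotation via rigid
    transforms (rotation about the center plus optional mirroring).
    These are isometries, so pairwise leaf distances are preserved.
    Hypothetical simplification of structure-preserving augmentation."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        mirror = rng.choice([1, -1])  # -1 reflects across the y-axis
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        variants.append([
            (mirror * (x * cos_t - y * sin_t), x * sin_t + y * cos_t)
            for x, y in leaf_points
        ])
    return variants
```

Because each transform is an isometry, any distance-based phenotype measurement (e.g. leaf span) is identical across all variants, keeping the synthetic data physically plausible.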
Generative Adversarial Network in Medical Imaging: A Review
Generative adversarial networks have gained a lot of attention in the
computer vision community due to their capability of data generation without
explicitly modelling the probability density function. The adversarial loss
brought by the discriminator provides a clever way of incorporating unlabeled
samples into training and imposing higher order consistency. This has proven to
be useful in many cases, such as domain adaptation, data augmentation, and
image-to-image translation. These properties have attracted researchers in the
medical imaging community, and we have seen rapid adoption in many traditional
and novel applications, such as image reconstruction, segmentation, detection,
classification, and cross-modality synthesis. Based on our observations, this
trend will continue and we therefore conducted a review of recent advances in
medical imaging using the adversarial training scheme with the hope of
benefiting researchers interested in this technique.Comment: 24 pages; v4; added missing references from before Jan 1st 2019;
accepted to MedI
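The adversarial loss mentioned in the abstract has a simple closed form for the standard GAN objective. The scalar sketch below shows the discriminator and (non-saturating) generator losses; real implementations operate on batches of logits with a deep-learning framework, so treat this as illustrative arithmetic only.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss for scalar outputs in (0, 1):
    -log D(x) - log(1 - D(G(z))). Minimized when the discriminator
    assigns 1 to real samples and 0 to generated ones."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_loss(d_fake):
    """Non-saturating generator loss: -log D(G(z)). Minimized when
    the generator's samples fool the discriminator (D(G(z)) -> 1)."""
    return -math.log(d_fake)
```

At the classic equilibrium where the discriminator outputs 0.5 everywhere, the discriminator loss equals 2·log 2 ≈ 1.386, which is how the unlabeled-data consistency pressure described above enters training.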
Formal methods and software engineering for DL. Security, safety and productivity for DL systems development
Deep Learning (DL) techniques are now widespread and being integrated into
many important systems. Their classification and recognition abilities ensure
their relevance for multiple application domains. As machine-learning that
relies on training instead of algorithm programming, they offer a high degree
of productivity. But they can be vulnerable to attacks and the verification of
their correctness is only just emerging as a scientific and engineering
possibility. This paper is a major update of a previously-published survey,
attempting to cover all recent publications in this area. It also covers an
even more recent trend, namely the design of domain-specific languages for
producing and training neural nets.
Comment: Submitted to IEEE-CCECE201
Iterative Text-based Editing of Talking-heads Using Neural Retargeting
We present a text-based tool for editing talking-head video that enables an
iterative editing workflow. On each iteration users can edit the wording of the
speech, further refine mouth motions if necessary to reduce artifacts and
manipulate non-verbal aspects of the performance by inserting mouth gestures
(e.g. a smile) or changing the overall performance style (e.g. energetic,
mumble). Our tool requires only 2-3 minutes of the target actor video and it
synthesizes the video for each iteration in about 40 seconds, allowing users to
quickly explore many editing possibilities as they iterate. Our approach is
based on two key ideas. (1) We develop a fast phoneme search algorithm that can
quickly identify phoneme-level subsequences of the source repository video that
best match a desired edit. This enables our fast iteration loop. (2) We
leverage a large repository of video of a source actor and develop a new
self-supervised neural retargeting technique for transferring the mouth motions
of the source actor to the target actor. This allows us to work with relatively
short target actor videos, making our approach applicable in many real-world
editing scenarios. Finally, our refinement and performance controls give users
the ability to further fine-tune the synthesized results.
Comment: Project Website is https://davidyao.me/projects/text2vi
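The "fast phoneme search" idea (key idea 1 above) is essentially an indexing problem: precompute where every short phoneme n-gram occurs in the source repository so that candidate subsequences for an edit can be retrieved without scanning the whole corpus. The sketch below is a hypothetical simplification; the paper's matcher also scores visual continuity between candidates.

```python
from collections import defaultdict

def build_phoneme_index(corpus, n=3):
    """Map every phoneme n-gram in the source repository to the list
    of corpus positions where it starts, enabling O(1) lookup per
    query n-gram instead of a linear scan per edit."""
    index = defaultdict(list)
    for i in range(len(corpus) - n + 1):
        index[tuple(corpus[i:i + n])].append(i)
    return index

def find_candidates(index, query, n=3):
    """Return corpus start positions whose leading n-gram matches
    the start of the query phoneme sequence."""
    return index.get(tuple(query[:n]), [])
```

Building the index once per source actor and reusing it across edits is what makes an iterative, roughly 40-second synthesis loop plausible: each new wording only pays for lookups, not re-annotation.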
High-level Synthesis
Hardware synthesis is a general term used to refer to the processes involved
in automatically generating a hardware design from its specification.
High-level synthesis (HLS) could be defined as the translation from a
behavioral description of the intended hardware circuit into a structural
description, similar to the compilation of programming languages (such as C and
Pascal) into assembly language. The chained synthesis tasks at each level of the
design process include system synthesis, register-transfer synthesis, logic
synthesis, and circuit synthesis. The development of hardware solutions for
complex applications is no longer a complicated task with the emergence of
various HLS tools. Many areas of application have benefited from the modern
advances in hardware design, such as automotive and aerospace industries,
computer graphics, signal and image processing, security, complex simulations
like molecular modeling and DNA matching. The field of HLS is continuing its
rapid growth to facilitate the creation of hardware and to blur more and more
the border separating the processes of designing hardware and software.
Comment: 19 pages, 16 figures. arXiv admin note: text overlap with
arXiv:1905.02075, arXiv:1905.0207
Context-Aware System Synthesis, Task Assignment, and Routing
The design and organization of complex robotic systems traditionally requires
laborious trial-and-error processes to ensure both hardware and software
components are correctly connected with the resources necessary for
computation. This paper presents a novel generalization of the quadratic
assignment and routing problem, introducing formalisms for selecting components
and interconnections to synthesize a complete system capable of providing some
user-defined functionality. By introducing mission context, functional
requirements, and modularity directly into the assignment problem, we derive a
solution where components are automatically selected and then organized into an
optimal hardware and software interconnection structure, all while respecting
restrictions on component viability and required functionality. The ability to
generate \emph{complete} functional systems directly from individual components
reduces manual design effort by allowing for a guided exploration of the design
space. Additionally, our formulation increases resiliency by quantifying
resource margins and enabling adaptation of system structure in response to
changing environments, hardware or software failure. The proposed formulation
is cast as an integer linear program which is provably NP-hard. Two
case studies are developed and analyzed to highlight the expressiveness and
complexity of problems that can be addressed by this approach: the first
explores the iterative development of a ground-based search-and-rescue robot in
a variety of mission contexts, while the second explores the large-scale,
complex design of a humanoid disaster robot for the DARPA Robotics Challenge.
Numerical simulations quantify real world performance and demonstrate tractable
time complexity for the scale of problems encountered in many modern robotic
systems.
Comment: 17 pages, 10 figures, Submitted to Transactions in Robotic
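The assignment core of the formulation above can be made concrete with a toy solver: given a cost matrix where entry (i, j) is the cost of placing component i on resource j, find the placement minimizing total cost. Brute force works only at toy sizes, which is precisely why the paper casts the full problem (with routing and context constraints) as an integer linear program; the names below are illustrative, not the paper's API.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Exhaustively solve a small assignment problem: cost[i][j] is
    the cost of placing component i on resource j. Returns the best
    permutation (component i -> resource perm[i]) and its total cost.
    O(n!) time, so usable only as a sanity check against an ILP."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost
```

For a 3-component example with cost matrix [[4, 1, 3], [2, 0, 5], [3, 2, 2]], the optimum places component 0 on resource 1, component 1 on resource 0, and component 2 on resource 2, at total cost 5.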