Unsupervised Pose Flow Learning for Pose Guided Synthesis
Pose guided synthesis aims to generate a new image in an arbitrary target
pose while preserving the appearance details from the source image. Existing
approaches rely on either hard-coded spatial transformations or 3D body
modeling. They often overlook complex non-rigid pose deformation or unmatched
occluded regions, and thus fail to effectively preserve appearance information. In
this paper, we propose an unsupervised pose flow learning scheme that learns to
transfer the appearance details from the source image. Based on such learned
pose flow, we propose GarmentNet and SynthesisNet, both of which use
multi-scale feature-domain alignment for coarse-to-fine synthesis. Experiments
on the DeepFashion and MVC datasets and additional real-world datasets demonstrate
that our approach compares favorably with state-of-the-art methods and
generalizes to unseen poses and clothing styles.
Comment: 12 pages, 13 figures
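The abstract above does not specify how the learned pose flow is applied, so the following is only a minimal sketch of the general idea of flow-based appearance transfer: each target pixel copies its value from the source location that a dense flow field points to. The function name `warp_with_flow`, the nearest-neighbor sampling, and the border clamping are all assumptions made for illustration, not the paper's method.

```python
from typing import List

Grid = List[List[float]]


def warp_with_flow(source: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward-warp `source` with a dense flow field (hypothetical helper).

    For each target pixel (y, x), the flow gives the offset to the source
    location to copy from. Nearest-neighbor sampling is used for simplicity,
    and out-of-range lookups are clamped to the image border.
    """
    height, width = len(source), len(source[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Source coordinates that this target pixel samples from.
            src_x = int(round(x + flow_x[y][x]))
            src_y = int(round(y + flow_y[y][x]))
            # Clamp to valid indices (an assumed border policy).
            src_x = min(max(src_x, 0), width - 1)
            src_y = min(max(src_y, 0), height - 1)
            warped[y][x] = source[src_y][src_x]
    return warped


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    shift_right = [[1.0] * 3 for _ in range(2)]  # sample one pixel to the right
    no_shift = [[0.0] * 3 for _ in range(2)]
    print(warp_with_flow(image, shift_right, no_shift))
```

In practice such warping is usually done with differentiable bilinear sampling on feature maps at several scales; the loop here only shows the indexing logic.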
Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
Human pose transfer (HPT) is an emerging research topic with huge potential
in fashion design, media production, online advertising and virtual reality.
For these applications, the visual realism of fine-grained appearance details
is crucial for production quality and user engagement. However, existing HPT
methods often suffer from three fundamental issues: detail deficiency, content
ambiguity and style inconsistency, which severely degrade the visual quality
and realism of generated images. Aiming towards real-world applications, we
develop a more challenging yet practical HPT setting, termed as Fine-grained
Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail
replenishment. Concretely, we analyze the potential design flaws of existing
methods via an illustrative example, and establish the core FHPT methodology by
combining the ideas of content synthesis and feature transfer in a
mutually-guided fashion. Thereafter, we substantiate the proposed methodology
with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine
model training scheme. Moreover, we build up a complete suite of fine-grained
evaluation protocols to address the challenges of FHPT in a comprehensive
manner, including semantic analysis, structural detection and perceptual
quality assessment. Extensive experiments on the DeepFashion benchmark dataset
have verified the power of the proposed benchmark against state-of-the-art works,
with 12%-14% gain on top-10 retrieval recall, 5% higher joint localization
accuracy, and nearly 40% gain on face identity preservation. Moreover, the
evaluation results offer further insights into the subject matter, which could
inspire many promising future works along this direction.
Comment: IEEE TIP submission
Intrinsic Temporal Regularization for High-resolution Human Video Synthesis
Temporal consistency is crucial for extending image processing pipelines to
the video domain, which is often enforced with flow-based warping error over
adjacent frames. Yet for human video synthesis, such a scheme is less reliable
due to the misalignment between source and target video as well as the
difficulty in accurate flow estimation. In this paper, we propose an effective
intrinsic temporal regularization scheme to mitigate these issues, where an
intrinsic confidence map is estimated via the frame generator to regulate
motion estimation via temporal loss modulation. This creates a shortcut for
back-propagating temporal loss gradients directly to the front-end motion
estimator, thus improving training stability and temporal coherence in output
videos. We apply our intrinsic temporal regularization to a single-image generator,
leading to a powerful "INTERnet" capable of generating high-resolution
human action videos with temporally coherent, realistic visual
details. Extensive experiments demonstrate the superiority of the proposed INTERnet
over several competitive baselines.
Comment: 10 pages, work done during internship at Alibaba DAMO Academy
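The abstract above describes modulating a flow-based temporal loss with a confidence map but does not give the formula. As a rough illustration only, one simple form of such modulation is a per-pixel weighted warping error between the current frame and the previous frame warped into alignment with it. The function name `modulated_temporal_loss`, the absolute-difference error, and the assumption that confidence values lie in [0, 1] are all hypothetical choices, not the paper's definition.

```python
from typing import List

Frame = List[List[float]]


def modulated_temporal_loss(
    current: Frame, warped_previous: Frame, confidence: Frame
) -> float:
    """Mean absolute warping error weighted per pixel by a confidence map.

    `warped_previous` is assumed to be the previous frame already warped to
    the current frame. Pixels with low confidence (for example, where the
    flow is unreliable) contribute less to the loss.
    """
    total = 0.0
    count = 0
    for row_cur, row_prev, row_conf in zip(current, warped_previous, confidence):
        for cur, prev, conf in zip(row_cur, row_prev, row_conf):
            total += conf * abs(cur - prev)
            count += 1
    return total / count


if __name__ == "__main__":
    current = [[1.0, 2.0]]
    warped_previous = [[0.0, 2.0]]
    full_confidence = [[1.0, 1.0]]
    masked = [[0.0, 1.0]]  # distrust the first pixel
    print(modulated_temporal_loss(current, warped_previous, full_confidence))
    print(modulated_temporal_loss(current, warped_previous, masked))
```

In a trainable setting the confidence map would come from the generator and the loss would be differentiable, so gradients reach both the generator and the motion estimator; the plain-Python version only shows the weighting.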