3,245 research outputs found
Synthesizing Images of Humans in Unseen Poses
We address the computational problem of novel human pose synthesis. Given an
image of a person and a desired pose, we produce a depiction of that person in
that pose, retaining the appearance of both the person and background. We
present a modular generative neural network that synthesizes unseen poses using
training pairs of images and poses taken from human action videos. Our network
separates a scene into different body part and background layers, moves body
parts to new locations and refines their appearances, and composites the new
foreground with a hole-filled background. These subtasks, implemented with
separate modules, are trained jointly using only a single target image as a
supervised label. We use an adversarial discriminator to force our network to
synthesize realistic details conditioned on pose. We demonstrate image
synthesis results on three action classes: golf, yoga/workouts and tennis, and
show that our method produces accurate results within action classes as well as
across action classes. Given a sequence of desired poses, we also produce
coherent videos of actions.
Comment: CVPR 2018
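The abstract lays out a concrete modular pipeline (separate into layers, move body parts, refine, composite over an inpainted background), so a rough sketch may help make the data flow concrete. The sketch below is a minimal PyTorch-style illustration under our own assumptions: 1x1 convolutions stand in for the paper's learned modules, poses are given as per-joint heatmaps, and per-part sampling grids are assumed precomputed from the source/target pose pair. None of the names come from the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseSynthesisSketch(nn.Module):
    """Hypothetical sketch of the modular pipeline: separate layers,
    move body parts, refine appearance, composite over an inpainted
    background. 1x1 convs are placeholders for real learned modules."""

    def __init__(self, num_parts=10):
        super().__init__()
        self.num_parts = num_parts
        # Layer separation: per-part masks plus one background mask.
        self.segmenter = nn.Conv2d(3 + 2 * num_parts, num_parts + 1, 1)
        # Appearance refinement of the moved foreground layers.
        self.refiner = nn.Conv2d(3 * num_parts, 3, 1)
        # Hole filling for the background layer.
        self.inpainter = nn.Conv2d(4, 3, 1)

    def forward(self, image, src_pose, tgt_pose, grids):
        # image: (B,3,H,W); src/tgt_pose: (B,P,H,W) joint heatmaps;
        # grids: (B,P,H,W,2) per-part sampling grids (assumed given).
        x = torch.cat([image, src_pose, tgt_pose], dim=1)
        masks = torch.softmax(self.segmenter(x), dim=1)
        parts = [masks[:, i:i + 1] * image for i in range(self.num_parts)]
        # Move each body-part layer toward its target location.
        moved = [F.grid_sample(p, grids[:, i], align_corners=False)
                 for i, p in enumerate(parts)]
        foreground = self.refiner(torch.cat(moved, dim=1))
        # Fill the holes the foreground leaves in the background.
        bg_mask = masks[:, -1:]
        background = self.inpainter(torch.cat([bg_mask * image, bg_mask], 1))
        # Crude composite: source background mask as the blending alpha;
        # the actual method would composite with warped masks.
        return (1 - bg_mask) * foreground + bg_mask * background
```

In training, per the abstract, a pose-conditioned adversarial discriminator would supply the realism loss alongside reconstruction against the single target image.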
Unsupervised Discovery of Parts, Structure, and Dynamics
Humans easily recognize object parts and their hierarchical structure by
watching how they move; they can then predict how each part moves in the
future. In this paper, we propose a novel formulation that simultaneously
learns a hierarchical, disentangled object representation and a dynamics model
for object parts from unlabeled videos. Our Parts, Structure, and Dynamics
(PSD) model learns to, first, recognize the object parts via a layered image
representation; second, predict hierarchy via a structural descriptor that
composes low-level concepts into a hierarchical structure; and third, model the
system dynamics by predicting the future. Experiments on multiple real and
synthetic datasets demonstrate that our PSD model works well on all three
tasks: segmenting object parts, building their hierarchical structure, and
capturing their motion distributions.
Comment: ICLR 2019. The first two authors contributed equally to this work.
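The hierarchy step is the least familiar piece, so here is one way to read it: each part carries a local motion, and the structural descriptor routes motion down a (soft) parent tree so a child inherits its parent's movement. Below is a minimal sketch of that composition, assuming 2D part displacements and a learned soft parent-assignment matrix; this is our assumption for illustration, not the authors' exact parameterization.

```python
import torch
import torch.nn as nn

class StructuralDescriptorSketch(nn.Module):
    """Hypothetical one-level hierarchy composition: a part's global
    motion is its local motion plus its (softly selected) parent's
    local motion. The real PSD descriptor may compose recursively."""

    def __init__(self, num_parts=6):
        super().__init__()
        # Logits of a soft child-to-parent assignment matrix.
        self.parent_logits = nn.Parameter(torch.zeros(num_parts, num_parts))

    def forward(self, local_motion):
        # local_motion: (B, P, 2) per-part 2D displacements.
        parents = torch.softmax(self.parent_logits, dim=-1)  # (P, P)
        parent_motion = torch.einsum('cp,bpd->bcd', parents, local_motion)
        return local_motion + parent_motion  # global per-part motion
```

Training end-to-end on future prediction would then pressure the assignment matrix toward the true kinematic tree, since wrong parents compose into wrong global motions.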
Synthesizing Moving People with 3D Control
In this paper, we present a diffusion model-based framework for animating
people from a single image for a given target 3D motion sequence. Our approach
has two core components: a) learning priors about invisible parts of the human
body and clothing, and b) rendering novel body poses with proper clothing and
texture. For the first part, we learn an in-filling diffusion model to
hallucinate unseen parts of a person given a single image. We train this model
in texture-map space, which makes it more sample-efficient since it is
invariant to pose and viewpoint. Second, we develop a diffusion-based rendering
pipeline, which is controlled by 3D human poses. This produces realistic
renderings of novel poses of the person, including clothing, hair, and
plausible in-filling of unseen regions. This disentangled approach allows our
method to generate a sequence of images that is faithful to the target motion
in 3D pose and to the input image in visual appearance. In addition, the 3D
control allows rendering the person from various synthetic camera
trajectories. Our experiments show that our method is more resilient than
prior methods at generating prolonged motions and handling varied,
challenging, and complex poses. Please check our website for more details:
to prior methods. Please check our website for more details:
https://boyiliee.github.io/3DHM.github.io/
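Reading the abstract as a two-stage pipeline, a control-flow sketch looks like the following. Everything here is an assumption for illustration: `uv_unwrap`, `texture_diffusion`, and `renderer` are hypothetical callables standing in for the paper's components, not names from the project.

```python
import torch

def animate_person(image, target_poses, uv_unwrap, texture_diffusion, renderer):
    """Hypothetical sketch of the two-stage approach: (a) complete the
    person's texture map once, (b) render each target 3D pose with it."""
    # Stage 1: lift visible pixels into UV texture space, then let an
    # in-filling diffusion model hallucinate the unseen regions.
    partial_texture, visibility = uv_unwrap(image)       # (3,H,W), (1,H,W)
    full_texture = texture_diffusion(partial_texture, visibility)
    # Stage 2: a pose-conditioned diffusion renderer produces one frame
    # per 3D pose; the shared texture keeps appearance consistent.
    frames = [renderer(full_texture, pose) for pose in target_poses]
    return torch.stack(frames)
```

Because the texture map is pose- and viewpoint-invariant, stage 1 runs once per subject, and the 3D conditioning in stage 2 is what admits arbitrary synthetic camera trajectories.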