Unsupervised Person Image Synthesis in Arbitrary Poses
We present a novel approach for synthesizing photo-realistic images of people
in arbitrary poses using generative adversarial learning. Given an input image
of a person and a desired pose represented by a 2D skeleton, our model renders
the image of the same person under the new pose, synthesizing novel views of
the parts visible in the input image and hallucinating those that are not seen.
This problem has recently been addressed in a supervised manner, i.e., during
training the ground truth images under the new poses are given to the network.
We go beyond these approaches by proposing a fully unsupervised strategy. We
tackle this challenging scenario by splitting the problem into two principal
subtasks. First, we consider a pose-conditioned bidirectional generator that
maps back the initially rendered image to the original pose, hence being
directly comparable to the input image without the need to resort to any
training image. Second, we devise a novel loss function that incorporates
content and style terms, and aims at producing images of high perceptual
quality. Extensive experiments conducted on the DeepFashion dataset demonstrate
that the images rendered by our model are very close in appearance to those
obtained by fully supervised approaches.
Comment: Accepted as Spotlight at CVPR 2018.
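The unsupervised training signal described above is a pose cycle: render the person in the target pose, map the result back to the source pose, and compare against the original input. Below is a minimal PyTorch-style sketch of such a cycle loss with content and style terms; G (the pose-conditioned generator), phi (a perceptual feature extractor), and the Gram-matrix style term are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def unsupervised_cycle_loss(G, phi, x, p_src, p_tgt,
                            w_content=1.0, w_style=1.0):
    """Cycle loss for unsupervised pose transfer (sketch).

    G:      hypothetical pose-conditioned generator, G(image, pose) -> image
    phi:    hypothetical feature extractor (e.g. VGG activations) feeding
            the perceptual content/style terms
    x:      input image batch, shape (B, 3, H, W)
    p_src:  2D-skeleton pose of x (the pose encoding is an assumption here)
    p_tgt:  desired target pose
    """
    x_tgt = G(x, p_tgt)       # render the person under the new pose
    x_rec = G(x_tgt, p_src)   # map back to the original pose

    # The back-mapped image is directly comparable to the input, so no
    # ground-truth image under p_tgt is ever required.
    f_x, f_rec = phi(x), phi(x_rec)
    content = F.l1_loss(f_rec, f_x)

    # Style term via Gram matrices of the same features: one common
    # choice for a style loss; the paper's exact terms may differ.
    def gram(f):
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    style = F.l1_loss(gram(f_rec), gram(f_x))
    return w_content * content + w_style * style
```

In practice this term would be combined with the adversarial loss from the GAN discriminator; the cycle is what removes the need for ground-truth images under new poses.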
GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
Modeling the 3D world from sensor data for simulation is a scalable way of
developing testing and validation environments for robotic learning problems
such as autonomous driving. However, manually creating or re-creating
real-world-like environments is difficult, expensive, and not scalable. Recent
generative model techniques have shown promising progress to address such
challenges by learning 3D assets using only plentiful 2D images -- but still
suffer limitations as they leverage either human-curated image datasets or
renderings from manually-created synthetic 3D environments. In this paper, we
introduce GINA-3D, a generative model that uses real-world driving data from
camera and LiDAR sensors to create realistic 3D implicit neural assets of
diverse vehicles and pedestrians. Compared to existing image datasets, the
real-world driving setting poses new challenges due to occlusions, lighting
variations, and long-tail distributions. GINA-3D tackles these
challenges by decoupling representation learning and generative modeling into
two stages with a learned tri-plane latent structure, inspired by recent
advances in generative modeling of images. To evaluate our approach, we
construct a large-scale object-centric dataset containing over 520K images of
vehicles and pedestrians from the Waymo Open Dataset, and a new set of 80K
images of long-tail instances such as construction equipment, garbage trucks,
and cable cars. We compare our model with existing approaches and demonstrate
that it achieves state-of-the-art performance in quality and diversity for both
generated images and geometries.
Comment: Accepted by CVPR 2023.
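The tri-plane latent structure mentioned above gathers features for any 3D point by projecting it onto three axis-aligned feature planes and combining the sampled features. The sketch below shows a generic tri-plane query in PyTorch; the (B, 3, C, R, R) plane layout and the summation aggregation are assumptions for illustration, not GINA-3D's actual interface.

```python
import torch
import torch.nn.functional as F

def query_triplane(planes, xyz):
    """Sample per-point features from a tri-plane latent (sketch).

    planes: (B, 3, C, R, R) tensor holding the XY, XZ and YZ feature
            planes (this layout is an assumption, not GINA-3D's code)
    xyz:    query points in [-1, 1]^3, shape (B, N, 3)

    Returns per-point features of shape (B, N, C), aggregated here by
    summing the three plane projections.
    """
    B = planes.shape[0]
    # Project each 3D point onto the three axis-aligned planes.
    projections = [xyz[..., [0, 1]],   # XY plane
                   xyz[..., [0, 2]],   # XZ plane
                   xyz[..., [1, 2]]]   # YZ plane
    feats = 0
    for i, uv in enumerate(projections):
        grid = uv.view(B, -1, 1, 2)                    # (B, N, 1, 2)
        f = F.grid_sample(planes[:, i], grid,
                          mode='bilinear', align_corners=False)
        feats = feats + f.squeeze(-1).transpose(1, 2)  # (B, N, C)
    return feats
```

A small decoder (e.g. an MLP) would then map these features to density and color for volume rendering, with the generative stage learning a prior over the plane latents.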
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
We propose \textbf{DMV3D}, a novel 3D generation approach that uses a
transformer-based 3D large reconstruction model to denoise multi-view
diffusion. Our reconstruction model incorporates a triplane NeRF representation
and can denoise noisy multi-view images via NeRF reconstruction and rendering,
achieving single-stage 3D generation in 30 s on a single A100 GPU. We train
\textbf{DMV3D} on large-scale multi-view image datasets of highly diverse
objects using only image reconstruction losses, without accessing 3D assets. We
demonstrate state-of-the-art results for the single-image reconstruction
problem where probabilistic modeling of unseen object parts is required for
generating diverse reconstructions with sharp textures. We also show
high-quality text-to-3D generation results outperforming previous 3D diffusion
models. Our project website is at https://justimyhxu.github.io/projects/dmv3d/.
Comment: Project page: https://justimyhxu.github.io/projects/dmv3d/
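The central move in DMV3D is to use the reconstruction model itself as the diffusion denoiser: given noisy multi-view images, it predicts a triplane NeRF, and re-rendering that NeRF at the input viewpoints yields the clean-image estimate. A hedged sketch of one denoising step, where recon_model and render are hypothetical stand-ins for the transformer-based reconstructor and the NeRF renderer:

```python
import torch

def dmv3d_denoise_step(recon_model, render, x_t, cams, t, alpha_bar):
    """One multi-view denoising step in the spirit of the abstract (sketch).

    recon_model: hypothetical transformer mapping noisy multi-view images,
                 cameras, and the timestep to a triplane NeRF
    render:      hypothetical NeRF renderer, render(nerf, cams) -> images
    x_t:         noisy multi-view images, shape (B, V, 3, H, W)
    cams:        camera parameters for the V views
    t:           scalar timestep (int)
    alpha_bar:   cumulative noise schedule, a 1-D tensor indexed by t
    """
    nerf = recon_model(x_t, cams, t)   # reconstruct 3D from the noisy views
    x0_pred = render(nerf, cams)       # re-rendering acts as the denoiser

    # Recover the implied noise from the DDPM identity
    # x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps, solved for eps.
    a = alpha_bar[t]
    eps = (x_t - a.sqrt() * x0_pred) / (1.0 - a).sqrt()
    return x0_pred, eps, nerf
```

Because supervision comes from image reconstruction losses on the rendered views, such a step needs no 3D assets at training time, consistent with the abstract.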
Metaverse: A Vision, Architectural Elements, and Future Directions for Scalable and Realtime Virtual Worlds
With the emergence of cloud computing, Internet of Things-enabled
human-computer interfaces, generative artificial intelligence, and highly
accurate machine- and deep-learning recognition and predictive models, along
with the post-COVID-19 proliferation of social networking and remote
communication, the Metaverse has gained considerable popularity. The Metaverse
has the potential to extend the physical world using virtual and augmented
reality, letting users interact seamlessly with both real and virtual worlds
through avatars and holograms. It stands to change how people interact on
social media, collaborate at work, conduct marketing and business, teach,
learn, and even access personalized healthcare. Several works in the
literature examine the Metaverse in terms of wearable hardware devices and
virtual-reality gaming applications. However, the requirements for realizing
the Metaverse in real time and at large scale have yet to be examined for the
technology to be usable. To address this limitation, this paper traces the
temporal evolution of Metaverse definitions and captures its evolving
requirements, providing insight into what realizing the Metaverse demands. In
addition to enabling technologies, we lay out architectural elements for
scalable, reliable, and efficient Metaverse systems, classify existing
Metaverse applications, and propose future research directions.