56 research outputs found
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC has been
attracting significant attention, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Moreover, we discuss the challenges and
potential future research directions.
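To make the 3D shape representations listed above concrete, the sketch below expresses one shape (a unit sphere, chosen purely for illustration) as a voxel occupancy grid, a point cloud, and an implicit field; none of the names or values come from the survey itself:

```python
import numpy as np

# The same unit sphere expressed in three of the 3D representations
# discussed above: voxels, a point cloud, and an implicit field.

def sdf_sphere(points, radius=1.0):
    """Implicit representation: signed distance to a sphere's surface."""
    return np.linalg.norm(points, axis=-1) - radius

# Voxel representation: occupancy on a regular 32^3 grid over [-1.5, 1.5]^3.
axis = np.linspace(-1.5, 1.5, 32)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
voxels = sdf_sphere(grid.reshape(-1, 3)) <= 0.0  # inside -> occupied

# Point-cloud representation: random directions projected onto the surface.
rng = np.random.default_rng(0)
directions = rng.normal(size=(1024, 3))
point_cloud = directions / np.linalg.norm(directions, axis=-1, keepdims=True)

print(voxels.mean())                  # fraction of occupied voxels
print(sdf_sphere(point_cloud).max())  # near zero: points lie on the surface
```

A mesh or a learned neural implicit field would replace the analytic `sdf_sphere` here; the query interface (point in, signed distance out) is the same.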
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion
Large-scale pre-trained vision-language models allow for the zero-shot
text-based generation of 3D avatars. The previous state-of-the-art method
utilized CLIP to supervise neural implicit models that reconstructed a human
body mesh. However, this approach has two limitations. Firstly, the lack of
avatar-specific models can cause facial distortion and unrealistic clothing in
the generated avatars. Secondly, CLIP only provides an optimization direction
for the overall appearance, which leads to less impressive results. To address these
limitations, we propose AvatarFusion, the first framework to use a latent
diffusion model to provide pixel-level guidance for generating human-realistic
avatars while simultaneously segmenting clothing from the avatar's body.
AvatarFusion includes the first clothing-decoupled neural implicit avatar model
that employs a novel Dual Volume Rendering strategy to render the decoupled
skin and clothing sub-models in one space. We also introduce a novel
optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which
semantically separates the generation of body and clothes, and generates a
variety of clothing styles. Moreover, we establish the first benchmark for
zero-shot text-to-avatar generation. Our experimental results demonstrate that
our framework outperforms previous approaches, with significant improvements
observed in all metrics. Additionally, since our model is clothing-decoupled,
we can exchange the clothes of avatars. Code will be available on GitHub.
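The Dual Volume Rendering idea, rendering decoupled skin and clothing sub-models in one shared space, can be sketched as a single volume-rendering pass over the summed densities. The following is a minimal hypothetical version in NumPy, not the authors' implementation:

```python
import numpy as np

# Minimal sketch: render one ray through two co-located radiance fields
# (skin and clothing) by summing their densities and density-weighting
# their colors, so both sub-models are composited in one pass.

def composite_ray(sigma_skin, sigma_cloth, color_skin, color_cloth, delta):
    sigma = sigma_skin + sigma_cloth                      # joint density
    # Per-sample color: density-weighted mix of the two sub-models.
    w_skin = np.where(sigma > 0, sigma_skin / np.maximum(sigma, 1e-8), 0.0)
    color = w_skin[:, None] * color_skin + (1 - w_skin)[:, None] * color_cloth
    alpha = 1.0 - np.exp(-sigma * delta)                  # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                               # standard NeRF weights
    return (weights[:, None] * color).sum(axis=0), weights.sum()

# Toy ray: clothing samples occlude skin samples further along the ray.
n = 8
sigma_cloth = np.array([5.0, 5.0, 0, 0, 0, 0, 0, 0])
sigma_skin = np.array([0, 0, 0, 5.0, 5.0, 0, 0, 0])
red = np.tile([1.0, 0.0, 0.0], (n, 1))   # clothing color
tan = np.tile([0.9, 0.7, 0.6], (n, 1))   # skin color
rgb, acc = composite_ray(sigma_skin, sigma_cloth, tan, red, delta=0.1)
print(rgb, acc)  # dominated by the clothing color; opacity stays <= 1
```

Because the two fields are kept separate until compositing, zeroing one density (e.g. the clothing sub-model) renders the other alone, which is what makes clothing exchange possible in a decoupled model.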
Development of an iPhone-Based User-Tracking System for Visualization in a 3D Model
The goal of this project is to develop a user-tracking system based on an iPhone and a 3D model of the Technical University of Denmark campus. Users can enable tracking after opening an application on the iPhone, provided they are in one of the areas for which a 3D model is available. Users who have enabled tracking are shown in these 3D models as avatars. The 3D models, together with the avatars, can be viewed in any desktop browser on the website realsite.dk. Smartphone GPS sensors are usually not very precise, so to develop good algorithms for the required tracking system, the accuracy of this sensor had to be analyzed. The project therefore begins with an extensive study of the accuracy of the iPhone's location services and of the parameters that can be configured, covering both static and moving positions. This study reveals that the mean error for static positions is around 8 meters, and considerably larger for moving positions. However, the first position fix is obtained very quickly, in less than 10 seconds in most cases. Using the results of this study, several filters were designed to discard the least accurate positions. In addition, a technique was developed that detects when a user enters a building using no information beyond what the location services provide. The two most important parts of this system were developed entirely in this final-year project: an application for the iOS mobile operating system and an algorithm for rendering the users' avatars in the 3D models.
The application collects the users' positions using the device's GPS, filters them, stores them, and sends them to an internet server where they are kept in a database. It also allows viewing previous sessions in which tracking was enabled, and taking a photo to be used for the user's avatar. The avatars cannot be placed in the model directly from the positions the iOS device obtains, because these are not precise enough. An algorithm was therefore designed that generates, from the received GPS positions, a realistic, feasible, obstacle-free route through the model. One important detail, for example, is that it makes avatars use stairs and building doors when a change in altitude or a building entry, respectively, is detected.
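The position filtering and building-entry detection described in this project can be sketched in a few lines. The thresholds and helper names below are illustrative assumptions, not values from the thesis, and the sketch is in Python although the actual app targets iOS:

```python
# Minimal sketch of the two ideas above: (1) drop GPS fixes whose reported
# horizontal accuracy is too poor, and (2) flag a likely building entry when
# accuracy degrades sharply, using only what location services report.
# Both thresholds are made-up illustrative values.

ACCURACY_LIMIT_M = 20.0      # keep fixes reported as accurate to <= 20 m
DEGRADATION_FACTOR = 4.0     # a sudden 4x worse accuracy suggests going indoors

def filter_fixes(fixes):
    """fixes: list of (lat, lon, horizontal_accuracy_m) tuples."""
    return [f for f in fixes if f[2] <= ACCURACY_LIMIT_M]

def detect_building_entry(accuracies):
    """Return indices where accuracy worsens by DEGRADATION_FACTOR or more."""
    return [i for i in range(1, len(accuracies))
            if accuracies[i] >= DEGRADATION_FACTOR * accuracies[i - 1]]

fixes = [(55.78, 12.52, 8.0), (55.78, 12.52, 65.0), (55.78, 12.52, 10.0)]
print(filter_fixes(fixes))                            # the 65 m fix is dropped
print(detect_building_entry([8.0, 9.0, 45.0, 50.0]))  # entry flagged at index 2
```

The second function captures the key observation that GPS accuracy collapses indoors, so no extra sensors are needed to detect a building entry.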
Malliavin and Dirichlet structures for independent random variables
On any denumerable product of probability spaces, we construct a Malliavin
gradient and then a divergence and a number operator. This yields a Dirichlet
structure which can be shown to approach the usual structures for Poisson and
Brownian processes. We obtain versions of almost all the classical functional
inequalities in discrete settings which show that the Efron-Stein inequality
can be interpreted as a Poincaré inequality or that Hoeffding decomposition
of U-statistics can be interpreted as a chaos decomposition. We obtain a
version of the Lyapounov central limit theorem for independent random variables
without resorting to ad hoc couplings, thus increasing the scope of Stein's
method.
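The Poincaré reading of the Efron–Stein inequality mentioned above can be written out explicitly; the statement below is the standard textbook form, not a formula taken from the paper:

```latex
% Efron--Stein: F = F(X_1, \dots, X_n) with independent X_i, where X^{(i)}
% denotes the sample with X_i replaced by an independent copy X_i'.
\operatorname{Var}(F) \;\le\; \frac{1}{2} \sum_{i=1}^{n}
  \mathbb{E}\!\left[ \left( F(X) - F\!\left(X^{(i)}\right) \right)^{2} \right]
% Reading D_i F = F(X) - F(X^{(i)}) as a discrete gradient turns this into
% a Poincaré inequality \operatorname{Var}(F) \le \mathcal{E}(F, F) for the
% associated Dirichlet form \mathcal{E}.
```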
3DPortraitGAN: Learning One-Quarter Headshot 3D GANs from a Single-View Portrait Dataset with Diverse Body Poses
3D-aware face generators are typically trained on 2D real-life face image
datasets that primarily consist of near-frontal face data, and as such, they
are unable to construct one-quarter headshot 3D portraits with complete head,
neck, and shoulder geometry. Two reasons account for this issue: First,
existing facial recognition methods struggle with extracting facial data
captured from large camera angles or back views. Second, it is challenging to
learn a distribution of 3D portraits covering the one-quarter headshot region
from single-view data due to significant geometric deformation caused by
diverse body poses. To this end, we first create the dataset
360°-Portrait-HQ (360°PHQ for short), which consists of high-quality
single-view real portraits annotated with a variety of camera parameters (the
yaw angles span the entire 360° range) and body poses. We then propose
3DPortraitGAN, the first 3D-aware one-quarter headshot portrait generator that
learns a canonical 3D avatar distribution from the 360°PHQ dataset with
body pose self-learning. Our model can generate view-consistent portrait images
from all camera angles with a canonical one-quarter headshot 3D representation.
Our experiments show that the proposed framework can accurately predict
portrait body poses and generate view-consistent, realistic portrait images
with complete geometry from all camera angles.
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
Gestures that accompany speech are an essential part of natural and efficient
embodied human communication. The automatic generation of such co-speech
gestures is a long-standing problem in computer animation and is considered an
enabling technology in film, games, virtual social spaces, and for interaction
with social robots. The problem is made challenging by the idiosyncratic and
non-periodic nature of human co-speech gesture motion, and by the great
diversity of communicative functions that gestures encompass. Gesture
generation has seen surging interest recently, owing to the emergence of more
and larger datasets of human gesture motion, combined with strides in
deep-learning-based generative models that benefit from the growing
availability of data. This review article summarizes co-speech gesture
generation research, with a particular focus on deep generative models. First,
we articulate the theory describing human gesticulation and how it complements
speech. Next, we briefly discuss rule-based and classical statistical gesture
synthesis, before delving into deep learning approaches. We employ the choice
of input modalities as an organizing principle, examining systems that generate
gestures from audio, text, and non-linguistic input. We also chronicle the
evolution of the related training data sets in terms of size, diversity, motion
quality, and collection method. Finally, we identify key research challenges in
gesture generation, including data availability and quality; producing
human-like motion; grounding the gesture in the co-occurring speech in
interaction with other speakers, and in the environment; performing gesture
evaluation; and integration of gesture synthesis into applications. We
highlight recent approaches to tackling the various key challenges, as well as
the limitations of these approaches, and point toward areas of future
development. Comment: Accepted for EUROGRAPHICS 202
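As a concrete illustration of the audio-driven setting that the review uses as one organizing category, a typical preprocessing step is aligning audio features to the motion frame rate. The sketch below is a hypothetical simplification; the frame rates and feature dimension are made up, not taken from any surveyed system:

```python
import numpy as np

# Hypothetical preprocessing sketch for audio-driven gesture generation:
# audio features (e.g. mel frames at 100 fps) are average-pooled into
# windows aligned with the motion-capture frame rate (e.g. 30 fps), so
# each gesture frame has one conditioning feature vector.

def align_audio_to_motion(audio_feats, audio_fps=100, motion_fps=30):
    """audio_feats: (T_audio, D) array -> (T_motion, D) array."""
    t_motion = int(len(audio_feats) * motion_fps / audio_fps)
    aligned = np.empty((t_motion, audio_feats.shape[1]))
    for i in range(t_motion):
        lo = int(i * audio_fps / motion_fps)
        hi = int((i + 1) * audio_fps / motion_fps)
        aligned[i] = audio_feats[lo:hi].mean(axis=0)  # pool one window
    return aligned

audio = np.random.default_rng(0).normal(size=(200, 40))  # 2 s of 40-dim features
motion_cond = align_audio_to_motion(audio)
print(motion_cond.shape)  # one feature vector per 30 fps motion frame
```

A generative model (rule-based, statistical, or deep) would then map each aligned conditioning vector, plus context, to a pose for that frame.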
Self-assembly of thermo and light responsive amphiphilic linear dendritic block copolymers
The synthesis and structural characterization of a new dual-responsive linear-dendritic block copolymer (LDBC) is presented. The LDBC consists of a thermoresponsive linear block of polymethacrylate bearing oligo- and diethylene glycol, and a light-responsive dendron block of bis-MPA decorated at the periphery with 4-isobutyloxyazobenzene and alkyl chains in a 50:50 molar ratio. The blocks are coupled by copper(I)-catalyzed alkyne–azide cycloaddition (CuAAC). The ability of the LDBC to form vesicle self-assemblies in water is described, as well as the effect of light and temperature on vesicle morphology, on the basis of transmission electron microscopy (TEM), dynamic light scattering (DLS), and UV–vis spectroscopy studies. The effect of UV light and temperature on the vesicle structure, followed in real time by SAXS and WAXS, is also presented. Finally, the potential of the vesicles for loading and stimuli-controlled release of small fluorescent molecules is probed.
HDHumans: A Hybrid Approach for High-fidelity Digital Humans
Photo-real digital human avatars are of enormous importance in graphics, as they enable immersive communication over the globe, improve gaming and entertainment experiences, and can be particularly beneficial for AR and VR settings. However, current avatar generation approaches either fall short in high-fidelity novel view synthesis, generalization to novel motions, reproduction of loose clothing, or they cannot render characters at the high resolution offered by modern displays. To this end, we propose HDHumans, which is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface and highly photo-realistic images of arbitrary novel views and of motions not seen at training time. At the technical core, our method tightly integrates a classical deforming character template with neural radiance fields (NeRF). Our method is carefully designed to achieve a synergy between classical surface deformation and NeRF. First, the template guides the NeRF, which allows synthesizing novel views of a highly dynamic and articulated character and even enables the synthesis of novel motions. Second, we also leverage the dense point clouds resulting from NeRF to further improve the deforming surface via 3D-to-3D supervision. We outperform the state of the art quantitatively and qualitatively in terms of synthesis quality and resolution, as well as the quality of 3D surface reconstruction.
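The 3D-to-3D supervision step described above, in which dense point clouds from NeRF refine the deforming surface, is commonly driven by a Chamfer-style point-set distance. The sketch below is an illustrative stand-in for such a loss, not HDHumans' actual formulation:

```python
import numpy as np

# Minimal sketch of 3D-to-3D supervision: a symmetric Chamfer distance
# between a dense point cloud (e.g. extracted from a NeRF) and vertices of
# a deforming template surface. Minimizing it pulls the template toward
# the point cloud. Point counts and data here are purely illustrative.

def chamfer_distance(a, b):
    """a: (N, 3), b: (M, 3) -> mean nearest-neighbor distance, both ways."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3))
cloud /= np.linalg.norm(cloud, axis=1, keepdims=True)      # points on a sphere
template = rng.normal(size=(100, 3))
template /= np.linalg.norm(template, axis=1, keepdims=True)
print(chamfer_distance(template, cloud))          # small: same underlying surface
print(chamfer_distance(template * 2.0, cloud))    # larger: template scaled away
```

In practice such a term would be one loss among several, balanced against image-space and regularization losses, and computed with an efficient nearest-neighbor structure rather than a dense distance matrix.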