56 research outputs found

    AI-generated Content for Various Data Modalities: A Survey

    Full text link
    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent works, AIGC has been attracting significant attention, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities and present comparative results for various modalities. Moreover, we discuss the challenges and potential future research directions.

    AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

    Full text link
    Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars. The previous state-of-the-art method utilized CLIP to supervise neural implicit models that reconstructed a human body mesh. However, this approach has two limitations. First, the lack of avatar-specific models can cause facial distortion and unrealistic clothing in the generated avatars. Second, CLIP only provides optimization direction for the overall appearance, which leads to less impressive results. To address these limitations, we propose AvatarFusion, the first framework to use a latent diffusion model to provide pixel-level guidance for generating human-realistic avatars while simultaneously segmenting clothing from the avatar's body. AvatarFusion includes the first clothing-decoupled neural implicit avatar model, which employs a novel Dual Volume Rendering strategy to render the decoupled skin and clothing sub-models in one space. We also introduce a novel optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which semantically separates the generation of body and clothes and generates a variety of clothing styles. Moreover, we establish the first benchmark for zero-shot text-to-avatar generation. Our experimental results demonstrate that our framework outperforms previous approaches, with significant improvements observed in all metrics. Additionally, since our model is clothing-decoupled, we can exchange the clothes of avatars. Code will be available on GitHub.
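
    The abstract describes a Dual Volume Rendering strategy that renders decoupled skin and clothing sub-models in one space. The sketch below is only a minimal, generic illustration of how two implicit fields might be composited along a shared ray: the `skin_field` and `cloth_field` callables and the density-weighted color blend are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def dual_volume_render(ray_points, deltas, skin_field, cloth_field):
    """Composite two decoupled implicit sub-models (skin and clothing)
    along one ray by merging their densities at shared sample points.

    ray_points : (N, 3) sample positions along the ray
    deltas     : (N,) distances between consecutive samples
    skin_field, cloth_field : callables returning (density (N,), color (N, 3))
    """
    sigma_s, color_s = skin_field(ray_points)
    sigma_c, color_c = cloth_field(ray_points)

    # Merge the two fields: total density is the sum; the color at each
    # sample is a density-weighted blend of the two sub-models (an
    # illustrative choice, not necessarily the paper's).
    sigma = sigma_s + sigma_c
    w = np.where(sigma > 0, sigma_s / np.maximum(sigma, 1e-8), 0.0)
    color = w[:, None] * color_s + (1.0 - w[:, None]) * color_c

    # Standard volume rendering: front-to-back alpha compositing.
    alpha = 1.0 - np.exp(-sigma * deltas)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return (weights[:, None] * color).sum(axis=0)
```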

    Development of an iPhone-Based User Tracking System for Visualization in a 3D Model

    Get PDF
    The goal of this project is to develop a user-tracking system using an iPhone and a 3D model of the Technical University of Denmark campus. Users can activate tracking after opening an application on the iPhone, provided they are in one of the areas for which a 3D model is available. Users who have activated tracking are displayed in these 3D models as avatars. The 3D models, together with the avatars, can be viewed in any desktop browser on the website realsite.dk. The GPS sensors in smartphones are usually not very accurate, so to design good algorithms for the required tracking system, the accuracy of this sensor has to be analyzed. For this reason, the project begins with an extensive study of the accuracy of the iPhone's location services and of the parameters that can be configured, covering both stationary and moving positions. This study reveals that the mean error for static positions is around 8 meters and considerably larger for moving positions. However, the device obtains a first position fix very quickly, in under 10 seconds in most cases. Using the results of this study, several filters were designed to discard the least accurate positions. In addition, a technique was developed to detect when a user enters a building using no information other than what the location services provide. The two most important parts of this system were developed entirely in this final-year project: an application for the iOS mobile operating system and an algorithm to place the users' avatars in the 3D models. The application collects the users' positions using the device's GPS, filters them, stores them, and sends them to an internet server where they are saved in a database. It also allows users to review previous sessions in which tracking was active and to take a photo to be used for their avatar. The avatars cannot be placed in the model directly from the positions obtained by the iOS device, since these are not accurate enough. An algorithm was therefore designed that generates, from the received GPS positions, a realistic, feasible, obstacle-free route within the model. One notable detail, for example, is that it makes avatars use stairs and building doors when a change in altitude or entry into a building is detected, respectively.
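
    The thesis describes filters that discard the least accurate GPS fixes before avatars are placed in the model. The snippet below is a minimal sketch of that kind of filtering, assuming a hypothetical `Fix` record with a reported horizontal accuracy; the 20 m accuracy and 3 m/s walking-speed thresholds are illustrative values, not taken from the thesis.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Fix:
    lat: float          # degrees
    lon: float          # degrees
    accuracy: float     # reported horizontal accuracy, meters
    timestamp: float    # seconds since epoch

def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in meters."""
    r = 6_371_000.0
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))

def filter_fixes(fixes, max_accuracy=20.0, max_speed=3.0):
    """Drop fixes with poor reported accuracy or an implausible implied speed."""
    kept = []
    for fix in fixes:
        if fix.accuracy > max_accuracy:
            continue                      # reported error too large
        if kept:
            dt = fix.timestamp - kept[-1].timestamp
            if dt > 0 and haversine_m(kept[-1], fix) / dt > max_speed:
                continue                  # would require an unrealistic walking speed
        kept.append(fix)
    return kept
```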

    Malliavin and Dirichlet structures for independent random variables

    Get PDF
    On any denumerable product of probability spaces, we construct a Malliavin gradient and then a divergence and a number operator. This yields a Dirichlet structure which can be shown to approach the usual structures for Poisson and Brownian processes. We obtain versions of almost all the classical functional inequalities in discrete settings, which show that the Efron-Stein inequality can be interpreted as a Poincaré inequality and that the Hoeffding decomposition of U-statistics can be interpreted as a chaos decomposition. We also obtain a version of the Lyapounov central limit theorem for independent random variables without resorting to ad-hoc couplings, thus increasing the scope of Stein's method.
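
    As a pointer to the kind of statement summarized above, one common way to write a discrete gradient and the resulting Poincaré-type (Efron-Stein) inequality on a product of probability spaces is sketched below; the normalization and the exact operators may differ from the paper's.

```latex
% A standard discrete gradient on a product space and the Efron-Stein
% inequality read as a Poincaré inequality; normalizations may differ
% from those used in the paper.
\[
  D_k F \;=\; F - \mathbb{E}\bigl[\,F \mid X_j,\ j \neq k\,\bigr],
  \qquad
  \mathcal{E}(F) \;=\; \mathbb{E}\Bigl[\sum_{k \ge 1} (D_k F)^2\Bigr],
\]
\[
  \operatorname{Var}(F) \;\le\; \mathcal{E}(F)
  \qquad \text{(Efron--Stein as a Poincaré inequality).}
\]
```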

    3DPortraitGAN: Learning One-Quarter Headshot 3D GANs from a Single-View Portrait Dataset with Diverse Body Poses

    Full text link
    3D-aware face generators are typically trained on 2D real-life face image datasets that primarily consist of near-frontal face data, and as such, they are unable to construct one-quarter headshot 3D portraits with complete head, neck, and shoulder geometry. Two reasons account for this issue: First, existing facial recognition methods struggle with extracting facial data captured from large camera angles or back views. Second, it is challenging to learn a distribution of 3D portraits covering the one-quarter headshot region from single-view data due to significant geometric deformation caused by diverse body poses. To this end, we first create the dataset 360°-Portrait-HQ (360°PHQ for short), which consists of high-quality single-view real portraits annotated with a variety of camera parameters (the yaw angles span the entire 360° range) and body poses. We then propose 3DPortraitGAN, the first 3D-aware one-quarter headshot portrait generator that learns a canonical 3D avatar distribution from the 360°PHQ dataset with body pose self-learning. Our model can generate view-consistent portrait images from all camera angles with a canonical one-quarter headshot 3D representation. Our experiments show that the proposed framework can accurately predict portrait body poses and generate view-consistent, realistic portrait images with complete geometry from all camera angles.
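
    The dataset annotates portraits with camera parameters whose yaw angles span the entire 360° range. The helper below is a small illustrative sketch of sampling such a full-yaw camera on a circle around the subject and building a look-at camera-to-world matrix; the radius, target height, and axis conventions are assumptions, not the dataset's actual parameterization.

```python
import numpy as np

def sample_full_yaw_camera(radius=2.7, target=(0.0, 0.2, 0.0), pitch_deg=0.0):
    """Sample a camera pose with yaw drawn from the full 360° range,
    placed on a circle around the portrait and looking at its center.
    All values are illustrative placeholders.
    """
    yaw = np.random.uniform(0.0, 2.0 * np.pi)
    pitch = np.deg2rad(pitch_deg)
    eye = np.array([radius * np.cos(pitch) * np.sin(yaw),
                    radius * np.sin(pitch),
                    radius * np.cos(pitch) * np.cos(yaw)])
    target = np.asarray(target, dtype=float)

    # Build a look-at camera-to-world matrix (OpenGL-style axes).
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    cam2world = np.eye(4)
    cam2world[:3, 0], cam2world[:3, 1], cam2world[:3, 2] = right, up, -forward
    cam2world[:3, 3] = eye
    return cam2world
```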

    A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

    Full text link
    Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic nature of human co-speech gesture motion, and by the great diversity of communicative functions that gestures encompass. Gesture generation has seen surging interest recently, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep-learning-based generative models that benefit from the growing availability of data. This review article summarizes co-speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule-based and classical statistical gesture synthesis before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non-linguistic input. We also chronicle the evolution of the related training datasets in terms of size, diversity, motion quality, and collection method. Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human-like motion; grounding the gesture in the co-occurring speech in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.
    Comment: Accepted for EUROGRAPHICS 202
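
    Since the review organizes systems by input modality (audio, text, non-linguistic input), the snippet below gives a deliberately minimal schematic of one such system: a recurrent network mapping per-frame audio features to per-frame pose vectors. The dimensions and architecture are illustrative only; the surveyed methods use far richer probabilistic models (VAEs, normalizing flows, diffusion) and additional conditioning.

```python
import torch
import torch.nn as nn

class AudioToGesture(nn.Module):
    """Minimal schematic of an audio-driven gesture generator: a GRU maps
    per-frame speech features (e.g. MFCCs) to per-frame pose vectors
    (joint rotations). Dimensions are illustrative placeholders."""

    def __init__(self, audio_dim=26, pose_dim=45, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.Linear(hidden, pose_dim)

    def forward(self, audio_features):        # (batch, frames, audio_dim)
        hidden_states, _ = self.encoder(audio_features)
        return self.decoder(hidden_states)    # (batch, frames, pose_dim)

# Example: a 4-second clip at 30 fps with 26-dim audio features per frame.
poses = AudioToGesture()(torch.randn(1, 120, 26))
```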

    Self-assembly of thermo and light responsive amphiphilic linear dendritic block copolymers

    Get PDF
    The synthesis and structural characterization of a new dual-responsive linear-dendritic block copolymer (LDBC) is presented. The LDBC consists of a thermoresponsive linear block of polymethacrylate bearing oligo- and diethylene glycol, and a light-responsive dendron block of bis-MPA decorated at the periphery with 4-isobutyloxyazobenzene and alkyl chains in a 50:50 molar ratio. The blocks are coupled together by copper(I)-catalyzed alkyne-azide cycloaddition (CuAAC). The ability of the LDBC to form vesicle self-assemblies in water is described, as well as the effect of light and temperature on the vesicle morphology, on the basis of transmission electron microscopy (TEM), dynamic light scattering (DLS), and UV-vis spectroscopy studies. The effect of UV light and temperature on the vesicle structure, studied by SAXS and WAXS conducted in real time, is also presented. Finally, the potential use of the vesicles for loading and stimuli-controlled release of small fluorescent molecules is probed.

    HDHumans: A Hybrid Approach for High-fidelity Digital Humans

    Get PDF
    Photo-real digital human avatars are of enormous importance in graphics, as they enable immersive communication over the globe, improve gaming and entertainment experiences, and can be particularly beneficial for AR and VR settings. However, current avatar generation approaches either fall short in high-fidelity novel view synthesis, generalization to novel motions, reproduction of loose clothing, or they cannot render characters at the high resolution offered by modern displays. To this end, we propose HDHumans, which is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface and highly photo-realistic images of arbitrary novel views and of motions not seen at training time. At the technical core, our method tightly integrates a classical deforming character template with neural radiance fields (NeRF). Our method is carefully designed to achieve a synergy between classical surface deformation and NeRF. First, the template guides the NeRF, which allows synthesizing novel views of a highly dynamic and articulated character and even enables the synthesis of novel motions. Second, we also leverage the dense point clouds resulting from NeRF to further improve the deforming surface via 3D-to-3D supervision. We outperform the state of the art quantitatively and qualitatively in terms of synthesis quality and resolution, as well as the quality of 3D surface reconstruction.
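
    The abstract mentions using dense point clouds extracted from the NeRF to improve the deforming surface via 3D-to-3D supervision. A symmetric Chamfer distance is one common way to realize such a loss; the sketch below is that generic loss, not necessarily the exact objective used by HDHumans.

```python
import torch

def chamfer_3d(nerf_points, template_vertices):
    """Symmetric Chamfer distance as a schematic 3D-to-3D loss: pull the
    deforming template toward the dense point cloud extracted from the
    radiance field, and vice versa. Brute-force O(N*M) for clarity;
    real pipelines would use accelerated nearest-neighbor search.

    nerf_points       : (N, 3) points sampled from the radiance field
    template_vertices : (M, 3) vertices of the deforming template
    """
    d = torch.cdist(nerf_points, template_vertices) ** 2   # (N, M) squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```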