With the rapid development of artificial intelligence (AI), digital humans have attracted increasing attention and are expected to find wide application across several industries. However, most existing digital humans still rely on manual modeling by designers, a cumbersome process with a long development cycle. Facing the rise of digital humans, there is therefore an urgent need for an AI-assisted digital human generation system that improves development efficiency. In this paper, an implementation
scheme of an intelligent digital human generation system with multimodal fusion
is proposed. Specifically, text, speech, and images are taken as inputs, and interactive speech is synthesized using a large language model (LLM), voiceprint extraction, and text-to-speech (TTS) techniques. The input image is then age-transformed, and a suitable image is selected as the driving image. Next, the modification and generation of digital human video content are realized through digital human driving, novel view synthesis, and intelligent dressing
techniques. Finally, we enhance the user experience through style transfer,
super-resolution, and quality evaluation. Experimental results show that the
system can effectively generate digital humans. The related code is
released at https://github.com/zyj-2000/CUMT_2D_PhotoSpeaker
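
To make the data flow concrete, the following is a minimal sketch of how the four stages described above might be orchestrated. All names here (GenerationRequest, synthesize_interactive_speech, and so on) are hypothetical placeholders for illustration only, and each stage is stubbed; they do not reflect the actual API of the released repository.

```python
# Hypothetical end-to-end sketch of the pipeline in the abstract; every name
# below is a placeholder, not the API of the released repository.
from dataclasses import dataclass


@dataclass
class GenerationRequest:
    text: str          # user prompt, answered by the LLM
    speech_path: str   # reference audio for voiceprint extraction
    image_path: str    # portrait to be age-transformed into a driving image


def synthesize_interactive_speech(req: GenerationRequest) -> str:
    """Stage 1: LLM reply + voiceprint-conditioned TTS (stubbed)."""
    reply = f"<LLM answer to '{req.text}'>"
    voiceprint = f"<voiceprint of {req.speech_path}>"
    return f"<TTS audio from {reply} in voice {voiceprint}>"


def prepare_driving_image(req: GenerationRequest, target_age: int) -> str:
    """Stage 2: age transformation, then driving-image selection (stubbed)."""
    return f"<age-{target_age} variant of {req.image_path}>"


def generate_video(driving_image: str, audio: str) -> str:
    """Stage 3: digital human driving, novel view synthesis, dressing (stubbed)."""
    return f"<video driven by {driving_image} with {audio}>"


def enhance(video: str) -> str:
    """Stage 4: style transfer, super-resolution, quality evaluation (stubbed)."""
    return f"<enhanced {video}>"


if __name__ == "__main__":
    req = GenerationRequest("Introduce yourself.", "ref.wav", "portrait.png")
    audio = synthesize_interactive_speech(req)
    frame = prepare_driving_image(req, target_age=30)
    print(enhance(generate_video(frame, audio)))
```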