17 research outputs found
D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field
Realistic virtual humans play a crucial role in numerous industries, such as
metaverse, intelligent healthcare, and self-driving simulation. But creating
them on a large scale with high levels of realism remains a challenge. The
utilization of deep implicit function sparks a new era of image-based 3D
clothed human reconstruction, enabling pixel-aligned shape recovery with fine
details. Following this paradigm, the vast majority of works locate the surface by
regressing a deterministic implicit value for each point. However, should all
points be treated equally regardless of their proximity to the surface? In this
paper, we propose replacing the implicit value with an adaptive uncertainty
distribution, to differentiate between points based on their distance to the
surface. This simple "value to distribution" transition yields significant
improvements on nearly all baselines. Furthermore, qualitative results
demonstrate that models trained with our uncertainty distribution loss
capture more intricate wrinkles and more realistic limbs. Code and models are
available for research purposes at https://github.com/psyai-net/D-IF_release
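A minimal sketch of the underlying idea, not the authors' released code: predict a per-point mean and variance instead of a single implicit value, and train with a Gaussian negative log-likelihood so points far from the surface can carry higher uncertainty (the paper's distribution is adaptive rather than a plain Gaussian; all names below are illustrative).

```python
# Illustrative only: a deterministic implicit-value regressor replaced by a
# per-point Gaussian whose variance acts as an adaptive uncertainty.
import torch
import torch.nn as nn

class UncertainImplicitField(nn.Module):
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, 1)      # mean occupancy / signed distance
        self.logvar_head = nn.Linear(hidden, 1)  # log-variance = point-wise uncertainty

    def forward(self, pix_feat, xyz):
        h = self.mlp(torch.cat([pix_feat, xyz], dim=-1))
        return self.mu_head(h), self.logvar_head(h)

def nll_loss(mu, logvar, target):
    # Gaussian NLL: points the network is unsure about (e.g. far from the
    # surface) can inflate their variance instead of being penalized as
    # hard as near-surface points.
    return (0.5 * torch.exp(-logvar) * (mu - target) ** 2 + 0.5 * logvar).mean()

# toy usage with random pixel-aligned features and query points
feat = torch.randn(4, 1024, 256)
xyz = torch.rand(4, 1024, 3)
gt = torch.rand(4, 1024, 1)          # e.g. ground-truth occupancy in [0, 1]
mu, logvar = UncertainImplicitField()(feat, xyz)
nll_loss(mu, logvar, gt).backward()
```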
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Despite recent research advancements in reconstructing clothed humans from a
single image, accurately restoring the "unseen regions" with high-level details
remains an unsolved challenge that has received little attention. Existing methods often
generate overly smooth back-side surfaces with a blurry texture. But how can all
visual attributes of an individual be captured effectively from a single image,
such that they suffice to reconstruct unseen areas (e.g., the back view)?
Motivated by the power of foundation models, TeCH reconstructs the 3D human by
leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles)
which are automatically generated via a garment parsing model and Visual
Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion
model (T2I) which learns the "indescribable" appearance. To represent
high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D
representation based on DMTet, which consists of an explicit body shape grid
and an implicit distance field. Guided by the descriptive prompts +
personalized T2I diffusion model, the geometry and texture of the 3D humans are
optimized through multi-view Score Distillation Sampling (SDS) and
reconstruction losses based on the original observation. TeCH produces
high-fidelity 3D clothed humans with consistent and delicate texture, and
detailed full-body geometry. Quantitative and qualitative experiments
demonstrate that TeCH outperforms the state-of-the-art methods in terms of
reconstruction accuracy and rendering quality. The code will be publicly
available for research purposes at https://huangyangyi.github.io/TeCH
Comment: Project: https://huangyangyi.github.io/TeCH, Code: https://github.com/huangyangyi/TeC
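For readers unfamiliar with Score Distillation Sampling, the following toy sketch shows how gradients from a frozen text-conditioned denoiser are injected into the parameters that produced a render. The `denoiser` below is a stand-in placeholder, not TeCH's diffusion model, and the hybrid DMTet representation is omitted.

```python
# Minimal SDS step sketch (not TeCH's code).
import torch
import torch.nn as nn

denoiser = nn.Conv2d(3, 3, 3, padding=1)   # placeholder for a frozen T2I UNet
for p in denoiser.parameters():
    p.requires_grad_(False)

def sds_grad(rendered, alphas, t):
    """rendered: (B,3,H,W) image differentiably produced from 3D parameters."""
    noise = torch.randn_like(rendered)
    a_t = alphas[t]
    noisy = a_t.sqrt() * rendered + (1 - a_t).sqrt() * noise
    with torch.no_grad():
        eps_hat = denoiser(noisy)          # text conditioning omitted in this toy
    w = 1 - a_t                            # a common weighting choice
    return w * (eps_hat - noise)           # gradient w.r.t. the rendered image

# toy usage: treat the image itself as the optimizable "3D" parameter
image = torch.rand(1, 3, 64, 64, requires_grad=True)
alphas = torch.linspace(0.999, 0.01, 1000)
t = torch.randint(0, 1000, (1,)).item()
image.backward(gradient=sds_grad(image, alphas, t))   # chain rule into the parameters
```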
Ghost on the Shell: An Expressive Representation of General 3D Shapes
The creation of photorealistic virtual worlds requires the accurate modeling
of 3D surface geometry for a wide range of objects. For this, meshes are
appealing since they 1) enable fast physics-based rendering with realistic
material and lighting, 2) support physical simulation, and 3) are
memory-efficient for modern graphics pipelines. Recent work on reconstructing
and statistically modeling 3D shape, however, has critiqued meshes as being
topologically inflexible. To capture a wide range of object shapes, any 3D
representation must be able to model solid, watertight shapes as well as thin,
open surfaces. Recent work has focused on the former, and methods for
reconstructing open surfaces do not support fast reconstruction with material
and lighting or unconditional generative modelling. Inspired by the observation
that open surfaces can be seen as islands floating on watertight surfaces, we
parameterize open surfaces by defining a manifold signed distance field on
watertight templates. With this parameterization, we further develop a
grid-based and differentiable representation that parameterizes both watertight
and non-watertight meshes of arbitrary topology. Our new representation, called
Ghost-on-the-Shell (G-Shell), enables two important applications:
differentiable rasterization-based reconstruction from multiview images and
generative modelling of non-watertight meshes. We empirically demonstrate that
G-Shell achieves state-of-the-art performance on non-watertight mesh
reconstruction and generation tasks, while also performing effectively for
watertight meshes.
Comment: Technical Report (26 pages, 16 figures). Project Page: https://gshell3d.github.io/
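A toy illustration (not G-Shell's implementation) of the "islands on a watertight template" idea: a scalar field defined on the template decides which part of the surface is kept, yielding an open mesh. A real method would also re-mesh along the zero crossing; this sketch simply drops faces for clarity.

```python
import numpy as np

def carve_open_surface(vertices, faces, msdf):
    """vertices: (V,3), faces: (F,3) int, msdf: (V,) scalar field on the template."""
    keep = (msdf[faces] >= 0).all(axis=1)        # faces fully inside the "island"
    kept_faces = faces[keep]
    used = np.unique(kept_faces)                 # compact the vertex list
    remap = -np.ones(len(vertices), dtype=np.int64)
    remap[used] = np.arange(len(used))
    return vertices[used], remap[kept_faces]

# toy usage: a unit square split into two triangles; the field removes one of them
verts = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2], [0, 2, 3]])
msdf = np.array([1.0, 1.0, 1.0, -1.0])          # last vertex lies "outside"
open_v, open_f = carve_open_surface(verts, faces, msdf)
print(open_f)                                     # -> [[0 1 2]]
```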
TADA! Text to Animatable Digital Avatars
We introduce TADA, a simple-yet-effective approach that takes textual
descriptions and produces expressive 3D avatars with high-quality geometry and
lifelike textures that can be animated and rendered with traditional graphics
pipelines. Existing text-based character generation methods are limited in
terms of geometry and texture quality, and cannot be realistically animated due
to inconsistent alignment between the geometry and the texture, particularly in
the face region. To overcome these limitations, TADA leverages the synergy of a
2D diffusion model and an animatable parametric body model. Specifically, we
derive an optimizable high-resolution body model from SMPL-X with 3D
displacements and a texture map, and use hierarchical rendering with score
distillation sampling (SDS) to create high-quality, detailed, holistic 3D
avatars from text. To ensure alignment between the geometry and texture, we
render normals and RGB images of the generated character and exploit their
latent embeddings in the SDS training process. We further introduce various
expression parameters to deform the generated character during training,
ensuring that the semantics of our generated character remain consistent with
the original SMPL-X model, resulting in an animatable character. Comprehensive
evaluations demonstrate that TADA significantly surpasses existing approaches
on both qualitative and quantitative measures. TADA enables creation of
large-scale digital character assets that are ready for animation and
rendering, while also being easily editable through natural language. The code
will be publicly available for research purposes.
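A heavily simplified sketch of the optimizable avatar parameterization described above, not TADA's code: a parametric body surface plus learnable per-vertex displacements and a texture map. `base_vertices` stands in for an SMPL-X surface, and the rendering/SDS term is reduced to a placeholder.

```python
import torch

V = 10475                                    # SMPL-X vertex count, for scale
base_vertices = torch.randn(V, 3)            # stand-in for posed SMPL-X vertices
displacements = torch.zeros(V, 3, requires_grad=True)      # geometric detail
texture = torch.rand(1, 3, 512, 512, requires_grad=True)   # UV texture map
optimizer = torch.optim.Adam([displacements, texture], lr=1e-3)

def render_and_score(verts, tex):
    # placeholder for differentiable normal/RGB rendering followed by SDS
    return verts.mean() + tex.mean()

for step in range(3):
    verts = base_vertices + displacements    # detailed surface driven by the body model
    loss = render_and_score(verts, texture)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```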
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time
Accurate whole-body multi-person pose estimation and tracking is an important
yet challenging topic in computer vision. To capture the subtle actions of
humans for complex behavior analysis, whole-body pose estimation including the
face, body, hand and foot is essential over conventional body-only pose
estimation. In this paper, we present AlphaPose, a system that can perform
accurate whole-body pose estimation and tracking jointly while running in
real time. To this end, we propose several new techniques: Symmetric Integral
Keypoint Regression (SIKR) for fast and fine localization, Parametric Pose
Non-Maximum-Suppression (P-NMS) for eliminating redundant human detections and
Pose Aware Identity Embedding for joint pose estimation and tracking. During
training, we resort to Part-Guided Proposal Generator (PGPG) and multi-domain
knowledge distillation to further improve the accuracy. Our method is able to
localize whole-body keypoints accurately and track humans simultaneously given
inaccurate bounding boxes and redundant detections. We show a significant
improvement over current state-of-the-art methods in both speed and accuracy on
COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose
estimation dataset. Our model, source codes and dataset are made publicly
available at https://github.com/MVIG-SJTU/AlphaPose
Comment: Documents for AlphaPose, accepted to TPAMI
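Plain integral (soft-argmax) keypoint regression, the family of techniques that SIKR builds on, looks roughly like the sketch below; AlphaPose's SIKR additionally modifies the gradient for robustness, which is omitted here.

```python
import torch

def integral_regression(heatmaps):
    """heatmaps: (B, K, H, W) -> sub-pixel keypoint coords (B, K, 2) in pixels."""
    B, K, H, W = heatmaps.shape
    probs = heatmaps.flatten(2).softmax(dim=-1).view(B, K, H, W)
    ys = torch.arange(H, dtype=probs.dtype)
    xs = torch.arange(W, dtype=probs.dtype)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expectation over rows
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expectation over columns
    return torch.stack([x, y], dim=-1)

coords = integral_regression(torch.randn(2, 133, 64, 48))  # 133 whole-body keypoints
print(coords.shape)   # torch.Size([2, 133, 2])
```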
High-Fidelity Clothed Avatar Reconstruction from a Single Image
This paper presents a framework for efficient 3D clothed avatar
reconstruction. By combining the advantages of the high accuracy of
optimization-based methods and the efficiency of learning-based methods, we
propose a coarse-to-fine way to realize a high-fidelity clothed avatar
reconstruction (CAR) from a single image. At the first stage, we use an
implicit model to learn the general shape in the canonical space of a person in
a learning-based way, and at the second stage, we refine the surface detail by
estimating the non-rigid deformation in the posed space in an optimization-based way.
A hyper-network is utilized to generate a good initialization so that the
convergence of the optimization process is greatly accelerated. Extensive
experiments on various datasets show that the proposed CAR successfully
produces high-fidelity avatars for arbitrarily clothed humans in real scenes.
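A sketch of the hyper-network idea (not the CAR code): instead of optimizing a per-subject deformation network from scratch, a hyper-network predicts a good initialization from an image feature, and only a short refinement follows. All module names are illustrative.

```python
import torch
import torch.nn as nn

class DeformMLP(nn.Module):
    """Predicts a non-rigid offset for each 3D point."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 3))
    def forward(self, x):
        return x + self.net(x)

class HyperNet(nn.Module):
    """Maps an image feature to an initialization of DeformMLP's weights."""
    def __init__(self, feat_dim, target):
        super().__init__()
        self.shapes = [p.shape for p in target.parameters()]
        self.fc = nn.Linear(feat_dim, sum(p.numel() for p in target.parameters()))
    def write_init(self, feat, target):
        flat, i = self.fc(feat), 0
        with torch.no_grad():
            for p, s in zip(target.parameters(), self.shapes):
                p.copy_(flat[i:i + s.numel()].view(s))
                i += s.numel()

deform = DeformMLP()
hyper = HyperNet(feat_dim=128, target=deform)
hyper.write_init(torch.randn(128), deform)        # predicted initialization
deformed = deform(torch.rand(1024, 3))            # ready for a short per-subject refinement
```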
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Large foundation models are becoming ubiquitous, but training them from
scratch is prohibitively expensive. Thus, efficiently adapting these powerful
models to downstream tasks is increasingly important. In this paper, we study a
principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream
task adaptation. Despite demonstrating good generalizability, OFT still uses a
fairly large number of trainable parameters due to the high dimensionality of
orthogonal matrices. To address this, we start by examining OFT from an
information transmission perspective, and then identify a few key desiderata
that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast
Fourier transform algorithm enables efficient information transmission, we
propose an efficient orthogonal parameterization using butterfly structures. We
apply this parameterization to OFT, creating a novel parameter-efficient
finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a
special case, BOFT introduces a generalized orthogonal finetuning framework.
Finally, we conduct an extensive empirical study of adapting large vision
transformers, large language models, and text-to-image diffusion models to
various downstream tasks in vision and language.
Comment: Technical Report (33 pages, 18 figures)
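A toy sketch of a butterfly-structured orthogonal parameterization in the spirit of BOFT (not the authors' implementation): an orthogonal matrix is composed of log2(d) sparse factors, each built from 2x2 rotations that pair indices at FFT-style strides, so the parameter count is O(d log d) rather than O(d^2).

```python
import torch

def butterfly_orthogonal(angles):
    """angles: (log2(d), d//2) rotation angles -> (d, d) orthogonal matrix."""
    stages, half = angles.shape
    d = 2 * half
    Q = torch.eye(d)
    for s in range(stages):
        stride = 1 << s
        factor = torch.eye(d)
        pair = 0
        for block in range(0, d, 2 * stride):
            for k in range(stride):
                i, j = block + k, block + k + stride
                c, sn = torch.cos(angles[s, pair]), torch.sin(angles[s, pair])
                factor[i, i], factor[i, j] = c, -sn
                factor[j, i], factor[j, j] = sn, c
                pair += 1
        Q = factor @ Q                       # product of sparse orthogonal factors
    return Q

d = 8
Q = butterfly_orthogonal(torch.randn(3, d // 2))            # 3 = log2(8) stages
print(torch.allclose(Q @ Q.T, torch.eye(d), atol=1e-5))     # True: Q is orthogonal
# In BOFT-style finetuning, a frozen weight W would be adapted as Q @ W with Q near identity.
```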
Intraoperative ultrasound-guided iodine-125 seed implantation for unresectable pancreatic carcinoma
Background: To assess the feasibility and efficacy of 125I seed implantation under intraoperative ultrasound guidance for unresectable pancreatic carcinoma.
Methods: Fourteen patients with pancreatic carcinoma who underwent laparotomy and whose tumors were considered unresectable were included in this study. Nine patients were pathologically diagnosed with Stage II disease and five with Stage III disease. All fourteen patients were treated with 125I seed implantation guided by intraoperative ultrasound and received a D90 of the 125I seeds ranging from 60 to 140 Gy, with a median of 120 Gy. Five patients received an additional 35–50 Gy from external beam radiotherapy after seed implantation, and six patients received 2–6 cycles of chemotherapy.
Results: 87.5% (7/8) of patients obtained partial to complete pain relief. The tumor response rate was 78.6%. One-, two-, and three-year survival rates were 33.9%, 16.9%, and 7.8%, respectively; local control of disease was achieved in 78.6% (11/14), and the median survival was 10 months (95% CI: 7.7–12.3).
Conclusion: There were no deaths related to the 125I seed implant. In this preliminary investigation, 125I seed implantation provided excellent palliation of pain and local control, and prolonged the survival of patients with Stage II and III disease to some extent.