Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency
The task of shape abstraction with semantic part consistency is challenging
due to the complex geometries of natural objects. Recent methods learn to
represent an object shape using a set of simple primitives to fit the target.
However, in these methods, the primitives used do not always correspond to
real parts or lack the geometric flexibility needed for semantic
interpretation. In this paper, we investigate salient and efficient primitive
descriptors for accurate shape abstraction and propose Deep Deformable
Models (DDMs). DDM employs global deformations and diffeomorphic
local deformations. These properties enable DDM to abstract complex object
shapes with significantly fewer primitives that offer broader geometry coverage
and finer details. DDM is also capable of learning part-level semantic
correspondences due to the differentiable and invertible properties of our
primitive deformations. Moreover, the DDM learning formulation is based on dynamic
and kinematic modeling, which enables joint regularization of each
sub-transformation during primitive fitting. Extensive experiments on
ShapeNet demonstrate that DDM outperforms the state of the art in terms of
reconstruction and part consistency by a notable margin.
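For intuition, a classical global deformation of the kind a DDM composes with local deformations is tapering, which scales a primitive's cross-section along one axis. This is a generic sketch for illustration, not the paper's parameterization:

```python
import numpy as np

def taper(points, alpha=0.5):
    """Global tapering deformation: scale x and y linearly with z.

    Illustrative only; the paper's DDM combines global deformations
    of this kind with learned diffeomorphic local deformations.
    """
    pts = np.asarray(points, dtype=float).copy()
    # the cross-section scale factor varies linearly with the z-coordinate
    s = 1.0 + alpha * pts[:, 2]
    pts[:, 0] *= s
    pts[:, 1] *= s
    return pts

# two corners of a unit cube: tapering widens points with larger z
cube = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
out = taper(cube, alpha=0.5)
# the point at z=0 is unchanged; the point at z=1 is scaled by 1.5 in x and y
```

Because the scaling is smooth and invertible for small `alpha`, deformations like this preserve the correspondence between points on the primitive and points on the deformed shape, which is what makes part-level semantic correspondence learnable.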
Sign language video anonymization
Deaf signers who wish to communicate in their native language frequently share videos on the Web. However, videos cannot preserve privacy, which is often desirable for discussion of sensitive topics, since both the hands and face convey critical linguistic information and therefore cannot be obscured without degrading communication. Deaf signers have expressed interest in video anonymization that would preserve linguistic content, but attempts to develop such technology have thus far shown limited success. We are developing a new method for such anonymization, with input from ASL signers. We modify a motion-based image animation model to generate high-resolution videos with the signer identity changed but with preservation of linguistically significant motions and facial expressions. An asymmetric encoder-decoder structured image generator is used to generate the high-resolution target frame from the low-resolution source frame based on the optical flow and confidence map. We explicitly guide the model toward clear generation of the hands and face by using bounding boxes to improve the loss computation. FID and KID scores are used to evaluate the realism of the generated frames. This technology shows great potential for practical applications that benefit deaf signers.
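For context, the FID score mentioned above compares Gaussian statistics of deep features extracted from real and generated frames. A minimal sketch of the computation, using generic feature matrices rather than Inception-network activations and not the authors' evaluation code, might look like:

```python
import numpy as np

def fid(feats_a, feats_b):
    """Frechet Inception Distance between two feature sets.

    feats_* are (n_samples, dim) matrices. In practice the features
    come from a pretrained Inception network; any features work here.
    """
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    diff = mu_a - mu_b
    # trace of the matrix square root of cov_a @ cov_b via its eigenvalues
    eigs = np.linalg.eigvals(cov_a @ cov_b)
    covmean_trace = np.sqrt(np.clip(eigs.real, 0.0, None)).sum()
    return diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * covmean_trace

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 8))
# identical feature distributions score near zero; a shifted one does not
near_zero = fid(a, a)
shifted = fid(a, a + 2.0)
```

Lower scores indicate that generated frames are statistically closer to real ones; KID follows a similar idea but uses a kernel-based estimator instead of Gaussian moments.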
Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation
Federated Learning has gained popularity among medical institutions since it
enables collaborative training between clients (e.g., hospitals) without
aggregating data. However, due to the high cost associated with creating
annotations, especially for large 3D image datasets, clinical institutions do
not have enough supervised data for training locally. Thus, the performance of
the collaborative model is subpar under limited supervision. On the other hand,
large institutions have the resources to compile data repositories with
high-resolution images and labels. Therefore, individual clients can utilize
the knowledge acquired in the public data repositories to mitigate the shortage
of private annotated images. In this paper, we propose a federated few-shot
learning method with dual knowledge distillation. This method allows joint
training with limited annotations across clients without jeopardizing privacy.
The supervised branch of the proposed method extracts features from the limited
labeled data in each client, while unlabeled data is used to distill both
feature-based and response-based knowledge from a national data repository to
further improve the accuracy of the collaborative model and reduce the
communication cost. Extensive evaluations are conducted on 3D magnetic
resonance knee images from a private clinical dataset. Our proposed method
shows superior performance and less training time than other semi-supervised
federated learning methods. Code and additional visualization results are
available at https://github.com/hexiaoxiao-cs/fedml-knee.
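The dual distillation idea, combining a feature-based term with a response-based term, can be sketched as follows. The weighting `alpha` and `temperature` are assumed values for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-softened softmax, computed stably."""
    z = z / t
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dual_kd_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                 temperature=2.0, alpha=0.5):
    """Combine feature-based and response-based distillation terms.

    The teacher stands in for a model trained on a large public
    repository; the student is a client model with few labels.
    """
    # feature-based: match intermediate representations directly
    feat_loss = np.mean((student_feat - teacher_feat) ** 2)
    # response-based: KL divergence between softened output distributions
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kd_loss = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    return alpha * feat_loss + (1.0 - alpha) * kd_loss

# identical student and teacher give zero loss; a feature gap does not
f = np.ones((4, 16))
z = np.zeros((4, 3))
zero = dual_kd_loss(f, f, z, z)
shifted_feat = dual_kd_loss(f, f + 1.0, z, z)
```

Distilling from softened responses lets unlabeled client data carry supervisory signal without any annotations or raw-data exchange, which is what keeps the federated setting privacy-preserving.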
Region Proposal Rectification Towards Robust Instance Segmentation of Biological Images
The top-down instance segmentation framework has shown superior performance in
object detection compared to the bottom-up framework. While it is efficient in
addressing over-segmentation, top-down instance segmentation suffers from an
over-cropping problem. A complete segmentation mask is crucial for
biological image analysis as it delivers important morphological properties
such as shapes and volumes. In this paper, we propose a region proposal
rectification (RPR) module to address this challenging incomplete segmentation
problem. In particular, we design a progressive ROIAlign module that gradually
introduces neighbor information into a series of ROIs. The ROI features are fed
into an attentive feed-forward network (FFN) for proposal box regression. With
additional neighbor information, the proposed RPR module shows significant
improvement in correcting region proposal locations and thereby exhibits
favorable instance segmentation performance on three biological image datasets
compared to state-of-the-art baseline methods. Experimental results demonstrate
that the proposed RPR module is effective in both anchor-based and anchor-free
top-down instance segmentation approaches, suggesting the proposed method can
be applied to general top-down instance segmentation of biological images. Code
is available.
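The effect of progressively enlarging an over-cropped proposal can be illustrated with plain box geometry. The boxes and expansion ratios below are illustrative only; they are not the RPR module, which operates on ROIAlign features rather than raw coordinates:

```python
def expand_box(box, ratio):
    """Enlarge a proposal (x1, y1, x2, y2) about its center by `ratio`.

    A simplified stand-in for progressive context gathering: a series
    of gradually enlarged ROIs pulls neighbor information into view.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * ratio / 2.0, (y2 - y1) * ratio / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# an over-cropped proposal recovers overlap with the full object extent
gt = (0.0, 0.0, 10.0, 10.0)
proposal = (2.0, 2.0, 8.0, 8.0)   # too tight: clips the object boundary
series = [expand_box(proposal, r) for r in (1.0, 1.2, 1.4)]
ious = [iou(b, gt) for b in series]
```

In the actual method, the enlarged ROIs supply features to an attentive feed-forward network that regresses a corrected proposal box, rather than the proposal being enlarged blindly.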
Improving Negative-Prompt Inversion via Proximal Guidance
DDIM inversion has revealed the remarkable potential of real image editing
within diffusion-based methods. However, the accuracy of DDIM reconstruction
degrades as larger classifier-free guidance (CFG) scales are used for
enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align
the reconstruction and inversion trajectories with larger CFG scales, enabling
real image editing with cross-attention control. Negative-prompt inversion
(NPI) further offers a training-free closed-form solution of NTI. However, it
may introduce artifacts and is still constrained by DDIM reconstruction
quality. To overcome these limitations, we propose Proximal Negative-Prompt
Inversion (ProxNPI), extending the concepts of NTI and NPI. We enhance NPI with
a regularization term and reconstruction guidance, which reduces artifacts
while capitalizing on its training-free nature. Our method provides an
efficient and straightforward approach, effectively addressing real image
editing tasks with minimal computational overhead. Code at https://github.com/phymhan/prompt-to-promp
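The proximal-guidance idea can be illustrated with a soft-thresholding operator applied to the guidance direction. The function names and scales below are assumptions for illustration, not ProxNPI's exact formulation:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrink values toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def guided_noise(eps_uncond, eps_cond, cfg_scale=7.5, lam=0.1):
    """Classifier-free guidance with a proximal step on the edit direction.

    The (cond - uncond) guidance term is regularized so that small,
    noisy components are suppressed before the CFG scale amplifies
    them, which is one way artifacts at large scales can be reduced.
    """
    diff = soft_threshold(eps_cond - eps_uncond, lam)
    return eps_uncond + cfg_scale * diff

u = np.zeros(4)
c = np.array([0.05, -0.05, 0.5, -0.5])
out = guided_noise(u, c, cfg_scale=2.0, lam=0.1)
# small components are zeroed; large ones are shrunk by lam, then scaled
```

Without the proximal step, the two small components would be doubled along with the large ones; with it, only the dominant edit directions survive amplification.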
Harnessing the Power of Artificial Intelligence to Teach Cleft Lip Surgery
Background: Artificial intelligence (AI) leverages today’s exceptional computational power and algorithmic abilities to learn from large data sets and solve complex problems. The aim of this study was to construct an AI model that can intelligently and reliably recognize the anatomy of cleft lip and nasal deformity and automate the placement of nasolabial markings to guide surgical design.
Methods: We adopted the high-resolution net architecture, a recent family of convolutional neural network-based deep learning architectures specialized in computer-vision tasks, to train an AI model that can detect and place the 21 cleft anthropometric points on cleft lip photographs and videos. The model was tested by calculating the Euclidean distance between hand-marked anthropometric points placed by an expert cleft surgeon and those generated by our cleft AI model. A normalized mean error (NME) was calculated for each point.
Results: All NME values were between 0.029 and 0.055. The largest NME was for the cleft-side cphi; the smallest was for the cleft-side alare. These errors were well within standard AI benchmarks.
Conclusions: We successfully developed an AI algorithm that can identify the 21 surgically important anatomic landmarks of the unilateral cleft lip. This model can be used alone or integrated with surface projection to guide various cleft lip/nose repairs. Having demonstrated the feasibility of creating such a model for the complex three-dimensional surface of the lip and nose, it is easy to envision expanding the use of AI models to understand all of human surface anatomy, the full territory and playground of plastic surgeons.
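The NME described above is a standard landmark-localization metric: the mean Euclidean distance between predicted and expert-marked points, divided by a normalizing length. The study's exact normalization is not restated here, so `norm` below is an assumed reference distance:

```python
import numpy as np

def normalized_mean_error(pred, truth, norm):
    """Mean Euclidean distance between predicted and expert landmarks,
    divided by a normalizing length (e.g., an inter-landmark distance).

    pred and truth are (n_points, 2) arrays of image coordinates.
    """
    dists = np.linalg.norm(pred - truth, axis=1)
    return dists.mean() / norm

# three hypothetical landmarks, each displaced by 0.5 pixels
truth = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
pred = truth + np.array([[0.3, 0.4], [0.0, 0.5], [0.3, -0.4]])
nme = normalized_mean_error(pred, truth, norm=10.0)
```

Normalizing by a reference length makes the error comparable across photographs taken at different scales, which is why NME rather than raw pixel distance is reported.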
DeepRecon : Joint 2D Cardiac Segmentation and 3D Volume Reconstruction via a Structure-Specific Generative Method
Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, a synthetic high-resolution 3D image, and a 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation through different latent-space manipulation strategies. The simultaneously generated high-resolution images offer high interpretive value for assessing cardiac shape and motion. Experimental results demonstrate the effectiveness of our approach on multiple fronts, including 2D segmentation, 3D reconstruction, and downstream 4D motion pattern adaptation.
TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers
Combining information from multi-view images is crucial to improve the
performance and robustness of automated methods for disease diagnosis. However,
due to the non-alignment characteristics of multi-view images, building
correlation and data fusion across views largely remain an open problem. In
this study, we present TransFusion, a Transformer-based architecture to merge
divergent multi-view imaging information using convolutional layers and
powerful attention mechanisms. In particular, the Divergent Fusion Attention
(DiFA) module is proposed for rich cross-view context modeling and semantic
dependency mining, addressing the critical issue of capturing long-range
correlations between unaligned data from different image views. We further
propose a Multi-Scale Attention (MSA) module to collect global correspondences
of multi-scale feature representations. We evaluate TransFusion on the
Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in
Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading
performance against state-of-the-art methods and opens up new perspectives for
multi-view imaging integration towards robust medical image segmentation.
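Cross-view attention of the kind DiFA performs can be sketched minimally as scaled dot-product attention in which tokens from one view query tokens from another. The real module adds learned projections, multiple heads, and convolutional feature extraction; the view names below are illustrative:

```python
import numpy as np

def cross_view_attention(q_feats, kv_feats):
    """Scaled dot-product attention where one view queries another.

    q_feats: (n_q, d) tokens from the query view.
    kv_feats: (n_kv, d) tokens from the other view.
    Each query token receives a weighted mixture of the other view's
    tokens, letting unaligned views exchange long-range context.
    """
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_feats

rng = np.random.default_rng(0)
short_axis = rng.normal(size=(6, 16))   # tokens from one cardiac view
long_axis = rng.normal(size=(9, 16))    # tokens from another view
fused = cross_view_attention(short_axis, long_axis)
```

Because attention makes no assumption that the two token sets are spatially aligned, it is a natural fit for fusing views that cannot be registered pixel-to-pixel.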