Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency
The task of shape abstraction with semantic part consistency is challenging
due to the complex geometries of natural objects. Recent methods learn to
represent an object shape using a set of simple primitives to fit the target.
However, in these methods, the primitives used do not always correspond to
real parts or lack the geometric flexibility needed for semantic
interpretation. In this paper, we investigate salient and efficient primitive
descriptors for accurate shape abstraction and propose Deep Deformable
Models (DDMs). DDM employs global deformations and diffeomorphic
local deformations. These properties enable DDM to abstract complex object
shapes with significantly fewer primitives that offer broader geometry coverage
and finer details. DDM is also capable of learning part-level semantic
correspondences due to the differentiable and invertible properties of our
primitive deformations. Moreover, the DDM learning formulation is based on dynamic
and kinematic modeling, which enables joint regularization of each
sub-transformation during primitive fitting. Extensive experiments on
ShapeNet demonstrate that DDM outperforms the state of the art in terms of
reconstruction and part consistency by a notable margin.
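For intuition, a classical global deformation of the kind a DDM composes with local deformations is tapering, which scales a primitive's cross-section along one axis. This is a generic sketch for illustration, not the paper's parameterization:

```python
import numpy as np

def taper(points, alpha=0.5):
    """Global tapering deformation: scale x and y linearly with z.

    Illustrative only; the paper's DDM combines global deformations
    of this kind with learned diffeomorphic local deformations.
    """
    pts = np.asarray(points, dtype=float).copy()
    # the cross-section scale factor varies linearly with the z-coordinate
    s = 1.0 + alpha * pts[:, 2]
    pts[:, 0] *= s
    pts[:, 1] *= s
    return pts

# two corners of a unit cube: tapering widens points with larger z
cube = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
out = taper(cube, alpha=0.5)
# the point at z=0 is unchanged; the point at z=1 is scaled by 1.5 in x and y
```

Because the scaling is smooth and invertible for small `alpha`, deformations like this preserve the correspondence between points on the primitive and points on the deformed shape, which is what makes part-level semantic correspondence learnable.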
Sign language video anonymization
Deaf signers who wish to communicate in their native language frequently share videos on the Web. However, videos cannot preserve privacy, which is often desirable for discussion of sensitive topics, since both the hands and face convey critical linguistic information and therefore cannot be obscured without degrading communication. Deaf signers have expressed interest in video anonymization that would preserve linguistic content, but attempts to develop such technology have thus far shown limited success. We are developing a new method for such anonymization, with input from ASL signers. We modify a motion-based image animation model to generate high-resolution videos with the signer identity changed but with preservation of linguistically significant motions and facial expressions. An asymmetric encoder-decoder structured image generator is used to generate the high-resolution target frame from the low-resolution source frame based on the optical flow and confidence map. We explicitly guide the model toward clear generation of the hands and face by using bounding boxes to improve the loss computation. FID and KID scores are used to evaluate the realism of the generated frames. This technology shows great potential for practical applications that benefit deaf signers.
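For context, the FID score mentioned above compares Gaussian statistics of deep features extracted from real and generated frames. A minimal sketch of the computation, using generic feature matrices rather than Inception-network activations and not the authors' evaluation code, might look like:

```python
import numpy as np

def fid(feats_a, feats_b):
    """Frechet Inception Distance between two feature sets.

    feats_* are (n_samples, dim) matrices. In practice the features
    come from a pretrained Inception network; any features work here.
    """
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    diff = mu_a - mu_b
    # trace of the matrix square root of cov_a @ cov_b via its eigenvalues
    eigs = np.linalg.eigvals(cov_a @ cov_b)
    covmean_trace = np.sqrt(np.clip(eigs.real, 0.0, None)).sum()
    return diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * covmean_trace

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 8))
# identical feature distributions score near zero; a shifted one does not
near_zero = fid(a, a)
shifted = fid(a, a + 2.0)
```

Lower scores indicate that generated frames are statistically closer to real ones; KID follows a similar idea but uses a kernel-based estimator instead of Gaussian moments.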
Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation
Federated Learning has gained popularity among medical institutions since it
enables collaborative training between clients (e.g., hospitals) without
aggregating data. However, due to the high cost associated with creating
annotations, especially for large 3D image datasets, clinical institutions do
not have enough supervised data for training locally. Thus, the performance of
the collaborative model is subpar under limited supervision. On the other hand,
large institutions have the resources to compile data repositories with
high-resolution images and labels. Therefore, individual clients can utilize
the knowledge acquired in the public data repositories to mitigate the shortage
of private annotated images. In this paper, we propose a federated few-shot
learning method with dual knowledge distillation. This method allows joint
training with limited annotations across clients without jeopardizing privacy.
The supervised branch of the proposed method extracts features from the limited
labeled data in each client, while unlabeled data is used to distill both
feature-based and response-based knowledge from a national data repository to
further improve the accuracy of the collaborative model and reduce the
communication cost. Extensive evaluations are conducted on 3D magnetic
resonance knee images from a private clinical dataset. Our proposed method
shows superior performance and less training time than other semi-supervised
federated learning methods. Code and additional visualization results are
available at https://github.com/hexiaoxiao-cs/fedml-knee.
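The dual distillation idea, combining a feature-based term with a response-based term, can be sketched as follows. The weighting `alpha` and `temperature` are assumed values for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-softened softmax, computed stably."""
    z = z / t
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dual_kd_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                 temperature=2.0, alpha=0.5):
    """Combine feature-based and response-based distillation terms.

    The teacher stands in for a model trained on a large public
    repository; the student is a client model with few labels.
    """
    # feature-based: match intermediate representations directly
    feat_loss = np.mean((student_feat - teacher_feat) ** 2)
    # response-based: KL divergence between softened output distributions
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kd_loss = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    return alpha * feat_loss + (1.0 - alpha) * kd_loss

# identical student and teacher give zero loss; a feature gap does not
f = np.ones((4, 16))
z = np.zeros((4, 3))
zero = dual_kd_loss(f, f, z, z)
shifted_feat = dual_kd_loss(f, f + 1.0, z, z)
```

Distilling from softened responses lets unlabeled client data carry supervisory signal without any annotations or raw-data exchange, which is what keeps the federated setting privacy-preserving.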
Region Proposal Rectification Towards Robust Instance Segmentation of Biological Images
The top-down instance segmentation framework has shown superior performance in
object detection compared to the bottom-up framework. While it is efficient in
addressing over-segmentation, top-down instance segmentation suffers from an
over-cropping problem. A complete segmentation mask is crucial for
biological image analysis as it delivers important morphological properties
such as shapes and volumes. In this paper, we propose a region proposal
rectification (RPR) module to address this challenging incomplete segmentation
problem. In particular, we design a progressive ROIAlign module that gradually
introduces neighbor information into a series of ROIs. The ROI features are fed
into an attentive feed-forward network (FFN) for proposal box regression. With
additional neighbor information, the proposed RPR module shows significant
improvement in correcting region proposal locations and thereby exhibits
favorable instance segmentation performance on three biological image datasets
compared to state-of-the-art baseline methods. Experimental results demonstrate
that the proposed RPR module is effective in both anchor-based and anchor-free
top-down instance segmentation approaches, suggesting the proposed method can
be applied to general top-down instance segmentation of biological images. Code
is available.
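The effect of progressively enlarging an over-cropped proposal can be illustrated with plain box geometry. The boxes and expansion ratios below are illustrative only; they are not the RPR module, which operates on ROIAlign features rather than raw coordinates:

```python
def expand_box(box, ratio):
    """Enlarge a proposal (x1, y1, x2, y2) about its center by `ratio`.

    A simplified stand-in for progressive context gathering: a series
    of gradually enlarged ROIs pulls neighbor information into view.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * ratio / 2.0, (y2 - y1) * ratio / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# an over-cropped proposal recovers overlap with the full object extent
gt = (0.0, 0.0, 10.0, 10.0)
proposal = (2.0, 2.0, 8.0, 8.0)   # too tight: clips the object boundary
series = [expand_box(proposal, r) for r in (1.0, 1.2, 1.4)]
ious = [iou(b, gt) for b in series]
```

In the actual method, the enlarged ROIs supply features to an attentive feed-forward network that regresses a corrected proposal box, rather than the proposal being enlarged blindly.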
Improving Negative-Prompt Inversion via Proximal Guidance
DDIM inversion has revealed the remarkable potential of real image editing
within diffusion-based methods. However, the accuracy of DDIM reconstruction
degrades as larger classifier-free guidance (CFG) scales are used for
enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align
the reconstruction and inversion trajectories with larger CFG scales, enabling
real image editing with cross-attention control. Negative-prompt inversion
(NPI) further offers a training-free closed-form solution of NTI. However, it
may introduce artifacts and is still constrained by DDIM reconstruction
quality. To overcome these limitations, we propose Proximal Negative-Prompt
Inversion (ProxNPI), extending the concepts of NTI and NPI. We enhance NPI with
a regularization term and reconstruction guidance, which reduces artifacts
while capitalizing on its training-free nature. Our method provides an
efficient and straightforward approach, effectively addressing real image
editing tasks with minimal computational overhead. Code at https://github.com/phymhan/prompt-to-promp
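The proximal-guidance idea can be illustrated with a soft-thresholding operator applied to the guidance direction. The function names and scales below are assumptions for illustration, not ProxNPI's exact formulation:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrink values toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def guided_noise(eps_uncond, eps_cond, cfg_scale=7.5, lam=0.1):
    """Classifier-free guidance with a proximal step on the edit direction.

    The (cond - uncond) guidance term is regularized so that small,
    noisy components are suppressed before the CFG scale amplifies
    them, which is one way artifacts at large scales can be reduced.
    """
    diff = soft_threshold(eps_cond - eps_uncond, lam)
    return eps_uncond + cfg_scale * diff

u = np.zeros(4)
c = np.array([0.05, -0.05, 0.5, -0.5])
out = guided_noise(u, c, cfg_scale=2.0, lam=0.1)
# small components are zeroed; large ones are shrunk by lam, then scaled
```

Without the proximal step, the two small components would be doubled along with the large ones; with it, only the dominant edit directions survive amplification.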
Harnessing the Power of Artificial Intelligence to Teach Cleft Lip Surgery
Background: Artificial intelligence (AI) leverages today’s exceptional computational power and algorithmic abilities to learn from large data sets and solve complex problems. The aim of this study was to construct an AI model that can intelligently and reliably recognize the anatomy of cleft lip and nasal deformity and automate the placement of nasolabial markings to guide surgical design.
Methods: We adopted the high-resolution net architecture, a recent family of convolutional neural network-based deep learning architectures specialized in computer-vision tasks, to train an AI model that can detect and place the 21 cleft anthropometric points on cleft lip photographs and videos. The model was tested by calculating the Euclidean distance between hand-marked anthropometric points placed by an expert cleft surgeon and those generated by our cleft AI model. A normalized mean error (NME) was calculated for each point.
Results: All NME values were between 0.029 and 0.055. The largest NME was for the cleft-side cphi; the smallest was for the cleft-side alare. These errors were well within standard AI benchmarks.
Conclusions: We successfully developed an AI algorithm that can identify the 21 surgically important anatomic landmarks of the unilateral cleft lip. This model can be used alone or integrated with surface projection to guide various cleft lip/nose repairs. Having demonstrated the feasibility of creating such a model for the complex three-dimensional surface of the lip and nose, it is easy to envision expanding the use of AI models to understand all of human surface anatomy, the full territory and playground of plastic surgeons.
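The NME described above is a standard landmark-localization metric: the mean Euclidean distance between predicted and expert-marked points, divided by a normalizing length. The study's exact normalization is not restated here, so `norm` below is an assumed reference distance:

```python
import numpy as np

def normalized_mean_error(pred, truth, norm):
    """Mean Euclidean distance between predicted and expert landmarks,
    divided by a normalizing length (e.g., an inter-landmark distance).

    pred and truth are (n_points, 2) arrays of image coordinates.
    """
    dists = np.linalg.norm(pred - truth, axis=1)
    return dists.mean() / norm

# three hypothetical landmarks, each displaced by 0.5 pixels
truth = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
pred = truth + np.array([[0.3, 0.4], [0.0, 0.5], [0.3, -0.4]])
nme = normalized_mean_error(pred, truth, norm=10.0)
```

Normalizing by a reference length makes the error comparable across photographs taken at different scales, which is why NME rather than raw pixel distance is reported.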
DeepRecon : Joint 2D Cardiac Segmentation and 3D Volume Reconstruction via a Structure-Specific Generative Method
Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, a synthetic high-resolution 3D image, and a 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation through different latent-space manipulation strategies. The simultaneously generated high-resolution images offer high interpretive value for assessing cardiac shape and motion. Experimental results demonstrate the effectiveness of our approach on multiple fronts, including 2D segmentation, 3D reconstruction, and downstream 4D motion pattern adaptation.
TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers
Combining information from multi-view images is crucial to improve the
performance and robustness of automated methods for disease diagnosis. However,
due to the non-alignment characteristics of multi-view images, building
correlation and data fusion across views largely remain an open problem. In
this study, we present TransFusion, a Transformer-based architecture to merge
divergent multi-view imaging information using convolutional layers and
powerful attention mechanisms. In particular, the Divergent Fusion Attention
(DiFA) module is proposed for rich cross-view context modeling and semantic
dependency mining, addressing the critical issue of capturing long-range
correlations between unaligned data from different image views. We further
propose a Multi-Scale Attention (MSA) module to collect global correspondences
of multi-scale feature representations. We evaluate TransFusion on the
Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in
Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading
performance against state-of-the-art methods and opens up new perspectives for
multi-view imaging integration towards robust medical image segmentation.
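Cross-view attention of the kind DiFA performs can be sketched minimally as scaled dot-product attention in which tokens from one view query tokens from another. The real module adds learned projections, multiple heads, and convolutional feature extraction; the view names below are illustrative:

```python
import numpy as np

def cross_view_attention(q_feats, kv_feats):
    """Scaled dot-product attention where one view queries another.

    q_feats: (n_q, d) tokens from the query view.
    kv_feats: (n_kv, d) tokens from the other view.
    Each query token receives a weighted mixture of the other view's
    tokens, letting unaligned views exchange long-range context.
    """
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_feats

rng = np.random.default_rng(0)
short_axis = rng.normal(size=(6, 16))   # tokens from one cardiac view
long_axis = rng.normal(size=(9, 16))    # tokens from another view
fused = cross_view_attention(short_axis, long_axis)
```

Because attention makes no assumption that the two token sets are spatially aligned, it is a natural fit for fusing views that cannot be registered pixel-to-pixel.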