Full-length-body CBCT imaging in upright position with robotic-arm system: a simulation study
Upright-position CT scans make full-length-body imaging possible under
conditions more relevant to daily situations, but the substantial weight of
upright CT scanners increases the risks to floor stability and
patient safety. Robotic-arm CBCT systems are supposed to be a better solution
for this task, but such systems still face challenges including long scanning
time and low reconstruction quality. To address the above challenges, this
paper proposes a novel method to calculate optimal scanning pitch based on data
completeness analysis, which can complete the whole-body scan in the shortest
time without a significant decline in image quality. Besides, an FDK-style
reconstruction method based on normalized projections is proposed to obtain
fast image reconstruction. Extensive experiments prove the effectiveness of the
proposed optimal scanning trajectory. Qualitative and quantitative comparisons
with FDK and iterative algorithms show that the proposed reconstruction method
can obtain high imaging quality with reasonable computation costs. The method
proposed in this paper is expected to promote the application of robotic-arm
CBCT systems in orthopedic functional analysis. Comment: Submitted to ISBI'2
Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation
Human-object interactions (HOI) recognition and pose estimation are two
closely related tasks. Human pose is an essential cue for recognizing actions
and localizing the interacted objects. Meanwhile, human actions and the
localization of the interacted objects provide guidance for pose estimation. In this
paper, we propose a turbo learning framework to perform HOI recognition and
pose estimation simultaneously. First, two modules are designed to enforce
message passing between the tasks, i.e., a pose-aware HOI recognition module and
an HOI-guided pose estimation module. Then, these two modules form a closed loop
to utilize the complementary information iteratively, which can be trained in
an end-to-end manner. The proposed method achieves state-of-the-art
performance on two public benchmarks, the Verbs in COCO (V-COCO) and
HICO-DET datasets. Comment: AAAI201
Frozen CLIP Model is An Efficient Point Cloud Backbone
The pretraining-finetuning paradigm has demonstrated great success in NLP and
2D image fields because of the high-quality representation ability and
transferability of their pretrained models. However, pretraining such a strong
model is difficult in the 3D point cloud field since the training data is
limited and point cloud collection is expensive. This paper introduces
Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud
learner for directly training high-quality point cloud models with a frozen
CLIP model. Our EPCL connects the 2D and 3D modalities by semantically aligning
the 2D features and point cloud features without paired 2D-3D data.
Specifically, the input point cloud is divided into a sequence of tokens and
directly fed into the frozen CLIP model to learn point cloud representation.
Furthermore, we design a task token to narrow the gap between 2D images and 3D
point clouds. Comprehensive experiments on 3D detection, semantic segmentation,
classification and few-shot learning demonstrate that the 2D CLIP model can be
an efficient point cloud backbone and our method achieves state-of-the-art
accuracy on both real-world and synthetic downstream tasks. Code will be
available. Comment: Technical repor
Two-Stage Hybrid Supervision Framework for Fast, Low-resource, and Accurate Organ and Pan-cancer Segmentation in Abdomen CT
Abdominal organ and tumour segmentation has many important clinical
applications, such as organ quantification, surgical planning, and disease
diagnosis. However, manual assessment is inherently subjective with
considerable inter- and intra-expert variability. In this paper, we propose a
hybrid supervised framework, StMt, that integrates self-training and mean
teacher for the segmentation of abdominal organs and tumors using partially
labeled and unlabeled data. We introduce a two-stage segmentation pipeline and
whole-volume-based input strategy to maximize segmentation accuracy while
meeting the requirements of inference time and GPU memory usage. Experiments on
the validation set of FLARE2023 demonstrate that our method achieves excellent
segmentation performance as well as fast and low-resource model inference. Our
method achieved average DSC scores of 89.79% and 45.55% for the organs and
lesions, respectively, on the validation set, and the average running time and
area under the GPU memory-time curve are 11.25s and 9627.82MB, respectively.
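The DSC figures quoted above refer to the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch of that metric follows; the function name, the toy masks, and the convention for two empty masks are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 2x3 masks (hypothetical data): 3 voxels each, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```

In a multi-organ setting the same function would typically be applied per label and then averaged, which is one common way an "average DSC" is obtained.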
Federated Learning Algorithms for Generalized Mixed-effects Model (GLMM) on Horizontally Partitioned Data from Distributed Sources
Objectives: This paper develops two algorithms to achieve federated
generalized linear mixed effect models (GLMM), and compares the developed
models' outcomes with each other, as well as with those from the standard R package
('lme4').
Methods: The log-likelihood function of GLMM is approximated by two numerical
methods (Laplace approximation and Gauss-Hermite approximation), which
support federated decomposition of GLMM to bring computation to data.
Results: Our developed method can handle GLMM to accommodate hierarchical
data with multiple non-independent levels of observations in a federated
setting. The experimental results demonstrate comparable (Laplace) and superior
(Gauss-Hermite) performance with simulated and real-world data.
Conclusion: We developed and compared federated GLMMs with different
approximations, which can support researchers in analyzing biomedical data to
accommodate mixed effects and address non-independence due to hierarchical
structures (e.g., institutes, regions, countries). Comment: 19 pages, 5 figures, submitted to Journal of Biomedical Informatic
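As a rough illustration of the Gauss-Hermite idea mentioned above, the sketch below approximates the marginal log-likelihood of a random-intercept logistic model by quadrature and sums per-site contributions. The model choice, the function names `cluster_loglik` and `site_loglik`, the toy data, and the assumption that each site holds whole clusters are all illustrative; this is not the paper's algorithm.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_loglik(y, x, beta, sigma, n_nodes=20):
    """Gauss-Hermite approximation of one cluster's marginal log-likelihood
    for a random-intercept logistic model (assumed model, for illustration)."""
    nodes, weights = hermgauss(n_nodes)       # rule for the weight exp(-t^2)
    b = np.sqrt(2.0) * sigma * nodes          # random-intercept value at each node
    eta = x[:, None] * beta + b[None, :]      # shape (n_obs, n_nodes)
    p = 1.0 / (1.0 + np.exp(-eta))
    # Conditional likelihood of the whole cluster at each node.
    lik = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    return np.log(np.sum(weights * lik) / np.sqrt(np.pi))

def site_loglik(clusters, beta, sigma):
    """Sum of cluster log-likelihoods computed locally at one site."""
    return sum(cluster_loglik(y, x, beta, sigma) for y, x in clusters)

# Hypothetical data held at two sites; only the scalar sums leave each site.
site_a = [(np.array([1, 0, 1]), np.array([0.5, -1.0, 2.0]))]
site_b = [(np.array([0, 0]), np.array([1.5, -0.5])),
          (np.array([1]), np.array([0.2]))]
total = site_loglik(site_a, 0.3, 0.8) + site_loglik(site_b, 0.3, 0.8)
```

Under the whole-cluster assumption the pooled log-likelihood is just the sum of the site-level values, so a coordinator would only need those aggregates (and, in practice, gradients) rather than the row-level data.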
Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases
Current intelligent diagnosis systems lack the ability to continually
learn to diagnose new diseases once deployed while
preserving knowledge of old diseases. In particular, updating an intelligent
diagnosis system with training data of new diseases would cause catastrophic
forgetting of old disease knowledge. To address the catastrophic forgetting
issue, a novel adapter-based strategy is proposed to help effectively learn a
set of new diseases at each round (or task) of continual learning, without
changing the shared feature extractor. The learnable lightweight task-specific
adapter(s) can be flexibly designed (e.g., two convolutional layers) and then
added to the pretrained and fixed feature extractor. Together with a specially
designed task-specific head which absorbs all previously learned old diseases
as a single 'out-of-distribution' category, task-specific adapter(s) can help
the pretrained feature extractor more effectively extract discriminative
features between diseases. In addition, a simple yet effective fine-tuning is
applied to collaboratively fine-tune multiple task-specific heads such that
outputs from different heads are comparable and consequently the appropriate
classifier head can be more accurately selected during model inference.
Extensive empirical evaluations on three image datasets demonstrate the
superior performance of the proposed method in continual learning of new
diseases. The source code will be released publicly. Comment: 10 page
Semantic-aware Node Synthesis for Imbalanced Heterogeneous Information Networks
Heterogeneous graph neural networks (HGNNs) have exhibited exceptional
efficacy in modeling the complex heterogeneity in heterogeneous information
networks (HINs). The critical advantage of HGNNs is their ability to handle
diverse node and edge types in HINs by extracting and utilizing the abundant
semantic information for effective representation learning. However, as a
widespread phenomenon in many real-world scenarios, the class-imbalance
distribution in HINs creates a performance bottleneck for existing HGNNs. Apart
from the quantity imbalance of nodes, another more crucial and distinctive
challenge in HINs is semantic imbalance. Minority classes in HINs often lack
diverse and sufficient neighbor nodes, resulting in biased and incomplete
semantic information. This semantic imbalance further compounds the difficulty
of accurately classifying minority nodes, leading to the performance
degradation of HGNNs. To tackle the imbalance of minority classes and
supplement their inadequate semantics, we present the first method for the
semantic imbalance problem in imbalanced HINs named Semantic-aware Node
Synthesis (SNS). By assessing the influence on minority classes, SNS adaptively
selects the heterogeneous neighbor nodes and augments the network with
synthetic nodes while preserving the minority semantics. In addition, we
introduce two regularization approaches for HGNNs that constrain the
representation of synthetic nodes from both semantic and class perspectives to
effectively suppress the potential noises from synthetic nodes, facilitating
more expressive embeddings for classification. The comprehensive experimental
study demonstrates that SNS consistently outperforms existing methods by a
large margin on different benchmark datasets.
PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation
The success of Transformer in computer vision has attracted increasing
attention in the medical imaging community. Especially for medical image
segmentation, many excellent hybrid architectures based on convolutional neural
networks (CNNs) and Transformer have been presented and achieve impressive
performance. However, most of these methods, which embed modular Transformer
into CNNs, struggle to reach their full potential. In this paper, we propose a
novel hybrid architecture for medical image segmentation called PHTrans, which
parallelly hybridizes Transformer and CNN in main building blocks to produce
hierarchical representations from global and local features and adaptively
aggregate them, aiming to fully exploit their strengths to obtain better
segmentation performance. Specifically, PHTrans follows the U-shaped
encoder-decoder design and introduces the parallel hybrid module in deep
stages, where convolution blocks and the modified 3D Swin Transformer learn
local features and global dependencies separately, then a sequence-to-volume
operation unifies the dimensions of the outputs to achieve feature aggregation.
Extensive experimental results on both Multi-Atlas Labeling Beyond the Cranial
Vault and Automated Cardiac Diagnosis Challenge datasets corroborate its
effectiveness, consistently outperforming state-of-the-art methods. The code is
available at: https://github.com/lseventeen/PHTrans. Comment: 10 pages, 3 figure
PLGSLAM: Progressive Neural Scene Representation with Local to Global Bundle Adjustment
Neural implicit scene representations have recently shown encouraging results
in dense visual SLAM. However, existing methods produce low-quality scene
reconstruction and low-accuracy localization performance when scaling up to
large indoor scenes and long sequences. These limitations are mainly due to
their single, global radiance field with finite capacity, which does not adapt
to large scenarios. Their end-to-end pose networks are also not robust enough
with the growth of cumulative errors in large scenes. To this end, we introduce
PLGSLAM, a neural visual SLAM system capable of high-fidelity surface
reconstruction and robust camera tracking in real-time. To handle large-scale
indoor scenes, PLGSLAM proposes a progressive scene representation method which
dynamically allocates new local scene representation trained with frames within
a local sliding window. This allows us to scale up to larger indoor scenes and
improves robustness (even under pose drifts). In local scene representation,
PLGSLAM utilizes tri-planes for local high-frequency features with multi-layer
perceptron (MLP) networks for the low-frequency feature, achieving smoothness
and scene completion in unobserved areas. Moreover, we propose a local-to-global
bundle adjustment method with a global keyframe database to address the
increased pose drifts on long sequences. Experimental results demonstrate that
PLGSLAM achieves state-of-the-art scene reconstruction results and tracking
performance across various datasets and scenarios (both in small and
large-scale indoor environments). Comment: Accepted by CVPR 202
Image-Guided Autonomous Guidewire Navigation in Robot-Assisted Endovascular Interventions using Reinforcement Learning
Autonomous robots in endovascular interventions possess the potential to
navigate guidewires with safety and reliability, while reducing human error and
shortening surgical time. However, current methods of guidewire navigation
based on Reinforcement Learning (RL) depend on manual demonstration data or
magnetic guidance. In this work, we propose an Image-guided Autonomous
Guidewire Navigation (IAGN) method. Specifically, we introduce BDA-star, a path
planning algorithm with boundary distance constraints, for the trajectory
planning of guidewire navigation. We established an IAGN-RL environment where
the observations are real-time guidewire feeding images highlighting the
position of the guidewire tip and the planned path. We proposed a reward
function based on the distances from both the guidewire tip to the planned path
and the target to evaluate the agent's actions. Furthermore, in the policy network,
we employ a pre-trained convolutional neural network to extract features,
mitigating stability issues and slow convergence rates associated with direct
learning from raw pixels. Experiments conducted on the aortic simulation IAGN
platform demonstrated that the proposed method, targeting the left subclavian
artery and the brachiocephalic artery, achieved a 100% guidewire navigation
success rate, along with reduced movement and retraction distances and
trajectories that tend toward the center of the vessels.
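The abstract above describes a reward built from two distances: guidewire tip to planned path, and guidewire tip to target. A minimal sketch of one such reward follows. The negative weighted-sum form, the weights `w_path` and `w_target`, the nearest-waypoint path distance, and the toy coordinates are all assumptions, since the source does not give the actual formula.

```python
import numpy as np

def distance_reward(tip, path, target, w_path=1.0, w_target=1.0):
    """Negative weighted sum of tip-to-path and tip-to-target distances.

    Hypothetical form: higher (closer to zero) means the tip is nearer
    both the planned path and the target.
    """
    tip = np.asarray(tip, dtype=float)
    path = np.asarray(path, dtype=float)
    target = np.asarray(target, dtype=float)
    # Distance to the planned path, taken here as the nearest waypoint.
    d_path = np.min(np.linalg.norm(path - tip, axis=1))
    d_target = np.linalg.norm(target - tip)
    return -(w_path * d_path + w_target * d_target)

# Toy planned path along the x-axis, ending at the target (hypothetical pixels).
path = [(0, 0), (1, 0), (2, 0), (3, 0)]
target = (3, 0)
r_off = distance_reward((1, 1), path, target)  # tip one unit off the path
r_on = distance_reward((3, 0), path, target)   # tip exactly on the target
```

With this form the agent is penalized both for straying from the planned path and for remaining far from the target, so the reward peaks at zero when the tip reaches the target on the path.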