Pixel-wise Orthogonal Decomposition for Color Illumination Invariant and Shadow-free Image
In this paper, we propose a novel, effective, and fast method to obtain a
color illumination invariant and shadow-free image from a single outdoor image.
Unlike state-of-the-art shadow-free image methods, which require either shadow
detection or statistical learning, we set up a linear equation set for each
pixel value vector based on physically-based shadow invariants, deduce a
pixel-wise orthogonal decomposition of its solutions, and then obtain an
illumination invariant vector for each pixel value vector in an image. The
illumination invariant vector is the unique particular solution of the linear
equation set, which is orthogonal to its free solutions. With this illumination
invariant vector and the Lab color space, we propose an algorithm to generate a
shadow-free image that preserves the texture and color information of the
original image well. A series of experiments on a diverse set of outdoor images,
together with comparisons against state-of-the-art methods, validates our method.
Comment: This paper has been published in Optics Express, Vol. 23, Issue 3,
pp. 2220-2239. The final version is available at
http://dx.doi.org/10.1364/OE.23.002220. Please refer to that version when
citing this paper.
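The orthogonal decomposition has a compact linear-algebra reading: the minimum-norm particular solution of a linear system lies in the row space of the coefficient matrix and is therefore orthogonal to every free (null-space) solution. Below is a minimal NumPy sketch of that property, assuming the per-pixel system takes the form A x = b; the paper's actual construction of A and b from shadow invariants is not reproduced here.

```python
# Sketch only: A and b stand in for the per-pixel system the paper builds from
# physically-based shadow invariants; their exact construction is not shown here.
import numpy as np
from scipy.linalg import null_space

def illumination_invariant_vector(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Unique particular solution of A @ x = b that is orthogonal to the
    free (null-space) solutions: the minimum-norm solution pinv(A) @ b,
    which lies in the row space of A."""
    return np.linalg.pinv(A) @ b

# A rank-deficient but consistent system, so free solutions exist.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
b = np.array([6.0, 12.0])
x_star = illumination_invariant_vector(A, b)

N = null_space(A)                                  # basis of the free solutions
assert np.allclose(A @ x_star, b)                  # it solves the system
assert np.allclose(N.T @ x_star, 0.0, atol=1e-10)  # and is orthogonal to them
```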
Dual-domain convolutional neural networks for improving structural information in 3 T MRI
We propose a novel dual-domain convolutional neural network framework to improve the structural information of routine 3T images. We introduce a parameter-efficient butterfly network that involves two complementary domains: a spatial domain and a frequency domain. The butterfly network allows these two domains to interact in learning the complex mapping from 3T to 7T images. We verified the efficacy of the dual-domain strategy and the butterfly network using 3T and 7T image pairs. Experimental results demonstrate that the proposed framework generates synthetic 7T-like images and achieves performance superior to state-of-the-art methods.
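As a rough illustration of what a dual-domain block can look like, the PyTorch sketch below runs a spatial convolution branch alongside a frequency branch implemented with an FFT and fuses the two. It is an assumption on my part, not the paper's butterfly architecture.

```python
# Illustrative dual-domain block, not the paper's butterfly network: a spatial
# convolution branch and a frequency branch (FFT -> 1x1 conv -> inverse FFT)
# whose outputs are fused by addition.
import torch
import torch.nn as nn

class DualDomainBlock(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # The frequency branch sees real and imaginary parts as extra channels.
        self.freq = nn.Conv2d(2 * channels, 2 * channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = torch.relu(self.spatial(x))           # spatial-domain path
        f = torch.fft.rfft2(x, norm="ortho")      # frequency-domain path
        f = self.freq(torch.cat([f.real, f.imag], dim=1))
        re, im = f.chunk(2, dim=1)
        f = torch.fft.irfft2(torch.complex(re, im), s=x.shape[-2:], norm="ortho")
        return s + f                              # the two domains interact here

x = torch.randn(1, 16, 64, 64)                    # e.g. features of a 3T slice
y = DualDomainBlock()(x)                          # same shape, enhanced features
```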
Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection
Human-Object Interaction (HOI) detection plays a vital role in scene
understanding; it aims to predict HOI triplets of the form <human, object,
action>. Existing methods mainly extract multi-modal features (e.g.,
appearance, object semantics, human pose) and then fuse them to directly
predict HOI triplets. However, most of these methods focus on self-triplet
aggregation while ignoring potential cross-triplet dependencies, resulting in
ambiguous action predictions. In this work, we propose to explore Self- and
Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, we regard
each triplet proposal as a graph in which the human and object are nodes and
the action is the edge, and aggregate self-triplet correlations over this
graph. We also explore cross-triplet dependencies by jointly considering
instance-level, semantic-level, and layout-level relations. In addition, we
leverage the CLIP model, via knowledge distillation, to help SCTC obtain
interaction-aware features that provide useful action cues for HOI detection.
Extensive experiments on the HICO-DET and V-COCO datasets verify the
effectiveness of our proposed SCTC.
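A hedged sketch of the self-triplet aggregation idea: one triplet proposal treated as a tiny graph whose nodes (human, object) and edge (action) exchange features. The layer sizes and update rule are illustrative, not SCTC's actual design.

```python
# Illustrative self-triplet aggregation: human/object nodes and an action edge
# update each other inside one triplet proposal. Sizes and updates are examples.
import torch
import torch.nn as nn

class TripletGraph(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.node_update = nn.Linear(2 * dim, dim)   # node <- (node, edge)
        self.edge_update = nn.Linear(3 * dim, dim)   # edge <- (human, object, edge)

    def forward(self, h, o, a):
        h_new = torch.relu(self.node_update(torch.cat([h, a], dim=-1)))
        o_new = torch.relu(self.node_update(torch.cat([o, a], dim=-1)))
        a_new = torch.relu(self.edge_update(torch.cat([h, o, a], dim=-1)))
        return h_new, o_new, a_new

h, o, a = (torch.randn(1, 256) for _ in range(3))  # human, object, action features
h, o, a = TripletGraph()(h, o, a)                  # one round of aggregation
```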
Residual Denoising Diffusion Models
We propose residual denoising diffusion models (RDDM), a novel dual diffusion
process that decouples the traditional single denoising diffusion process into
residual diffusion and noise diffusion. This dual diffusion framework extends
denoising-based diffusion models, which were previously uninterpretable for
image restoration, into a unified and interpretable model for both image
generation and restoration by introducing residuals. Specifically, our residual diffusion
represents directional diffusion from the target image to the degraded input
image and explicitly guides the reverse generation process for image
restoration, while noise diffusion represents random perturbations in the
diffusion process. The residual prioritizes certainty, while the noise
emphasizes diversity, enabling RDDM to effectively unify tasks with varying
certainty or diversity requirements, such as image generation and restoration.
We demonstrate that our sampling process is consistent with that of DDPM and
DDIM through coefficient transformation, and propose a partially
path-independent generation process to better understand the reverse process.
Notably, our RDDM enables a generic UNet, trained with only an ℓ1 loss
and a batch size of 1, to compete with state-of-the-art image restoration
methods. We provide code and pre-trained models to encourage further
exploration, application, and development of our innovative framework
(https://github.com/nachifur/RDDM).
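The forward process described here can be written down directly. The sketch below assumes a simplified mixing form x_t = x_0 + alpha_bar_t * (x_in - x_0) + beta_bar_t * eps; the schedules and the exact formulation are defined in the paper and repository.

```python
# Simplified forward process: mix a deterministic residual (certainty) and
# Gaussian noise (diversity) into the target image. The schedule values
# alpha_bar and beta_bar are placeholders for the paper's actual schedules.
import torch

def rddm_forward(x0: torch.Tensor, x_degraded: torch.Tensor,
                 alpha_bar: float, beta_bar: float) -> torch.Tensor:
    residual = x_degraded - x0          # directional term: target -> degraded input
    noise = torch.randn_like(x0)        # random perturbation term
    return x0 + alpha_bar * residual + beta_bar * noise

x0 = torch.randn(1, 3, 64, 64)          # clean target image
x_in = torch.randn(1, 3, 64, 64)        # degraded input image
x_t = rddm_forward(x0, x_in, alpha_bar=0.5, beta_bar=0.1)
```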
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
The Segment Anything Model (SAM) has garnered significant attention for its
versatile segmentation abilities and intuitive prompt-based interface. However,
its application in medical imaging presents challenges, requiring either
substantial training costs and extensive medical datasets for full model
fine-tuning, or high-quality prompts for optimal performance. This paper
introduces H-SAM: a prompt-free adaptation of SAM tailored for efficient
fine-tuning on medical images via a two-stage hierarchical decoding procedure.
In the initial stage, H-SAM employs SAM's original decoder to generate a prior
probabilistic mask, guiding a more intricate decoding process in the second
stage. Specifically, we propose two key designs: 1) a class-balanced,
mask-guided self-attention mechanism that addresses the unbalanced label
distribution and enhances the image embedding; 2) a learnable mask
cross-attention mechanism that spatially modulates the interplay among
different image regions based on the prior mask. Moreover, the inclusion of a hierarchical pixel
decoder in H-SAM enhances its proficiency in capturing fine-grained and
localized details. This approach enables SAM to effectively integrate learned
medical priors, facilitating enhanced adaptation for medical image segmentation
with limited samples. Our H-SAM demonstrates a 4.78% improvement in average
Dice compared to existing prompt-free SAM variants for multi-organ segmentation
using only 10% of 2D slices. Notably, without using any unlabeled data, H-SAM
even outperforms state-of-the-art semi-supervised models relying on extensive
unlabeled training data across various medical datasets. Our code is available
at https://github.com/Cccccczh404/H-SAM.
Comment: CVPR 2024
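A minimal sketch of the two-stage hierarchical decoding flow, with plain convolutions standing in for SAM's decoder and for H-SAM's attention designs (both are assumptions; the real modules live in the linked repository):

```python
# Stand-in modules for the two decoding stages; H-SAM's actual decoder and
# attention designs are in the repository.
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 9):  # e.g. multi-organ
        super().__init__()
        self.stage1 = nn.Conv2d(dim, num_classes, 1)           # prior decoder
        self.stage2 = nn.Conv2d(dim + num_classes, num_classes, 1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        prior = self.stage1(feat).softmax(dim=1)   # stage 1: prior probabilistic mask
        guided = torch.cat([feat, prior], dim=1)   # stage 2 is conditioned on it
        return self.stage2(guided)                 # refined, mask-guided decoding

feat = torch.randn(1, 256, 64, 64)                 # image embedding from the encoder
logits = HierarchicalDecoder()(feat)
```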
Iterative Label Denoising Network: Segmenting Male Pelvic Organs in CT From 3D Bounding Box Annotations
Obtaining accurate segmentation of the prostate and nearby organs at risk (e.g., bladder and rectum) in CT images is critical for radiotherapy of prostate cancer. Currently, the leading automatic segmentation algorithms are based on Fully Convolutional Networks (FCNs), which achieve remarkable performance but usually need large-scale datasets with high-quality voxel-wise annotations for fully supervised training. Unfortunately, such annotations are difficult to acquire, which becomes a bottleneck for building accurate segmentation models in real clinical applications. In this paper, we propose a novel weakly supervised segmentation approach that needs only 3D bounding box annotations covering the organs of interest to start training. However, the bounding boxes include many non-organ voxels whose noisy labels can mislead the segmentation model. To address this, we propose a label denoising module and embed it into the iterative training scheme of the label denoising network (LDnet) for segmentation. The labels of the training voxels are predicted by the tentative LDnet, while the label denoising module identifies the voxels with unreliable labels. As only reliable training voxels are preserved, the iteratively re-trained LDnet gradually refines its segmentation capability. Our results are remarkable, reaching approximately 94% (prostate), 91% (bladder), and 86% (rectum) of the Dice Similarity Coefficients (DSCs) obtained with fully supervised learning on high-quality voxel-wise annotations, and also surpassing several state-of-the-art approaches. To the best of our knowledge, this is the first work to achieve voxel-wise segmentation in CT images from simple 3D bounding box annotations, which can greatly reduce labeling effort and meet the demands of practical clinical applications.
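A hedged sketch of the iterative scheme: train, flag unreliable voxel labels, keep only the reliable ones, and re-train. The agreement test below is a deliberate simplification of the paper's label denoising module.

```python
# Simplified iterative label-denoising loop. The reliability test (agreement
# between the tentative prediction and the current label) is an assumed stand-in
# for the paper's label denoising module.
import torch
import torch.nn.functional as F

def train_ldnet(model, optimizer, volumes, box_labels, rounds: int = 3):
    labels = [lab.clone() for lab in box_labels]   # noisy labels from 3D boxes
    for _ in range(rounds):
        for vol, lab in zip(volumes, labels):      # vol: (N,1,D,H,W), lab: (N,D,H,W)
            pred = model(vol)                      # tentative LDnet: (N,C,D,H,W)
            reliable = pred.argmax(dim=1) == lab   # keep only agreeing voxels
            if not reliable.any():
                continue
            loss = F.cross_entropy(pred.permute(0, 2, 3, 4, 1)[reliable],
                                   lab[reliable])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```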
FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning
Federated learning (FL) is an emerging paradigm in machine learning, where a
shared model is collaboratively learned using data from multiple devices to
mitigate the risk of data leakage. While recent studies posit that Vision
Transformer (ViT) outperforms Convolutional Neural Networks (CNNs) in
addressing data heterogeneity in FL, the specific architectural components that
underpin this advantage have yet to be elucidated. In this paper, we
systematically investigate the impact of different architectural elements, such
as activation functions and normalization layers, on the performance within
heterogeneous FL. Through rigorous empirical analyses, we are able to offer the
first-of-its-kind general guidance on micro-architecture design principles for
heterogeneous FL.
Intriguingly, our findings indicate that with strategic architectural
modifications, pure CNNs can achieve a level of robustness that either matches
or even exceeds that of ViTs when handling heterogeneous data clients in FL.
Additionally, our approach is compatible with existing FL techniques and
delivers state-of-the-art solutions across a broad spectrum of FL benchmarks.
The code is publicly available at https://github.com/UCSC-VLAA/FedConv.
Comment: 9 pages, 6 figures. Equal contribution by P. Xu and Z. Wan
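The kind of ablation this enables can be pictured as a CNN block whose activation and normalization are injectable, so each micro-architectural choice can be swapped under heterogeneous clients. The concrete choices below are examples, not FedConv's final design.

```python
# A CNN block with injectable activation and normalization, so each choice can
# be ablated under heterogeneous clients. These defaults are examples, not
# FedConv's final design.
import torch.nn as nn

def conv_block(cin: int, cout: int, act=nn.SiLU, norm=nn.BatchNorm2d) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1, bias=False),
        norm(cout),
        act(),
    )

# e.g. compare batch statistics vs. batch-independent normalization in FL:
block_bn = conv_block(64, 64)
block_gn = conv_block(64, 64, norm=lambda c: nn.GroupNorm(1, c))
```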
CT Male Pelvic Organ Segmentation via Hybrid Loss Network With Incomplete Annotation
Sufficient data with complete annotations is essential for training deep models to perform automatic and accurate segmentation of CT male pelvic organs, especially when such data poses challenges such as low contrast and large shape variation. However, manual annotation is expensive in terms of both cost and human effort, which usually results in insufficient completely annotated data in real applications. To this end, we propose a novel deep framework to segment male pelvic organs in CT images with incomplete annotations delineated in a very user-friendly manner. Specifically, we design a hybrid loss network, derived from both voxel classification and boundary regression, to jointly improve organ segmentation performance in an iterative way. Moreover, we introduce a label completion strategy to complete the labels of the abundant unannotated voxels and then embed them into the training data to enhance the model's capability. To reduce computational complexity and improve segmentation performance, we locate the pelvic region based on salient bone structures and focus on the candidate organs within it. Experimental results on a large planning CT pelvic organ dataset show that our proposed method with incomplete annotations achieves segmentation performance comparable to state-of-the-art methods with complete annotations. Moreover, our proposed method requires much less manual contouring effort from medical professionals, so that an institution-specific model can be established more easily.
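A small sketch of a hybrid loss in this spirit, combining voxel-wise classification with boundary regression; the weight w and the boundary-distance target are illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative hybrid loss: voxel-wise classification plus boundary regression.
# The weight w and the boundary target are assumptions for this sketch.
import torch
import torch.nn.functional as F

def hybrid_loss(class_logits: torch.Tensor,   # (N, C, D, H, W)
                labels: torch.Tensor,         # (N, D, H, W) integer organ labels
                boundary_pred: torch.Tensor,  # (N, 1, D, H, W) predicted boundary map
                boundary_target: torch.Tensor,
                w: float = 0.5) -> torch.Tensor:
    cls = F.cross_entropy(class_logits, labels)             # voxel classification
    reg = F.smooth_l1_loss(boundary_pred, boundary_target)  # boundary regression
    return cls + w * reg
```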