53 research outputs found

    Pixel-wise Orthogonal Decomposition for Color Illumination Invariant and Shadow-free Image

    In this paper, we propose a novel, effective and fast method to obtain a color illumination invariant and shadow-free image from a single outdoor image. Unlike state-of-the-art shadow-removal methods, which require either shadow detection or statistical learning, we set up a system of linear equations for each pixel value vector based on physically-based shadow invariants, derive a pixel-wise orthogonal decomposition of its solutions, and then obtain an illumination invariant vector for each pixel value vector in the image. The illumination invariant vector is the unique particular solution of the linear system that is orthogonal to its free solutions. Using this illumination invariant vector and the Lab color space, we propose an algorithm to generate a shadow-free image that preserves the texture and color information of the original image well. A series of experiments on a diverse set of outdoor images, together with comparisons against state-of-the-art methods, validates our method. Comment: This paper has been published in Optics Express, Vol. 23, Issue 3, pp. 2220-2239. The final version is available on http://dx.doi.org/10.1364/OE.23.002220. Please refer to that version when citing this paper.
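    The key linear-algebra fact behind the decomposition is that a consistent underdetermined system A x = b has exactly one particular solution orthogonal to its free (null-space) solutions: the minimum-norm solution x = A^T (A A^T)^(-1) b. The sketch below assumes a generic full-row-rank system of two equations in three unknowns (for example, the three channels of a pixel value vector); the actual coefficients come from the paper's physically-based shadow invariants and are not reproduced here, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_solution(A, b):
    """Particular solution of A x = b orthogonal to the null space of A.

    Assumption: A has exactly two linearly independent rows, so the
    2 x 2 Gram matrix G = A A^T is invertible and x = A^T G^(-1) b.
    """
    a1, a2 = A
    # Entries of the symmetric Gram matrix G = A A^T.
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("rows of A must be linearly independent")
    # y = G^(-1) b via the closed-form 2 x 2 inverse.
    y1 = (g22 * b[0] - g12 * b[1]) / det
    y2 = (-g12 * b[0] + g11 * b[1]) / det
    # x = A^T y lies in the row space of A, hence is orthogonal to the null space.
    return [y1 * u + y2 * v for u, v in zip(a1, a2)]
```

    For A = [[1, 1, 0], [0, 1, 1]] and b = [1, 2], the function returns [0.0, 1.0, 1.0], which satisfies both equations and is orthogonal to the null-space direction (1, -1, 1). Every other solution adds a multiple of that direction and therefore has a larger norm.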

    Dual-domain convolutional neural networks for improving structural information in 3 T MRI

    We propose a novel dual-domain convolutional neural network framework to improve the structural information of routine 3T images. We introduce a parameter-efficient butterfly network that involves two complementary domains: a spatial domain and a frequency domain. The butterfly network allows these two domains to interact in learning the complex mapping from 3T to 7T images. We verified the efficacy of the dual-domain strategy and the butterfly network using 3T and 7T image pairs. Experimental results demonstrate that the proposed framework generates synthetic 7T-like images and outperforms state-of-the-art methods.

    Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

    Human-Object Interaction (HOI) detection plays a vital role in scene understanding; it aims to predict HOI triplets of the form <human, object, action>. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation and ignore potential cross-triplet dependencies, which leads to ambiguous action predictions. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, to aggregate self-triplet correlations, we regard each triplet proposal as a graph in which the human and the object are nodes and the action is the edge. We also explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. In addition, we leverage the CLIP model to help SCTC obtain interaction-aware features through knowledge distillation, which provides useful action clues for HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of the proposed SCTC.
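    To make the triplet-as-graph idea concrete, the sketch below represents each proposal as a two-node graph (human, object) with an action edge, and links two triplets at the instance level when they share the same detected human or object. This sharing rule is only one simple, assumed reading of "instance-level relations"; it is not SCTC's actual module, and the names Triplet and instance_level_links are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Triplet:
    human_id: int   # detected human instance (graph node)
    object_id: int  # detected object instance (graph node)
    action: str     # interaction label on the human-object edge


def instance_level_links(triplets):
    """Return index pairs of triplets that share a human or an object instance.

    Hypothetical rule for illustration: shared instances are treated as
    evidence that two triplets should exchange information.
    """
    links = []
    for i, j in combinations(range(len(triplets)), 2):
        a, b = triplets[i], triplets[j]
        if a.human_id == b.human_id or a.object_id == b.object_id:
            links.append((i, j))
    return links
```

    For example, a person who both rides and holds different objects yields two triplets with the same human_id, so they are linked, while an unrelated person eating something stays disconnected.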

    Residual Denoising Diffusion Models

    We propose residual denoising diffusion models (RDDM), a novel dual diffusion process that decouples the traditional single denoising diffusion process into residual diffusion and noise diffusion. By introducing residuals, this dual diffusion framework extends denoising-based diffusion models, which were initially uninterpretable for image restoration, into a unified and interpretable model for both image generation and restoration. Specifically, our residual diffusion represents directional diffusion from the target image to the degraded input image and explicitly guides the reverse generation process for image restoration, while noise diffusion represents random perturbations in the diffusion process. The residual prioritizes certainty, while the noise emphasizes diversity, enabling RDDM to effectively unify tasks with varying certainty or diversity requirements, such as image generation and restoration. We demonstrate through coefficient transformation that our sampling process is consistent with those of DDPM and DDIM, and we propose a partially path-independent generation process to better understand the reverse process. Notably, RDDM enables a generic UNet, trained with only an ℓ1 loss and a batch size of 1, to compete with state-of-the-art image restoration methods. We provide code and pre-trained models to encourage further exploration, application, and development of our framework (https://github.com/nachifur/RDDM).
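    The abstract describes the forward process only in words. The sketch below assumes a closed-form forward sample I_t = I_0 + alpha_bar_t * I_res + beta_bar_t * eps, where the residual I_res = I_in - I_0 points from the target image I_0 toward the degraded input I_in and eps is standard Gaussian noise. This formula, the schedule values, and the function name rddm_forward are our assumptions for illustration, not a quote of the paper; consult the linked repository for the exact coefficients.

```python
import random


def rddm_forward(target, degraded, alpha_bar_t, beta_bar_t, rng=random):
    """Sample I_t = I_0 + alpha_bar_t * (I_in - I_0) + beta_bar_t * eps per pixel.

    target, degraded: flat sequences of pixel values (I_0 and I_in).
    alpha_bar_t: cumulative residual weight (directional, "certainty" term).
    beta_bar_t: cumulative noise scale (random, "diversity" term).
    """
    out = []
    for x0, x_in in zip(target, degraded):
        residual = x_in - x0               # direction from target to degraded input
        eps = rng.gauss(0.0, 1.0)          # independent Gaussian noise per pixel
        out.append(x0 + alpha_bar_t * residual + beta_bar_t * eps)
    return out
```

    With alpha_bar_t = 0 and beta_bar_t = 0 the sample is the clean target; with alpha_bar_t = 1 and beta_bar_t = 0 it is the degraded input. Intermediate values mix the deterministic residual shift with noise, which is how the two terms separate "certainty" from "diversity".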

    Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

    The Segment Anything Model (SAM) has garnered significant attention for its versatile segmentation abilities and intuitive prompt-based interface. However, applying it to medical imaging is challenging: it requires either substantial training costs and extensive medical datasets for full-model fine-tuning, or high-quality prompts for optimal performance. This paper introduces H-SAM, a prompt-free adaptation of SAM tailored for efficient fine-tuning on medical images via a two-stage hierarchical decoding procedure. In the first stage, H-SAM employs SAM's original decoder to generate a prior probabilistic mask, which guides a more intricate decoding process in the second stage. Specifically, we propose two key designs: 1) a class-balanced, mask-guided self-attention mechanism that addresses the unbalanced label distribution and enhances the image embedding; 2) a learnable mask cross-attention mechanism that spatially modulates the interplay among different image regions based on the prior mask. Moreover, a hierarchical pixel decoder in H-SAM enhances its ability to capture fine-grained and localized details. This approach enables SAM to effectively integrate learned medical priors, improving its adaptation to medical image segmentation with limited samples. H-SAM achieves a 4.78% improvement in average Dice over existing prompt-free SAM variants for multi-organ segmentation using only 10% of 2D slices. Notably, without using any unlabeled data, H-SAM even outperforms state-of-the-art semi-supervised models that rely on extensive unlabeled training data across various medical datasets. Our code is available at https://github.com/Cccccczh404/H-SAM. Comment: CVPR 202
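    Results here are reported as average Dice over organs. As a reference point, the sketch below computes the standard Dice coefficient 2|P ∩ T| / (|P| + |T|) for binary masks and averages it over foreground classes of a label map. The function names are hypothetical, and treating two empty masks as a perfect score of 1.0 is a common convention we assume, not something the abstract specifies.

```python
def dice(pred, target):
    """Dice coefficient for binary masks given as flat sequences of truthy/falsy values."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * inter / total


def mean_dice(pred_labels, true_labels, classes):
    """Average per-class Dice over the given foreground class ids (e.g. organs)."""
    scores = [
        dice([p == c for p in pred_labels], [t == c for t in true_labels])
        for c in classes
    ]
    return sum(scores) / len(scores)
```

    For instance, dice([1, 1, 0, 0], [1, 0, 1, 0]) is 0.5: one overlapping pixel out of two predicted and two true pixels.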

    A New Intrinsic-Lighting Color Space for Daytime Outdoor Images


    Iterative Label Denoising Network: Segmenting Male Pelvic Organs in CT From 3D Bounding Box Annotations

    Accurate segmentation of the prostate and nearby organs at risk (e.g., bladder and rectum) in CT images is critical for radiotherapy of prostate cancer. Currently, the leading automatic segmentation algorithms are based on Fully Convolutional Networks (FCNs), which achieve remarkable performance but usually need large-scale datasets with high-quality voxel-wise annotations for fully supervised training. Unfortunately, such annotations are difficult to acquire, which is a bottleneck for building accurate segmentation models in real clinical applications. In this paper, we propose a novel weakly supervised segmentation approach that needs only 3D bounding box annotations covering the organs of interest to start training. However, the bounding box inevitably includes many non-organ voxels whose noisy labels can mislead the segmentation model. To address this, we propose a label denoising module and embed it into the iterative training scheme of the label denoising network (LDnet) for segmentation. The labels of the training voxels are predicted by the tentative LDnet, while the label denoising module identifies voxels with unreliable labels. Because only reliable training voxels are preserved, the iteratively re-trained LDnet gradually refines its segmentation capability. Our method reaches ~94% (prostate), ~91% (bladder), and ~86% (rectum) of the Dice Similarity Coefficients (DSCs) obtained by fully supervised learning on high-quality voxel-wise annotations, and it also outperforms several state-of-the-art approaches. To the best of our knowledge, this is the first work to achieve voxel-wise segmentation in CT images from simple 3D bounding box annotations, which can greatly reduce labeling effort and meet the demands of practical clinical applications.
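    The starting point and the filtering step can be sketched as follows: every voxel inside the 3D bounding box is initially labeled foreground, and a voxel is kept for re-training only if the current network's predicted foreground probability confidently agrees with its label. The confidence-agreement rule, the 0.8 threshold, and the names box_labels and reliable_voxels are hypothetical stand-ins; the abstract does not specify how LDnet's label denoising module decides reliability.

```python
from itertools import product


def box_labels(shape, box):
    """Initial noisy labels: 1 for every voxel inside the 3D box, else 0.

    shape: (D, H, W) volume size.
    box: ((z0, z1), (y0, y1), (x0, x1)) half-open index ranges.
    """
    labels = {}
    for z, y, x in product(*(range(n) for n in shape)):
        inside = all(lo <= c < hi for c, (lo, hi) in zip((z, y, x), box))
        labels[(z, y, x)] = 1 if inside else 0
    return labels


def reliable_voxels(labels, probs, threshold=0.8):
    """Keep voxels whose current label agrees with a confident prediction.

    probs maps each voxel to the tentative network's foreground probability.
    Hypothetical rule: label 1 needs p >= threshold, label 0 needs p <= 1 - threshold.
    """
    keep = set()
    for voxel, label in labels.items():
        p = probs[voxel]
        if (label == 1 and p >= threshold) or (label == 0 and p <= 1 - threshold):
            keep.add(voxel)
    return keep
```

    In an iterative scheme, only the kept voxels would contribute to the loss in the next training round, and the predictions of the re-trained network would then be used to filter again.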

    FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

    Federated learning (FL) is an emerging machine learning paradigm in which a shared model is learned collaboratively from data on multiple devices to mitigate the risk of data leakage. While recent studies posit that Vision Transformers (ViTs) outperform Convolutional Neural Networks (CNNs) in addressing data heterogeneity in FL, the specific architectural components that underpin this advantage have yet to be elucidated. In this paper, we systematically investigate the impact of different architectural elements, such as activation functions and normalization layers, on performance in heterogeneous FL. Through rigorous empirical analyses, we offer first-of-its-kind general guidance on micro-architecture design principles for heterogeneous FL. Intriguingly, our findings indicate that with strategic architectural modifications, pure CNNs can match or even exceed the robustness of ViTs when handling heterogeneous data clients in FL. Additionally, our approach is compatible with existing FL techniques and delivers state-of-the-art results across a broad spectrum of FL benchmarks. The code is publicly available at https://github.com/UCSC-VLAA/FedConv. Comment: 9 pages, 6 figures. Equal contribution by P. Xu and Z. Wan
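    To illustrate how a shared model is learned collaboratively without pooling raw data, the sketch below shows FedAvg-style aggregation: the server averages client parameter vectors, weighting each client by its number of local samples. FedAvg is a standard baseline swapped in here for illustration; FedConv's contribution is architectural, and this aggregation rule is not taken from the paper. The function name fedavg is hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors.

    client_weights: list of equal-length parameter lists, one per client.
    client_sizes: number of local training samples on each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

    With heterogeneous (non-IID) clients, the local updates being averaged can point in conflicting directions, which is the setting in which the paper compares how robust CNN and ViT architectures are.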

    CT Male Pelvic Organ Segmentation via Hybrid Loss Network With Incomplete Annotation

    Sufficient data with complete annotation is essential for training deep models to segment male pelvic organs in CT automatically and accurately, especially since such data pose great challenges such as low contrast and large shape variation. However, manual annotation is expensive in both cost and human effort, which usually leaves too little completely annotated data in real applications. To this end, we propose a novel deep framework to segment male pelvic organs in CT images using incomplete annotation delineated in a very user-friendly manner. Specifically, we design a hybrid loss network derived from both voxel classification and boundary regression to jointly improve organ segmentation performance in an iterative way. Moreover, we introduce a label completion strategy that completes the labels of the many unannotated voxels and then embeds them into the training data to enhance the model's capability. To reduce computational complexity and improve segmentation performance, we locate the pelvic region based on salient bone structures so as to focus on the candidate organs. Experimental results on a large planning CT pelvic organ dataset show that our method with incomplete annotation achieves segmentation performance comparable to state-of-the-art methods with complete annotation. Moreover, our method requires much less manual contouring effort from medical professionals, so an institution-specific model can be established more easily.
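    A hybrid loss of this kind combines two terms: a per-voxel classification loss and a boundary regression loss. The sketch below assumes binary cross-entropy for classification and an L1 loss on a regressed per-voxel distance to the organ boundary, combined with a hypothetical weight lam = 0.5. These choices and the name hybrid_loss are illustrative assumptions; the abstract does not give the paper's exact loss terms or weighting.

```python
import math


def hybrid_loss(probs, labels, pred_dist, true_dist, lam=0.5):
    """Voxel classification loss plus weighted boundary regression loss.

    probs: predicted foreground probabilities per voxel.
    labels: 0/1 ground-truth labels per voxel.
    pred_dist, true_dist: predicted and true distances to the organ boundary (assumed target).
    lam: hypothetical weight balancing the two terms.
    """
    eps = 1e-7  # clamp to avoid log(0)
    bce = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(probs, labels)
    ) / len(labels)
    reg = sum(abs(a - b) for a, b in zip(pred_dist, true_dist)) / len(true_dist)
    return bce + lam * reg
```

    Keeping the two terms separate makes it easy to re-weight boundary accuracy against voxel-level accuracy, which matters for low-contrast organ borders.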