412 research outputs found
Domesticating AI in medical diagnosis
We consider the anticipated adoption of Artificial Intelligence (AI) in medical diagnosis. We examine how seemingly compelling claims are tested as AI tools move into real-world settings and discuss how analysts can develop effective understandings in novel and rapidly changing settings. Four case studies highlight the challenges of utilising diagnostic AI tools at differing stages in their innovation journey: two ‘upstream’ cases seeking to demonstrate the practical applicability of AI, and two ‘downstream’ cases focusing on the roll-out and scaling of more established applications. We observed an unfolding, uncoordinated process of social learning capturing two key moments: i) experiments to create and establish the clinical potential of AI tools; and ii) attempts to verify their dependability in clinical settings while extending their scale and scope. Health professionals critically appraise tool performance, relying on tools selectively where their results can be demonstrably trusted, in a de facto model of responsible use. We note a shift from procuring stand-alone solutions to deploying suites of AI tools through platforms, which facilitates adoption and reduces the costs of procurement, implementation and evaluation that impede the viability of stand-alone solutions. New conceptual frameworks and methodological strategies are needed to address the rapid evolution of AI tools as they move from research settings and are deployed in real-world care across multiple settings. We observe how, in this process of deployment, AI tools become ‘domesticated’. We propose longitudinal and multisite ‘biographical’ investigations of medical AI rather than snapshot studies of emerging technologies that fail to capture change and variation in performance across contexts.
DSCA-PSPNet: Dynamic spatial-channel attention pyramid scene parsing network for sugarcane field segmentation in satellite imagery
Sugarcane plays a vital role in many global economies, and its efficient cultivation is critical for sustainable development. A central challenge in sugarcane yield prediction and cultivation management is the precise segmentation of sugarcane fields from satellite imagery. This task is complicated by numerous factors, including varying environmental conditions, scale variability, and spectral similarities between crops and non-crop elements. To address these segmentation challenges, we introduce DSCA-PSPNet, a novel deep learning model with a unique architecture that combines a modified ResNet34 backbone, the Pyramid Scene Parsing Network (PSPNet), and newly proposed Dynamic Squeeze-and-Excitation Context (D-scSE) blocks. Our model effectively adapts to discern the importance of both spatial and channel-wise information, providing superior feature representation for sugarcane fields. We have also created a comprehensive high-resolution satellite imagery dataset from Guangxi’s Fusui County, captured on December 17, 2017, which encompasses a broad spectrum of sugarcane field characteristics and environmental conditions. In comparative studies, DSCA-PSPNet outperforms other state-of-the-art models, achieving an Intersection over Union (IoU) of 87.58%, an accuracy of 92.34%, a precision of 93.80%, a recall of 93.21%, and an F1-Score of 92.38%. Application tests on an RTX 3090 GPU, with input image resolutions of 512 × 512, yielded a prediction time of 4.57ms, a parameter size of 22.57MB, GFLOPs of 11.41, and a memory size of 84.47MB. An ablation study emphasized the vital role of the D-scSE module in enhancing DSCA-PSPNet’s performance. Our contributions in dataset generation and model development open new avenues for tackling the complexities of sugarcane field segmentation, thus contributing to advances in precision agriculture. The source code and dataset will be available on the GitHub repository https://github.com/JulioYuan/DSCA-PSPNet/tree/main
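The abstract describes the D-scSE block only at a high level. The numpy sketch below shows one plausible reading, assuming it blends the standard channel (cSE) and spatial (sSE) squeeze-and-excitation paths with a learned scalar weight; the function name `d_scse`, the reduction ratio, and the blending parameter `alpha` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_scse(x, w_c1, w_c2, w_s, alpha):
    """Dynamic spatial-channel squeeze-and-excitation (illustrative sketch).

    x     : feature map, shape (C, H, W)
    w_c1  : (C//r, C) channel-squeeze weights (r = reduction ratio)
    w_c2  : (C, C//r) channel-excite weights
    w_s   : (C,) 1x1-conv weights producing the spatial attention map
    alpha : scalar in [0, 1] blending the channel and spatial paths
    """
    # Channel attention (cSE): global average pool -> bottleneck MLP -> sigmoid gate.
    z = x.mean(axis=(1, 2))                                   # (C,)
    gate_c = sigmoid(w_c2 @ np.maximum(w_c1 @ z, 0.0))        # (C,)
    x_cse = x * gate_c[:, None, None]
    # Spatial attention (sSE): 1x1 conv across channels -> per-pixel sigmoid gate.
    gate_s = sigmoid(np.tensordot(w_s, x, axes=([0], [0])))   # (H, W)
    x_sse = x * gate_s[None, :, :]
    # "Dynamic" combination: a learned scalar decides how much each path matters.
    return alpha * x_cse + (1.0 - alpha) * x_sse
```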
Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
Text-to-image generative models have attracted growing attention for flexible
image editing via user-specified descriptions. However, text descriptions alone
are not enough to capture the details of subjects, often compromising the
subjects' identity or requiring additional per-subject fine-tuning. We
introduce a new framework called \textit{Paste, Inpaint and Harmonize via
Denoising} (PhD), which leverages an exemplar image in addition to text
descriptions to specify user intentions. In the pasting step, an off-the-shelf
segmentation model is employed to identify a user-specified subject within an
exemplar image which is subsequently inserted into a background image to serve
as an initialization capturing both scene context and subject identity in one.
To guarantee the visual coherence of the generated or edited image, we
introduce an inpainting and harmonizing module to guide the pre-trained
diffusion model to seamlessly blend the inserted subject into the scene
naturally. As we keep the pre-trained diffusion model frozen, we preserve its
strong image synthesis ability and text-driven ability, thus achieving
high-quality results and flexible editing with diverse texts. In our
experiments, we apply PhD to both subject-driven image editing tasks and
explore text-driven scene generation given a reference subject. Both
quantitative and qualitative comparisons with baseline methods demonstrate that
our approach achieves state-of-the-art performance in both tasks. More
qualitative results can be found at
\url{https://sites.google.com/view/phd-demo-page}. Comment: 10 pages, 12 figures
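The pasting step described above (segment the subject, then insert it into the background as an initialization) amounts to mask compositing; a minimal numpy sketch is shown below, assuming the binary mask from the off-the-shelf segmentation model is already given. The subsequent diffusion-based inpainting and harmonizing module is not shown.

```python
import numpy as np

def paste_subject(background, exemplar, mask, top, left):
    """Paste the masked subject from an exemplar image into a background.

    background : (H, W, 3) float array, the target scene
    exemplar   : (h, w, 3) float array containing the subject
    mask       : (h, w) binary array from a segmentation model
    top, left  : where the subject's bounding box lands in the background
    Returns the composite that would serve as initialization for the
    inpainting/harmonizing stage (not shown here).
    """
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    # Keep background pixels where mask == 0, subject pixels where mask == 1.
    region[...] = np.where(mask[..., None].astype(bool), exemplar, region)
    return out
```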
A Closer Look at Audio-Visual Semantic Segmentation
Audio-visual segmentation (AVS) is a complex task that involves accurately
segmenting the corresponding sounding object based on audio-visual queries.
Successful audio-visual learning requires two essential components: 1) an
unbiased dataset with high-quality pixel-level multi-class labels, and 2) a
model capable of effectively linking audio information with its corresponding
visual object. However, these two requirements are only partially addressed by
current methods, with training sets containing biased audio-visual data, and
models that generalise poorly beyond this biased training set. In this work, we
propose a new strategy to build cost-effective and relatively unbiased
audio-visual semantic segmentation benchmarks. Our strategy, called Visual
Post-production (VPO), explores the observation that it is not necessary to
have explicit audio-visual pairs extracted from single video sources to build
such benchmarks. We also refine the previously proposed AVSBench to transform
it into the audio-visual semantic segmentation benchmark AVSBench-Single+.
Furthermore, this paper introduces a new pixel-wise audio-visual contrastive
learning method to enable a better generalisation of the model beyond the
training set. We verify the validity of the VPO strategy by showing that
state-of-the-art (SOTA) models trained with datasets built by matching audio
and visual data from different sources or with datasets containing audio and
visual data from the same video source produce almost the same accuracy. Then,
using the proposed VPO benchmarks and AVSBench-Single+, we show that our method
produces more accurate audio-visual semantic segmentation than SOTA models.
Code and dataset will be made available.
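The pixel-wise audio-visual contrastive learning mentioned above can be illustrated with a generic InfoNCE-style objective: pixels of the sounding object are pulled towards the audio embedding while all pixels compete in the softmax denominator. This is a standard formulation sketched in numpy, not the paper's exact loss; all names and the temperature value are assumptions.

```python
import numpy as np

def pixel_av_contrastive_loss(pix_emb, audio_emb, pos_mask, tau=0.1):
    """Pixel-wise audio-visual InfoNCE-style loss (illustrative sketch).

    pix_emb   : (N, D) L2-normalised pixel embeddings (image flattened)
    audio_emb : (D,)   L2-normalised audio-clip embedding
    pos_mask  : (N,)   1 for pixels of the sounding object, 0 otherwise
    Each positive pixel is pulled towards the audio embedding; every pixel's
    similarity enters the softmax denominator as a competitor.
    """
    sims = pix_emb @ audio_emb / tau           # (N,) scaled similarities
    log_denom = np.log(np.exp(sims).sum())
    pos = pos_mask.astype(bool)
    # Average negative log-likelihood over the positive pixels only.
    return float((log_denom - sims[pos]).mean())
```

A lower loss when the sounding-object pixels align with the audio embedding is exactly the behaviour that improves generalisation beyond a biased training set.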
Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan
Brain tissue segmentation is essential for neuroscience and clinical studies.
However, segmentation on longitudinal data is challenging due to dynamic brain
changes across the lifespan. Previous studies mainly focus on
self-supervision with regularizations and lose longitudinal generalization
when fine-tuned on a specific age group. In this paper, we propose a dual
meta-learning paradigm to learn longitudinally consistent representations
that persist after fine-tuning. Specifically, we learn a plug-and-play feature
extractor to extract longitudinal-consistent anatomical representations by
meta-feature learning and a well-initialized task head for fine-tuning by
meta-initialization learning. Besides, two class-aware regularizations are
proposed to encourage longitudinal consistency. Experimental results on the
iSeg2019 and ADNI datasets demonstrate the effectiveness of our method. Our
code is available at https://github.com/ladderlab-xjtu/DuMeta. Comment: ICCV 2023
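The meta-initialization learning mentioned above (a well-initialized task head that fine-tunes quickly) can be illustrated with a generic Reptile-style outer loop on a toy linear task head. This is a standard meta-learning sketch under assumed hyperparameters, not the paper's dual meta-learning algorithm; the regression tasks stand in for different age groups.

```python
import numpy as np

def meta_initialize(tasks, inner_steps=5, inner_lr=0.1, meta_lr=0.5, epochs=50):
    """Reptile-style meta-initialization for a linear task head (sketch).

    tasks : list of (X, y) regression problems standing in for age groups.
    Returns weights that adapt quickly on any single task, analogous to the
    'well-initialized task head' described in the abstract.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(size=tasks[0][0].shape[1])
    for _ in range(epochs):
        X, y = tasks[rng.integers(len(tasks))]   # sample a task
        w_task = w.copy()
        for _ in range(inner_steps):             # inner-loop adaptation
            grad = X.T @ (X @ w_task - y) / len(y)
            w_task -= inner_lr * grad
        w += meta_lr * (w_task - w)              # move toward adapted weights
    return w
```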
Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review
Deep learning has become a popular tool for medical image analysis, but the
limited availability of training data remains a major challenge, particularly
in the medical field where data acquisition can be costly and subject to
privacy regulations. Data augmentation techniques offer a solution by
artificially increasing the number of training samples, but these techniques
often produce limited and unconvincing results. To address this issue, a
growing number of studies have proposed the use of deep generative models to
generate more realistic and diverse data that conform to the true distribution
of the data. In this review, we focus on three types of deep generative models
for medical image augmentation: variational autoencoders, generative
adversarial networks, and diffusion models. We provide an overview of the
current state of the art in each of these models and discuss their potential
for use in different downstream tasks in medical imaging, including
classification, segmentation, and cross-modal translation. We also evaluate the
strengths and limitations of each model and suggest directions for future
research in this field. Our goal is to provide a comprehensive review about the
use of deep generative models for medical image augmentation and to highlight
the potential of these models for improving the performance of deep learning
algorithms in medical image analysis.
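Of the three model families reviewed, the VAE is the simplest to sketch: the reparameterization trick makes sampling differentiable, and augmentation amounts to decoding latents drawn from the prior. The numpy sketch below uses a toy, untrained linear "decoder" purely for illustration; a real augmentation pipeline would use a trained decoder.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps -- the VAE reparameterization trick."""
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

def augment_with_vae(decode, n_samples, latent_dim, rng):
    """Draw latents from the prior N(0, I) and decode synthetic images.

    `decode` stands in for a trained VAE decoder; in the augmentation
    setting these samples would be added to the training set.
    """
    z = rng.normal(size=(n_samples, latent_dim))
    return np.stack([decode(zi) for zi in z])

# Toy linear 'decoder' mapping a 4-d latent to an 8x8 'image' (untrained).
rng = np.random.default_rng(42)
W = rng.normal(size=(64, 4))
decode = lambda z: (W @ z).reshape(8, 8)
synthetic = augment_with_vae(decode, n_samples=16, latent_dim=4, rng=rng)
```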
OSNet & MNetO: Two Types of General Reconstruction Architectures for Linear Computed Tomography in Multi-Scenarios
Recently, linear computed tomography (LCT) systems have attracted growing
attention. To weaken projection truncation and image the region of interest
(ROI) for LCT, the backprojection filtration (BPF) algorithm is an effective
solution. However, in BPF for LCT it is difficult to achieve stable interior
reconstruction, and for the differentiated backprojection (DBP) images of LCT,
repeated rotation, finite inversion of the Hilbert transform (Hilbert
filtering), and inverse-rotation operations blur the image. To satisfy multiple
reconstruction scenarios for LCT, including interior ROI, complete object, and
exterior region beyond field-of-view (FOV), and avoid the rotation operations
of Hilbert filtering, we propose two types of reconstruction architectures. The
first overlays multiple DBP images to obtain a complete DBP image, then uses a
network to learn the overlying Hilbert filtering function, referred to as the
Overlay-Single Network (OSNet). The second uses multiple networks to train
different directional Hilbert filtering models for DBP images of multiple
linear scans, respectively, and then overlays the reconstructed results,
i.e., Multiple Networks Overlaying (MNetO). In both architectures, we introduce
a Swin Transformer (ST) block to the generator of pix2pixGAN to extract both
local and global features from DBP images simultaneously. We evaluate the two
architectures across different networks, FOV sizes, pixel sizes, numbers of
projections, geometric magnifications, and processing times. Experimental results
show that both architectures can recover images. OSNet outperforms BPF in
various scenarios. For the different networks, ST-pix2pixGAN is superior to
pix2pixGAN and CycleGAN. MNetO exhibits a few artifacts due to the differences
among the multiple models, but any one of its models is suitable for imaging
the exterior edge in a certain direction. Comment: 13 pages, 13 figures
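For reference, the classical Hilbert filtering step that OSNet/MNetO learn to replace can be applied along image rows with FFTs. The sketch below implements the textbook (infinite) Hilbert transform via the analytic signal; the finite, rotated variants used in BPF for LCT differ, so this is only an illustration of the operation being learned.

```python
import numpy as np

def hilbert_rows(img):
    """Apply a discrete Hilbert transform along each row of `img` via FFT.

    Computes H{x} as the imaginary part of the analytic signal: positive
    frequencies are doubled and negative frequencies zeroed before the
    inverse FFT. This is the classical filtering step that the networks
    in OSNet/MNetO are trained to approximate for DBP images.
    """
    n = img.shape[-1]
    X = np.fft.fft(img, axis=-1)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h, axis=-1).imag
```

As a sanity check, the Hilbert transform maps cos(wt) to sin(wt) for w > 0, which the FFT implementation reproduces exactly on periodic inputs.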
Weakly supervised medical image segmentation through dense combinations of dense pseudo-labels
Annotating a large amount of medical imaging data thoroughly for training purposes can be expensive, particularly for medical image segmentation tasks, whereas obtaining scribbles, a less precise
form of annotation, is more feasible for clinicians. Nevertheless, training semantic segmentation networks with such limited-signal supervision remains a technical challenge. In this paper, we present an innovative
scribble-supervised image segmentation approach that densely ensembles dense
pseudo-labels, called Collaborative Hybrid Networks (CHNets), which consist
of groups of CNN- and ViT-based segmentation networks. A simple yet
efficient dense collaboration scheme is introduced to ensemble the dense
pseudo-labels and expand the dataset, allowing full-signal supervision. Additionally, internal- and external-consistency training among the
networks is proposed to ensure that each network benefits the
others, resulting in a significant improvement. Our experiments on a public MRI benchmark dataset demonstrate that our proposed approach
outperforms other weakly-supervised methods on various metrics.
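The dense pseudo-label ensembling described above can be sketched as confidence-thresholded averaging of the networks' softmax outputs. This generic fusion rule, including the threshold and ignore-index values, is an assumption for illustration; the abstract does not specify the exact CHNets collaboration scheme.

```python
import numpy as np

def ensemble_dense_pseudo_labels(prob_maps, conf_thresh=0.7, ignore_index=255):
    """Fuse dense predictions from several networks into pseudo-labels.

    prob_maps : (M, K, H, W) softmax outputs of M networks over K classes.
    Pixels whose averaged confidence falls below `conf_thresh` are marked
    `ignore_index` so they contribute no supervision signal during the
    full-signal training round.
    """
    mean_prob = prob_maps.mean(axis=0)    # (K, H, W) averaged softmax
    labels = mean_prob.argmax(axis=0)     # (H, W) hard pseudo-labels
    conf = mean_prob.max(axis=0)          # (H, W) ensemble confidence
    labels[conf < conf_thresh] = ignore_index
    return labels
```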
- …