Non-Iterative Scribble-Supervised Learning with Pacing Pseudo-Masks for Medical Image Segmentation
Scribble-supervised medical image segmentation tackles the limitation of
sparse masks. Conventional approaches alternate between labeling pseudo-masks
and optimizing network parameters. However, such an iterative two-stage
paradigm is unwieldy and can become trapped in poor local optima, since the
networks undesirably regress to the erroneous pseudo-masks. To address these
issues, we propose a non-iterative method, named PacingPseudo, in which a
stream of varying (pacing) pseudo-masks teaches a network via consistency
training. Our motivation lies first in the non-iterative process.
Interestingly, this can be achieved gracefully by a siamese architecture,
wherein a stream of pseudo-masks naturally assimilates a stream of predicted
masks during training. Second, we
make the consistency training effective with two necessary designs: (i) entropy
regularization to obtain high-confidence pseudo-masks for effective teaching;
and (ii) distorted augmentations to create discrepancy between the pseudo-mask
and predicted-mask streams for consistency regularization. Third, we devise a
new memory bank mechanism that provides an extra source of ensemble features to
complement scarce labeled pixels. The efficacy of the proposed PacingPseudo is
validated on three public medical image datasets, including the segmentation
tasks of abdominal multi-organs, cardiac structures, and myocardium. Extensive
experiments demonstrate our PacingPseudo improves the baseline by large margins
and consistently outcompetes several previous methods. In some cases, our
PacingPseudo achieves performance comparable to its fully-supervised
counterpart, showing the feasibility of our method for challenging
scribble-supervised segmentation applications. The code and scribble
annotations will be publicly available.
Comment: 12 pages, 8 figures
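The two designs named above, high-confidence pseudo-masks via entropy reduction and a consistency loss between the two streams, can be sketched numerically for a single pixel. The temperature-sharpening function and squared-error consistency term below are illustrative stand-ins, not the paper's exact losses:

```python
import math

def entropy(p):
    # Shannon entropy of a per-pixel class distribution
    return -sum(q * math.log(q) for q in p if q > 0)

def sharpen(p, T=0.5):
    # Temperature sharpening: lowers entropy so the pseudo-mask
    # teaches with high confidence (a stand-in for the paper's
    # entropy regularization)
    powered = [q ** (1.0 / T) for q in p]
    z = sum(powered)
    return [q / z for q in powered]

def consistency_loss(pseudo, pred):
    # Squared error between the pseudo-mask stream and the
    # predicted-mask stream for one pixel
    return sum((a - b) ** 2 for a, b in zip(pseudo, pred))

p = [0.6, 0.3, 0.1]        # soft prediction from the pseudo-mask stream
pseudo = sharpen(p)        # high-confidence pseudo-label
pred = [0.5, 0.35, 0.15]   # prediction on a distorted augmentation
loss = consistency_loss(pseudo, pred)
```

Distorted augmentations make `pred` differ from `pseudo`, so the consistency term stays informative; sharpening guarantees the teaching signal has lower entropy than the raw prediction.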
AI as Extraherics: Fostering Higher-order Thinking Skills in Human-AI Interaction
As artificial intelligence (AI) technologies, including generative AI, continue to evolve, concerns have arisen about over-reliance on AI, which may lead to human deskilling and diminished cognitive engagement. Over-reliance on AI can also lead users to accept information given by AI without critical examination, causing negative consequences such as being misled by hallucinated content. This paper introduces extraheric AI, a human-AI interaction conceptual framework that fosters users' higher-order thinking skills, such as creativity, critical thinking, and problem-solving, during task completion. Unlike existing human-AI interaction designs, which replace or augment human cognition, extraheric AI fosters cognitive engagement by posing questions or providing alternative perspectives to users, rather than direct answers. We discuss interaction strategies, evaluation methods aligned with cognitive load theory and Bloom's taxonomy, and future research directions to ensure that human cognitive skills remain a crucial element in AI-integrated environments, promoting a balanced partnership between humans and AI.
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Online video streaming has fundamental limitations on the transmission
bandwidth and computational capacity and super-resolution is a promising
potential solution. However, applying existing video super-resolution methods
to online streaming is non-trivial. Existing video codecs and streaming
protocols (\eg, WebRTC) dynamically change the video quality both spatially and
temporally, which leads to diverse and dynamic degradations. Furthermore,
online streaming has a strict requirement for latency that most existing
methods are less applicable. As a result, this paper focuses on the rarely
exploited problem setting of online streaming video super resolution. To
facilitate the research on this problem, a new benchmark dataset named
LDV-WebRTC is constructed based on a real-world online streaming system.
Leveraging the new benchmark dataset, we propose a novel method specifically
for online video streaming, which contains a convolution and Look-Up Table
(LUT) hybrid model to achieve a better performance-latency trade-off. To
tackle the changing degradations, we propose a mixture-of-expert-LUT module,
in which a set of LUTs, each specialized in a different degradation, is built
and adaptively combined. Experiments show our method achieves 720p video SR
at around 100 FPS, while significantly outperforming existing LUT-based
methods and offering competitive performance compared to efficient CNN-based
methods.
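The adaptive combination of expert LUTs can be illustrated with a toy 1D intensity mapping. The 16-entry tables, nearest-entry lookup, and fixed blend weights below are hypothetical simplifications (the paper's LUTs index local patches, and the weights are predicted adaptively):

```python
def apply_lut(lut, x):
    # Nearest-entry lookup: quantize an intensity in 0..255
    # to the table's sampling grid
    step = 256 // len(lut)
    idx = min(x // step, len(lut) - 1)
    return lut[idx]

def moe_lut(luts, weights, x):
    # Mixture-of-expert LUTs: each expert handles one degradation;
    # expert outputs are blended with adaptive weights
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * apply_lut(lut, x) for w, lut in zip(weights, luts))

# Two hypothetical 16-entry experts: one brightens mildly (light
# degradation), one brightens aggressively (heavy degradation)
mild  = [min(255, i * 16 + 8)  for i in range(16)]
heavy = [min(255, i * 16 + 32) for i in range(16)]

out = moe_lut([mild, heavy], [0.7, 0.3], 100)
```

Because each expert is a pure table lookup, the blended output costs a few memory reads and multiplies per pixel, which is what makes the latency budget of online streaming reachable.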
Software-Defined Virtual Synchronous Condenser
Synchronous condensers (SCs) play important roles in integrating wind energy
into relatively weak power grids. However, the design of SCs usually depends on
specific application requirements and may not be adaptive enough to the
frequently changing grid conditions caused by the transition from conventional
to renewable power generation. This paper devises a software-defined virtual
synchronous condenser (SDViSC) method to address the challenges. Our
contributions are fourfold: 1) design of a virtual synchronous condenser (ViSC)
to enable full converter wind turbines to provide built-in SC functionalities;
2) engineering SDViSCs to transfer hardware-based ViSC controllers into
software services, where a Tustin transformation-based software-defined control
algorithm guarantees accurate tracking of fast dynamics under limited
communication bandwidth; 3) a software-defined networking-enhanced SDViSC
communication scheme that enhances communication reliability and reduces
communication bandwidth occupation; and 4) a prototype of SDViSC on our
real-time, cyber-in-the-loop digital twin of a large wind farm in an RTDS
environment. Extensive test results validate the excellent performance of
SDViSC in supporting reliable and resilient operation of wind farms under
various physical and cyber conditions.
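The Tustin (bilinear) transformation named in contribution 2) maps a continuous controller to a discrete one via the substitution s ≈ (2/T)(z − 1)/(z + 1). A minimal sketch for a first-order lag G(s) = 1/(τs + 1), with hypothetical time constant and sample period (the paper's controllers are of course more involved):

```python
def tustin_first_order_lag(tau, T):
    # Discretize G(s) = 1/(tau*s + 1) with s = (2/T)(z-1)/(z+1).
    # Algebra gives the difference equation
    #   y[k] = a*y[k-1] + b*(u[k] + u[k-1])
    a = (2 * tau - T) / (2 * tau + T)
    b = T / (2 * tau + T)
    return a, b

def step_response(tau, T, n):
    # Run the discrete filter against a unit step input
    a, b = tustin_first_order_lag(tau, T)
    y, u_prev, out = 0.0, 0.0, []
    for _ in range(n):
        u = 1.0
        y = a * y + b * (u + u_prev)  # difference equation
        u_prev = u
        out.append(y)
    return out

# Hypothetical values: tau = 100 ms lag, 10 ms sample period
resp = step_response(tau=0.1, T=0.01, n=200)
```

The discrete step response settles at the continuous DC gain of 1, which is the property that lets a software service track fast dynamics accurately at a fixed, bandwidth-friendly sampling rate.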
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness
Finetuning a pretrained vision model (PVM) is a common technique for learning
downstream vision tasks. However, the conventional finetuning process with
randomly sampled data points results in diminished training efficiency. To
address this drawback, we propose a novel approach, Vision-language
Collaborative Active Finetuning (VeCAF). With the emerging availability of
labels and natural language annotations of images through web-scale crawling or
controlled generation, VeCAF makes use of this information to perform
parametric data selection for PVM finetuning. VeCAF incorporates the finetuning
objective to select significant data points that effectively guide the PVM
towards faster convergence to meet the performance goal. This process is
assisted by the inherent semantic richness of the text embedding space, which we
use to augment image features. Furthermore, the flexibility of text-domain
augmentation allows VeCAF to handle out-of-distribution scenarios without
external data. Extensive experiments show that VeCAF attains leading
performance and high computational efficiency, superior to baselines in both
in-distribution and out-of-distribution image classification tasks. On
ImageNet, VeCAF uses up to 3.3x fewer training batches than full finetuning
to reach the target performance, and achieves an accuracy improvement of 2.7%
over the state-of-the-art active finetuning method with the same number of
batches.
Comment: 13 pages
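The idea of folding the finetuning objective into data selection can be reduced to a toy form: score each labeled point by its current loss and keep the top-k, so the next batch targets where the model is worst. The squared-error objective, stale linear model, and selection size below are all hypothetical, far simpler than VeCAF's parametric selection:

```python
def select_for_finetuning(points, labels, predict, k):
    # Objective-aware active selection (toy version): rank labeled
    # points by current loss under the finetuning objective and
    # keep the k highest-loss points for the next batch
    def loss(x, y):
        return (predict(x) - y) ** 2  # toy squared-error objective
    scored = sorted(zip(points, labels), key=lambda p: loss(*p), reverse=True)
    return [x for x, _ in scored[:k]]

model = lambda x: 0.5 * x   # hypothetical stale model
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]   # true relation is y = x
picked = select_for_finetuning(xs, ys, model, k=2)
```

The model errs most at large x, so those points are selected; a random sampler would waste batches on points the model already fits, which is the inefficiency the abstract attributes to conventional finetuning.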
Characterizing the Structural Pattern Predicting Medication Response in Herpes Zoster Patients Using Multivoxel Pattern Analysis
Herpes zoster (HZ) can cause a blistering skin rash with severe neuropathic pain. Pharmacotherapy is the most common treatment for HZ patients. However, most patients are elderly or immunocompromised, and thus often suffer from side effects or easily develop intractable post-herpetic neuralgia (PHN) if medication fails. It is challenging for clinicians to tailor treatment to patients, due to the lack of prognostic information on the neurological pathogenesis that underlies HZ. In the current study, we aimed at characterizing the brain structural pattern of HZ before treatment with medication that could help predict medication responses. High-resolution structural magnetic resonance imaging (MRI) scans of 14 right-handed HZ patients (aged 61.0 ± 7.0, 8 males) with poor response and 15 age- (p = 0.58) and gender-matched (p = 0.20) patients (aged 62.6 ± 8.3, 5 males) responding well were acquired and analyzed. Multivoxel pattern analysis (MVPA) with a searchlight algorithm and support vector machine (SVM) was applied to identify the spatial pattern of gray matter (GM) volume with high prediction accuracy. The predictive regions, with an accuracy higher than 79%, were located within the cerebellum, posterior insular cortex (pIC), middle and orbital frontal lobes (mFC and OFC), anterior and middle cingulum (ACC and MCC), precuneus (PCu) and cuneus. Among these regions, the mFC, pIC and MCC displayed significant increases in GM volume in patients with poor response, compared to those with a good response. The combination of sMRI and MVPA might be a useful tool to explore the neuroanatomical imaging biomarkers of HZ-related pain associated with medication responses.
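The searchlight procedure behind this analysis can be sketched as sliding a small neighborhood over voxels and scoring, by leave-one-out classification, how well the local GM pattern separates the two response groups. The nearest-centroid classifier below is a simple stand-in for the SVM, and all data are synthetic:

```python
def searchlight_accuracy(subjects, labels, radius=1):
    # For each voxel, classify subjects from the local pattern
    # (voxel +/- radius) with leave-one-out nearest-centroid,
    # a stand-in for the searchlight SVM
    n_vox = len(subjects[0])
    accs = []
    for v in range(n_vox):
        lo, hi = max(0, v - radius), min(n_vox, v + radius + 1)
        correct = 0
        for i, (s, y) in enumerate(zip(subjects, labels)):
            # leave subject i out, build class centroids from the rest
            cents = {}
            for c in (0, 1):
                rows = [t[lo:hi]
                        for j, (t, yt) in enumerate(zip(subjects, labels))
                        if j != i and yt == c]
                cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
            dist = {c: sum((a - b) ** 2 for a, b in zip(s[lo:hi], cents[c]))
                    for c in (0, 1)}
            correct += (min(dist, key=dist.get) == y)
        accs.append(correct / len(subjects))
    return accs

# Synthetic GM volumes over 4 voxels: the groups differ only at voxel 2
group0 = [[1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.0, 1.0],
          [0.9, 1.0, 1.1, 1.0], [1.0, 1.1, 0.9, 1.0]]
group1 = [[1.0, 1.0, 2.0, 1.0], [1.1, 0.9, 2.1, 1.0],
          [0.9, 1.0, 1.9, 1.0], [1.0, 1.1, 2.0, 1.0]]
accs = searchlight_accuracy(group0 + group1, [0] * 4 + [1] * 4)
```

The accuracy map peaks at the informative voxel, which mirrors how the study localizes predictive regions by thresholding searchlight accuracy (there, at 79%).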
Recurrent Feature Propagation and Edge Skip-Connections for Automatic Abdominal Organ Segmentation
Automatic segmentation of abdominal organs in computed tomography (CT) images
can support radiation therapy and image-guided surgery workflows. Developing
such automatic solutions remains challenging, mainly owing to complex organ
interactions and blurry boundaries in CT images. To address these issues, we
focus on effective spatial context modeling and explicit edge segmentation
priors. Accordingly, we propose a 3D network with four main components
trained end-to-end: a shared encoder, an edge detector, a decoder with edge
skip-connections (ESCs), and a recurrent feature propagation head (RFP-Head). To
capture wide-range spatial dependencies, the RFP-Head propagates and harvests
local features through directed acyclic graphs (DAGs) formulated with recurrent
connections in an efficient slice-wise manner, with regard to the spatial
arrangement of image units. To leverage edge information, the edge detector
learns edge prior knowledge specifically tuned for semantic segmentation by
exploiting intermediate features from the encoder under edge supervision.
The ESCs then aggregate the edge knowledge with multi-level decoder features to
learn a hierarchy of discriminative features explicitly modeling
complementarity between organs' interiors and edges for segmentation. We
conduct extensive experiments on two challenging abdominal CT datasets with
eight annotated organs. Experimental results show that the proposed network
outperforms several state-of-the-art models, especially for the segmentation of
small and complicated structures (gallbladder, esophagus, stomach, pancreas and
duodenum). The code will be publicly available.
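The slice-wise recurrent propagation in the RFP-Head can be sketched in a single DAG direction: each position receives the recurrently propagated feature of its predecessor, so the receptive field widens across the whole row. The single left-to-right pass and scalar weight below are a hypothetical simplification (the paper combines multiple DAG directions with learned connections):

```python
def propagate_left_to_right(feat, w=0.5):
    # Recurrent feature propagation along rows of one slice:
    #   h[i][j] = relu(feat[i][j] + w * h[i][j-1])
    # so each position harvests context from everything to its left
    out = []
    for row in feat:
        h, prev = [], 0.0
        for x in row:
            prev = max(0.0, x + w * prev)  # ReLU gating
            h.append(prev)
        out.append(h)
    return out

feat = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 2.0, 0.0, 0.0]]
h = propagate_left_to_right(feat)
```

A single activation now influences every later position in its row at the cost of one pass per direction, which is how the head captures wide-range spatial dependencies without stacking large convolutions.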
