Can Semantic-Phonemic Discrepancy in Verbal Fluency Help to Detect Alzheimer’s Dementia?
Word-retrieval difficulty is one of the early signs of Alzheimer's disease, although such difficulties can also occur in typical aging. It is therefore necessary to find a task that differentiates the early stages of Alzheimer's dementia from typical aging. Verbal fluency is a widely used measure for assessing cognitive processes following neurological damage, and often includes two subtests: semantic fluency, in which participants are asked to produce words that meet a semantic criterion, such as food or animals; and letter fluency, which requires participants to produce words starting with a certain letter, such as F or S. People with Alzheimer's disease have more difficulty with semantic than letter fluency, although this pattern has also been observed in typical aging. In the current research, we investigate whether the semantic-letter discrepancy can differentiate Alzheimer's dementia from typical aging.
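The discrepancy score itself is simple to compute; the sketch below is illustrative scoring only, not the study's clinical metric, and the word counts and normalization are assumptions.

```python
def fluency_discrepancy(semantic_count, letter_count):
    """Raw semantic-letter discrepancy: positive when semantic fluency
    exceeds letter fluency, as is commonly reported for healthy speakers."""
    return semantic_count - letter_count

def normalized_discrepancy(semantic_count, letter_count):
    """Discrepancy scaled by total output, so prolific and sparse
    responders can be compared on the same scale."""
    total = semantic_count + letter_count
    return (semantic_count - letter_count) / total if total else 0.0

# Hypothetical word counts: a reduced or reversed discrepancy is the
# kind of candidate marker the study investigates.
typical_score = normalized_discrepancy(semantic_count=22, letter_count=15)
reduced_score = normalized_discrepancy(semantic_count=8, letter_count=14)
```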
Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
Medical images usually suffer from image degradation in clinical practice,
leading to decreased performance of deep learning-based models. To resolve this
problem, most previous works have focused on filtering out degradation-causing
low-quality images while ignoring their potential value for models. Through
effectively learning and leveraging the knowledge of degradations, models can
better resist their adverse effects and avoid misdiagnosis. In this paper, we
raise the problem of image quality-aware diagnosis, which aims to take
advantage of low-quality images and image quality labels to achieve a more
accurate and robust diagnosis. However, the diversity of degradations and the
superficially unrelated targets of image quality assessment and disease
diagnosis make it challenging to effectively leverage quality
labels to assist diagnosis. To tackle these issues, we propose a novel
meta-knowledge co-embedding network, consisting of two subnets: Task Net and
Meta Learner. Task Net constructs an explicit quality information utilization
mechanism to enhance diagnosis via knowledge co-embedding features, while Meta
Learner ensures the effectiveness and constrains the semantics of these
features via meta-learning and joint-encoding masking. Superior performance on
five datasets with four widely-used medical imaging modalities demonstrates the
effectiveness and generalizability of our method.
Comment: Accepted by CVPR 202
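One simple way to see how quality labels can enter the objective is an auxiliary quality-prediction term. This is a toy sketch, not the paper's co-embedding architecture; the loss form and `alpha` are assumptions.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example."""
    z = logits - logits.max()                  # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def quality_aware_loss(diag_logits, diag_label,
                       qual_logits, qual_label, alpha=0.5):
    """Diagnosis loss plus a quality-prediction auxiliary term, so the
    shared features must also encode degradation information rather
    than the low-quality images being filtered out."""
    return (cross_entropy(diag_logits, diag_label)
            + alpha * cross_entropy(qual_logits, qual_label))
```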
Learning Robust Representation for Joint Grading of Ophthalmic Diseases via Adaptive Curriculum and Feature Disentanglement
Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes
of permanent blindness worldwide. Designing an automatic grading system with
good generalization ability for DR and DME is vital in clinical practice.
However, prior works either grade DR or DME independently, without considering
internal correlations between them, or grade them jointly by shared feature
representation, yet ignoring potential generalization issues caused by
difficult samples and data bias. Aiming to address these problems, we propose a
framework for joint grading with the dynamic difficulty-aware weighted loss
(DAW) and the dual-stream disentangled learning architecture (DETACH). Inspired
by curriculum learning, DAW learns from simple samples to difficult samples
dynamically by measuring difficulty adaptively. DETACH separates the features of
the grading tasks to avoid over-emphasis on data bias. With the addition of DAW
and DETACH, the model learns robust disentangled feature representations to
explore internal correlations between DR and DME and achieve better grading
performance. Experiments on three benchmarks show the effectiveness and
robustness of our framework under both the intra-dataset and cross-dataset
tests.
Comment: Accepted by MICCAI2
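The curriculum idea behind DAW can be sketched as a loss-weighting rule. This is a hedged toy version: the paper's actual difficulty measure and schedule differ, and `gamma` is an assumed hyperparameter.

```python
import numpy as np

def difficulty_aware_weights(sample_losses, step, total_steps, gamma=2.0):
    """Down-weight difficult samples early in training and relax the
    penalty over time, using each sample's loss as a difficulty proxy."""
    difficulty = sample_losses / (sample_losses.max() + 1e-8)  # in [0, 1]
    exponent = gamma * (1.0 - step / total_steps)              # decays to 0
    return (1.0 - difficulty) ** exponent

losses = np.array([0.1, 0.5, 2.0])   # easy, medium, hard samples
early = difficulty_aware_weights(losses, step=0, total_steps=100)
late = difficulty_aware_weights(losses, step=100, total_steps=100)
```

Early on, hard samples contribute little; by the end of training every sample receives full weight.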
Towards Generalizable Diabetic Retinopathy Grading in Unseen Domains
Diabetic Retinopathy (DR) is a common complication of diabetes and a leading
cause of blindness worldwide. Early and accurate grading of its severity is
crucial for disease management. Although deep learning has shown great
potential for automated DR grading, its real-world deployment is still
challenging due to distribution shifts among source and target domains, known
as the domain generalization problem. Existing works have mainly attributed the
performance degradation to domain shifts caused by simple visual
discrepancies, an assumption that cannot capture complex real-world scenarios. Instead, we
present preliminary evidence suggesting the existence of three-fold
generalization issues: visual and degradation style shifts, diagnostic pattern
diversity, and data imbalance. To tackle these issues, we propose a novel
unified framework named Generalizable Diabetic Retinopathy Grading Network
(GDRNet). GDRNet consists of three vital components: fundus visual-artifact
augmentation (FundusAug), dynamic hybrid-supervised loss (DahLoss), and
domain-class-aware re-balancing (DCR). FundusAug generates realistic augmented
images via visual transformation and image degradation, while DahLoss jointly
leverages pixel-level consistency and image-level semantics to capture the
diverse diagnostic patterns and build generalizable feature representations.
Moreover, DCR mitigates the data imbalance from a domain-class view and avoids
undesired over-emphasis on rare domain-class pairs. Finally, we design a
publicly available benchmark for fair evaluations. Extensive comparison
experiments against advanced methods and exhaustive ablation studies
demonstrate the effectiveness and generalization ability of GDRNet.
Comment: Early Accepted by MICCAI 2023, the 26th International Conference on
Medical Image Computing and Computer Assisted Intervention
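The re-balancing intuition generalizes class-frequency weighting to domain-class pairs. The sketch below uses smoothed inverse-frequency weights as a stand-in for DCR; the `smoothing` knob is an assumption.

```python
from collections import Counter

def domain_class_weights(samples, smoothing=1.0):
    """Inverse-frequency weights over (domain, class) pairs: rare pairs
    get boosted, while the smoothing term caps the boost so very rare
    pairs are not over-emphasized."""
    counts = Counter(samples)
    n_samples = len(samples)
    n_pairs = len(counts)
    return {pair: n_samples / (n_pairs * (count + smoothing))
            for pair, count in counts.items()}

# Hypothetical imbalanced data: one common and one rare domain-class pair.
samples = [("site_A", "mild")] * 50 + [("site_B", "severe")] * 2
weights = domain_class_weights(samples)
```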
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
We propose SceneTex, a novel method for effectively generating high-quality
and style-consistent textures for indoor scenes using depth-to-image diffusion
priors. Unlike previous methods that either iteratively warp 2D views onto a
mesh surface or distill diffusion latent features without accurate geometric
and style cues, SceneTex formulates the texture synthesis task as an
optimization problem in the RGB space where style and geometry consistency are
properly reflected. At its core, SceneTex proposes a multiresolution texture
field to implicitly encode the mesh appearance. We optimize the target texture
via a score-distillation-based objective function in respective RGB renderings.
To further secure the style consistency across views, we introduce a
cross-attention decoder to predict the RGB values by cross-attending to the
pre-sampled reference locations in each instance. SceneTex enables diverse and
accurate texture synthesis for 3D-FRONT scenes, demonstrating significant
improvements in visual quality and prompt fidelity over the prior texture
generation methods.
Comment: Project website: https://daveredrum.github.io/SceneTex
Learning Gradient Fields for Scalable and Generalizable Irregular Packing
The packing problem, also known as cutting or nesting, has diverse
applications in logistics, manufacturing, layout design, and atlas generation.
It involves arranging irregularly shaped pieces to minimize waste while
avoiding overlap. Recent advances in machine learning, particularly
reinforcement learning, have shown promise in addressing the packing problem.
In this work, we delve deeper into a novel machine learning-based approach that
formulates the packing problem as conditional generative modeling. To tackle
the challenges of irregular packing, including object validity constraints and
collision avoidance, our method employs the score-based diffusion model to
learn a series of gradient fields. These gradient fields encode the
correlations between constraint satisfaction and the spatial relationships of
polygons, learned from teacher examples. During the testing phase, packing
solutions are generated using a coarse-to-fine refinement mechanism guided by
the learned gradient fields. To enhance packing feasibility and optimality, we
introduce two key architectural designs: multi-scale feature extraction and
coarse-to-fine relation extraction. We conduct experiments on two typical
industrial packing domains, considering translations only. Empirically, our
approach demonstrates spatial utilization rates comparable to, or even
surpassing, those achieved by the teacher algorithm responsible for training
data generation. Additionally, it exhibits some level of generalization to
shape variations. We hope this method paves the way for new
possibilities in solving the packing problem.
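The coarse-to-fine refinement can be illustrated with a toy gradient field. Here an analytic field stands in for the learned one (the paper learns it with a score-based diffusion model from teacher examples): attraction toward the origin for compactness, repulsion between overlapping unit-diameter pieces for collision avoidance, and an annealed step size so early iterations move coarsely and later ones fine-tune.

```python
import numpy as np

def refine_packing(positions, steps=300, lr0=0.1):
    """Coarse-to-fine refinement driven by a (here analytic) gradient
    field over piece positions: compactness pull plus overlap repulsion,
    with a linearly annealed step size."""
    pos = positions.astype(float)
    n = len(pos)
    for t in range(steps):
        lr = lr0 * (1.0 - t / steps)
        grad = -pos                               # compactness term
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = pos[i] - pos[j]
                dist = np.linalg.norm(d) + 1e-8
                if dist < 1.0:                    # pieces overlap
                    grad[i] += (1.0 - dist) * d / dist
        pos = pos + lr * grad
    return pos

start = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
packed = refine_packing(start)
```

The pieces end up clustered near the origin without collapsing onto each other.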
Physics-Informed Neural Operator for Learning Partial Differential Equations
Machine learning methods have recently shown promise in solving partial
differential equations (PDEs). They can be classified into two broad
categories: approximating the solution function and learning the solution
operator. The Physics-Informed Neural Network (PINN) is an example of the
former while the Fourier neural operator (FNO) is an example of the latter.
Both these approaches have shortcomings. The optimization in PINN is
challenging and prone to failure, especially on multi-scale dynamic systems.
FNO does not suffer from this optimization issue since it carries out
supervised learning on a given dataset, but obtaining such data may be too
expensive or infeasible. In this work, we propose the physics-informed neural
operator (PINO), which combines the operator-learning and
function-optimization frameworks. This integrated approach improves convergence
rates and accuracy over both PINN and FNO models. In the operator-learning
phase, PINO learns the solution operator over multiple instances of the
parametric PDE family. In the test-time optimization phase, PINO optimizes the
pre-trained operator ansatz for the querying instance of the PDE. Experiments
show PINO outperforms previous ML methods on many popular PDE families while
retaining the extraordinary speed-up of FNO compared to solvers. In particular,
PINO accurately solves challenging long temporal transient flows and Kolmogorov
flows where other baseline ML methods fail to converge.
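The physics-informed part of the objective is just a PDE residual. A minimal sketch for a 1D Poisson problem u''(x) = f(x) with zero boundary conditions is below; finite differences stand in for the Fourier-space derivatives used in practice.

```python
import numpy as np

def physics_loss(u, f, dx):
    """Mean squared interior residual of u''(x) = f(x), plus penalties
    for the boundary conditions u(0) = u(1) = 0. No labeled solution
    data is needed: the equation itself supervises the model."""
    u_xx = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2   # second derivative
    residual = u_xx - f[1:-1]
    return np.mean(residual**2) + u[0]**2 + u[-1]**2

# Manufactured solution: u = sin(pi x) solves u'' = -pi^2 sin(pi x).
x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
u_true = np.sin(np.pi * x)
f = -np.pi**2 * np.sin(np.pi * x)
```

The true solution drives the loss to (near) zero, while a wrong candidate such as the zero function does not.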
FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
The dynamic request patterns of machine learning (ML) inference workloads
have driven an increasing trend towards exploiting serverless computing for
scalable ML model serving. However, today's serverless platforms lack efficient
support for GPUs -- provisioning functions on GPUs incurs extremely high
overhead, so functions must be kept long-running, even when idle, to reduce
cold starts. This leads to significant resource waste when performing ML
inference and hinders pay-per-use billing for GPUs.
In this paper, we present FaaSwap, a serverless platform enabling
fine-grained, request-level GPU sharing for resource-efficient ML inference.
FaaSwap leverages model swapping to support fast inference execution at low
resource cost. It keeps models in a host which has a large amount of cheap
memory and quickly swaps models to GPUs when requested, reducing per-function
keep-alive cost and enabling efficient GPU sharing across many more functions.
FaaSwap also supports swapping models between GPUs for load balancing and
improved inference performance. In FaaSwap, we design sophisticated request
scheduling and memory management algorithms that efficiently exploit model
swapping to reduce GPU cost and meet latency service-level objectives (SLOs)
for all inference functions. We have implemented and integrated FaaSwap into
Alibaba Cloud Function Compute (FC), one of the world's largest commercial
serverless platforms. Evaluation results show that FaaSwap can achieve
low-latency model swapping, efficiently share a GPU across hundreds of
functions, and satisfy per-function latency SLOs at scale.
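The swapping idea can be illustrated with an LRU-managed GPU cache over host-resident models. This is a toy sketch: FaaSwap's actual scheduling and memory management are far more sophisticated, and all names below are made up.

```python
from collections import OrderedDict

class ModelSwapper:
    """Host memory holds every model; at most `gpu_slots` models are
    resident on the GPU at once, managed LRU so hot models avoid
    repeated swap-ins."""

    def __init__(self, gpu_slots):
        self.gpu_slots = gpu_slots
        self.host = {}               # model_id -> model (cheap host memory)
        self.gpu = OrderedDict()     # LRU order: model_id -> model (on GPU)
        self.swap_ins = 0

    def register(self, model_id, model):
        self.host[model_id] = model

    def infer(self, model_id, x):
        if model_id not in self.gpu:          # cold: swap model onto GPU
            self.swap_ins += 1
            if len(self.gpu) >= self.gpu_slots:
                self.gpu.popitem(last=False)  # evict least recently used
            self.gpu[model_id] = self.host[model_id]
        self.gpu.move_to_end(model_id)        # mark most recently used
        return self.gpu[model_id](x)

swapper = ModelSwapper(gpu_slots=2)
swapper.register("resnet", lambda x: x + 1)
swapper.register("bert", lambda x: x * 2)
swapper.register("yolo", lambda x: x - 1)
results = [swapper.infer(m, 10)
           for m in ["resnet", "bert", "resnet", "yolo", "resnet"]]
```

With two GPU slots and five requests, only three swap-ins occur: the hot "resnet" model stays resident across requests.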
Data augmentation and intelligent fault diagnosis of planetary gearbox using ILoFGAN under extremely limited samples
Although existing generative adversarial networks (GANs) have potential for data augmentation and intelligent fault diagnosis of planetary gearboxes, it remains difficult to deal with extremely limited training samples and to effectively fuse representative and diverse information. To tackle these challenges, an improved local fusion generative adversarial network (ILoFGAN) is proposed. Time-domain waveforms are first transformed into time-frequency diagrams to highlight the fault characteristics. Subsequently, a local fusion module is used to fully utilize the extremely limited samples and fuse local features. Finally, a new generator embedded with multi-head attention modules is constructed to effectively improve the accuracy and flexibility of the feature fusion process. The proposed method is applied to the analysis of planetary gearbox vibration signals. The results show that, using only 6 training samples per type, the proposed method generates a large number of samples with higher similarity and better diversity than existing mainstream GANs. The generated samples are used to augment the limited dataset, prominently improving the accuracy of the fault diagnosis task.
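The first step, turning waveforms into time-frequency diagrams, can be sketched with a minimal STFT; the window and hop sizes here are arbitrary choices, not the paper's settings.

```python
import numpy as np

def stft_magnitude(signal, win=64, hop=32):
    """Windowed FFT magnitudes: the kind of time-frequency diagram fed
    to the GAN in place of the raw vibration waveform."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

# A pure tone at 8 cycles per window concentrates in frequency bin 8.
t = np.arange(256)
spectrogram = stft_magnitude(np.sin(2 * np.pi * 8 * t / 64))
```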
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
The field of vision-and-language (VL) understanding has made unprecedented
progress with end-to-end large pre-trained VL models (VLMs). However, they
still fall short in zero-shot reasoning tasks that require multi-step
inference. To address this, previous works resort to a
divide-and-conquer pipeline. In this paper, we argue that previous efforts have
several inherent shortcomings: 1) They rely on domain-specific sub-question
decomposing models. 2) They force models to predict the final answer even if
the sub-questions or sub-answers provide insufficient information. We address
these limitations via IdealGPT, a framework that iteratively decomposes VL
reasoning using large language models (LLMs). Specifically, IdealGPT utilizes
an LLM to generate sub-questions, a VLM to provide corresponding sub-answers,
and another LLM to reason to achieve the final answer. These three modules
perform the divide-and-conquer procedure iteratively until the model is
confident about the final answer to the main question. We evaluate IdealGPT on
multiple challenging VL reasoning tasks under a zero-shot setting. In
particular, our IdealGPT outperforms the best existing GPT-4-like models by an
absolute 10% on VCR and 15% on SNLI-VE. Code is available at
https://github.com/Hxyou/IdealGPT
Comment: 13 pages, 5 figures
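The iterative loop can be sketched with the three modules as pluggable callables; the stubs below stand in for the paper's LLM and VLM components and are not its prompts or interfaces.

```python
def ideal_loop(question, propose_subquestions, answer_subquestion,
               reason, max_rounds=4):
    """Divide-and-conquer loop: an LLM proposes sub-questions, a VLM
    answers them, and a reasoner LLM either commits to a confident
    final answer or triggers another round with the accumulated
    evidence."""
    evidence = []
    answer = None
    for _ in range(max_rounds):
        for sub_q in propose_subquestions(question, evidence):
            evidence.append((sub_q, answer_subquestion(sub_q)))
        answer, confident = reason(question, evidence)
        if confident:
            break
    return answer, evidence

# Toy stubs: the reasoner becomes confident once two pieces of
# evidence have been gathered.
propose = lambda q, ev: [f"sub-question {len(ev)}"]
answer_vlm = lambda sub_q: f"answer to {sub_q}"
reason = lambda q, ev: ("final answer", len(ev) >= 2)
final, evidence = ideal_loop("main question", propose, answer_vlm, reason)
```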