102 research outputs found

    Can Semantic-Phonemic Discrepancy in Verbal Fluency Help to Detect Alzheimer’s Dementia?

    Word retrieval difficulty is one of the early signs of Alzheimer’s disease, although such difficulties can also occur in typical aging. Therefore, it is necessary to find a task that differentiates the early stages of Alzheimer’s dementia from typical aging. Verbal fluency is a widely used measure to assess subjects’ cognitive processes following neurological damage, and often includes two subtests: semantic fluency, in which participants are asked to produce words that meet a semantic criterion, such as food or animals; and letter fluency, which requires participants to produce words starting with a certain letter, such as F or S. People with Alzheimer’s disease have more difficulty with semantic than letter fluency, although this pattern has also been shown in typical aging. In the current research, we investigate whether the semantic-letter discrepancy can differentiate Alzheimer’s dementia from typical aging.
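    As a rough illustration of what a semantic-letter discrepancy could look like in practice, the sketch below scores each subtest as the number of unique words produced and takes the difference. The abstract does not say how the discrepancy is computed, so the scoring rule, the function name `fluency_discrepancy`, and the sample responses are all assumptions made for illustration.

```python
def fluency_discrepancy(semantic_words, letter_words):
    """Return the semantic score minus the letter score.

    Assumption: each score is the count of unique, non-empty responses,
    compared case-insensitively. Real scoring protocols may differ.
    """
    semantic_score = len({w.strip().lower() for w in semantic_words if w.strip()})
    letter_score = len({w.strip().lower() for w in letter_words if w.strip()})
    return semantic_score - letter_score


# Invented responses, not data from the study.
animals = ["dog", "cat", "Dog", "horse"]           # "Dog" repeats "dog"
f_words = ["fish", "fork", "farm", "fun", "fast"]

discrepancy = fluency_discrepancy(animals, f_words)
print(discrepancy)
```

    Under this assumed rule, a negative value means fewer unique words on the semantic subtest than on the letter subtest.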

    Image Quality-aware Diagnosis via Meta-knowledge Co-embedding

    Medical images usually suffer from image degradation in clinical practice, leading to decreased performance of deep learning-based models. To resolve this problem, most previous works have focused on filtering out degradation-causing low-quality images while ignoring their potential value for models. Through effectively learning and leveraging the knowledge of degradations, models can better resist their adverse effects and avoid misdiagnosis. In this paper, we raise the problem of image quality-aware diagnosis, which aims to take advantage of low-quality images and image quality labels to achieve a more accurate and robust diagnosis. However, the diversity of degradations and the superficially unrelated targets of image quality assessment and disease diagnosis make it quite challenging to effectively leverage quality labels to assist diagnosis. Thus, to tackle these issues, we propose a novel meta-knowledge co-embedding network, consisting of two subnets: Task Net and Meta Learner. Task Net constructs an explicit quality information utilization mechanism to enhance diagnosis via knowledge co-embedding features, while Meta Learner ensures the effectiveness and constrains the semantics of these features via meta-learning and joint-encoding masking. Superior performance on five datasets with four widely-used medical imaging modalities demonstrates the effectiveness and generalizability of our method. Comment: Accepted by CVPR 202

    Learning Robust Representation for Joint Grading of Ophthalmic Diseases via Adaptive Curriculum and Feature Disentanglement

    Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of permanent blindness worldwide. Designing an automatic grading system with good generalization ability for DR and DME is vital in clinical practice. However, prior works either grade DR or DME independently, without considering internal correlations between them, or grade them jointly with a shared feature representation while ignoring potential generalization issues caused by difficult samples and data bias. To address these problems, we propose a framework for joint grading with the dynamic difficulty-aware weighted loss (DAW) and the dual-stream disentangled learning architecture (DETACH). Inspired by curriculum learning, DAW learns from simple samples to difficult samples dynamically by measuring difficulty adaptively. DETACH separates features of grading tasks to avoid potential emphasis on the bias. With the addition of DAW and DETACH, the model learns robust disentangled feature representations to explore internal correlations between DR and DME and achieve better grading performance. Experiments on three benchmarks show the effectiveness and robustness of our framework under both the intra-dataset and cross-dataset tests. Comment: Accepted by MICCAI2

    Towards Generalizable Diabetic Retinopathy Grading in Unseen Domains

    Diabetic Retinopathy (DR) is a common complication of diabetes and a leading cause of blindness worldwide. Early and accurate grading of its severity is crucial for disease management. Although deep learning has shown great potential for automated DR grading, its real-world deployment is still challenging due to distribution shifts between source and target domains, known as the domain generalization problem. Existing works have mainly attributed the performance degradation to limited domain shifts caused by simple visual discrepancies, which cannot handle complex real-world scenarios. Instead, we present preliminary evidence suggesting the existence of three-fold generalization issues: visual and degradation style shifts, diagnostic pattern diversity, and data imbalance. To tackle these issues, we propose a novel unified framework named Generalizable Diabetic Retinopathy Grading Network (GDRNet). GDRNet consists of three vital components: fundus visual-artifact augmentation (FundusAug), dynamic hybrid-supervised loss (DahLoss), and domain-class-aware re-balancing (DCR). FundusAug generates realistic augmented images via visual transformation and image degradation, while DahLoss jointly leverages pixel-level consistency and image-level semantics to capture the diverse diagnostic patterns and build generalizable feature representations. Moreover, DCR mitigates the data imbalance from a domain-class view and avoids undesired over-emphasis on rare domain-class pairs. Finally, we design a publicly available benchmark for fair evaluations. Extensive comparison experiments against advanced methods and exhaustive ablation studies demonstrate the effectiveness and generalization ability of GDRNet. Comment: Early accepted by MICCAI 2023, the 26th International Conference on Medical Image Computing and Computer Assisted Intervention

    SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

    We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors. Unlike previous methods that either iteratively warp 2D views onto a mesh surface or distill diffusion latent features without accurate geometric and style cues, SceneTex formulates the texture synthesis task as an optimization problem in the RGB space where style and geometry consistency are properly reflected. At its core, SceneTex proposes a multiresolution texture field to implicitly encode the mesh appearance. We optimize the target texture via a score-distillation-based objective function in respective RGB renderings. To further secure the style consistency across views, we introduce a cross-attention decoder to predict the RGB values by cross-attending to the pre-sampled reference locations in each instance. SceneTex enables varied and accurate texture synthesis for 3D-FRONT scenes, demonstrating significant improvements in visual quality and prompt fidelity over the prior texture generation methods. Comment: Project website: https://daveredrum.github.io/SceneTex

    Learning Gradient Fields for Scalable and Generalizable Irregular Packing

    The packing problem, also known as cutting or nesting, has diverse applications in logistics, manufacturing, layout design, and atlas generation. It involves arranging irregularly shaped pieces to minimize waste while avoiding overlap. Recent advances in machine learning, particularly reinforcement learning, have shown promise in addressing the packing problem. In this work, we delve deeper into a novel machine learning-based approach that formulates the packing problem as conditional generative modeling. To tackle the challenges of irregular packing, including object validity constraints and collision avoidance, our method employs the score-based diffusion model to learn a series of gradient fields. These gradient fields encode the correlations between constraint satisfaction and the spatial relationships of polygons, learned from teacher examples. During the testing phase, packing solutions are generated using a coarse-to-fine refinement mechanism guided by the learned gradient fields. To enhance packing feasibility and optimality, we introduce two key architectural designs: multi-scale feature extraction and coarse-to-fine relation extraction. We conduct experiments on two typical industrial packing domains, considering translations only. Empirically, our approach demonstrates spatial utilization rates comparable to, or even surpassing, those achieved by the teacher algorithm responsible for training data generation. Additionally, it exhibits some level of generalization to shape variations. We are hopeful that this method could pave the way for new possibilities in solving the packing problem

    Physics-Informed Neural Operator for Learning Partial Differential Equations

    Machine learning methods have recently shown promise in solving partial differential equations (PDEs). They can be classified into two broad categories: approximating the solution function and learning the solution operator. The Physics-Informed Neural Network (PINN) is an example of the former while the Fourier neural operator (FNO) is an example of the latter. Both of these approaches have shortcomings. The optimization in PINN is challenging and prone to failure, especially on multi-scale dynamic systems. FNO does not suffer from this optimization issue since it carries out supervised learning on a given dataset, but obtaining such data may be too expensive or infeasible. In this work, we propose the physics-informed neural operator (PINO), where we combine the operator-learning and function-optimization frameworks. This integrated approach improves convergence rates and accuracy over both PINN and FNO models. In the operator-learning phase, PINO learns the solution operator over multiple instances of the parametric PDE family. In the test-time optimization phase, PINO optimizes the pre-trained operator ansatz for the querying instance of the PDE. Experiments show PINO outperforms previous ML methods on many popular PDE families while retaining the extraordinary speed-up of FNO compared to solvers. In particular, PINO accurately solves challenging long temporal transient flows and Kolmogorov flows where other baseline ML methods fail to converge.
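    The abstract does not state PINO's objective, so the following is only a generic sketch of a physics-informed loss: a data misfit term plus a weighted PDE residual term. The choice of the 1D heat equation u_t = nu * u_xx, the finite-difference residual, the plain grid standing in for a network's prediction, and all names (`heat_residual_mse`, `physics_informed_loss`, `pde_weight`) are assumptions made for illustration.

```python
import math


def heat_residual_mse(u, dx, dt, nu):
    """Mean squared residual of u_t = nu * u_xx at interior grid points.

    u[n][i] is the field at time index n and space index i.
    Central differences are used in both time and space.
    """
    total, count = 0.0, 0
    for n in range(1, len(u) - 1):
        for i in range(1, len(u[n]) - 1):
            u_t = (u[n + 1][i] - u[n - 1][i]) / (2 * dt)
            u_xx = (u[n][i + 1] - 2 * u[n][i] + u[n][i - 1]) / dx**2
            total += (u_t - nu * u_xx) ** 2
            count += 1
    return total / count


def data_mse(u_pred, u_ref):
    """Mean squared difference between two grids of the same shape."""
    diffs = [(p - r) ** 2
             for row_p, row_r in zip(u_pred, u_ref)
             for p, r in zip(row_p, row_r)]
    return sum(diffs) / len(diffs)


def physics_informed_loss(u_pred, u_ref, dx, dt, nu, pde_weight=1.0):
    """Data misfit plus weighted PDE residual (a generic form, not PINO's)."""
    return data_mse(u_pred, u_ref) + pde_weight * heat_residual_mse(u_pred, dx, dt, nu)


nu, nx, nt = 0.1, 51, 51
dx, dt = 1.0 / (nx - 1), 0.1 / (nt - 1)
xs = [i * dx for i in range(nx)]
ts = [n * dt for n in range(nt)]

# Exact solution of the heat equation with u(x, 0) = sin(pi * x).
u_exact = [[math.exp(-nu * math.pi**2 * t) * math.sin(math.pi * x) for x in xs]
           for t in ts]
# A field that ignores the decay in time, so it violates the PDE.
u_wrong = [[math.sin(math.pi * x) for x in xs] for _ in ts]

loss_exact = physics_informed_loss(u_exact, u_exact, dx, dt, nu)
loss_wrong = physics_informed_loss(u_wrong, u_exact, dx, dt, nu)
print(loss_exact < loss_wrong)
```

    In this toy setting the PDE term penalizes the time-independent field even where it stays close to the reference data, which is the general motivation for adding physics constraints to a supervised loss.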

    FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping

    The dynamic request patterns of machine learning (ML) inference workloads have driven an increasing trend towards exploiting serverless computing for scalable ML model serving. However, today's serverless platforms lack efficient support for GPUs: provisioning functions on GPUs incurs extremely high overhead, forcing functions to be kept running for long periods, even when idle, to reduce cold starts. This wastes significant resources during ML inference and hinders pay-per-use billing for GPUs. In this paper, we present FaaSwap, a serverless platform enabling fine-grained, request-level GPU sharing for resource-efficient ML inference. FaaSwap leverages model swapping to support fast inference execution at low resource cost. It keeps models in a host which has a large amount of cheap memory and quickly swaps models to GPUs when requested, reducing per-function keep-alive cost and enabling efficient GPU sharing across many more functions. FaaSwap also supports swapping models between GPUs for load balancing and improved inference performance. In FaaSwap, we design sophisticated request scheduling and memory management algorithms that efficiently exploit model swapping to reduce GPU cost and meet latency service-level objectives (SLOs) for all inference functions. We have implemented and integrated FaaSwap into Alibaba Cloud Function Compute (FC), one of the world's largest commercial serverless platforms. Evaluation results show that FaaSwap can achieve low-latency model swapping, efficiently share a GPU across hundreds of functions, and satisfy per-function latency SLOs at scale.
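    The idea of keeping every model in host memory and swapping it onto a GPU on demand can be pictured with a toy simulation. The sketch below is not FaaSwap's scheduler or memory manager: the least-recently-used eviction policy, the fixed number of GPU slots, and the class name `SwappingModelPool` are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict


class SwappingModelPool:
    """Toy pool: every model stays in host memory; a few are resident on the GPU."""

    def __init__(self, gpu_slots):
        self.gpu_slots = gpu_slots
        self.host = {}              # model name -> weights, always kept
        self.gpu = OrderedDict()    # resident models, least recently used first
        self.swap_ins = 0           # number of host-to-GPU swaps performed

    def register(self, name, weights):
        self.host[name] = weights

    def infer(self, name, request):
        if name in self.gpu:
            self.gpu.move_to_end(name)          # mark as most recently used
        else:
            if len(self.gpu) >= self.gpu_slots:
                self.gpu.popitem(last=False)    # evict LRU model; host copy remains
            self.gpu[name] = self.host[name]    # simulated swap-in from host memory
            self.swap_ins += 1
        return f"{name} handled {request}"


pool = SwappingModelPool(gpu_slots=2)
for model in ("a", "b", "c"):
    pool.register(model, weights=object())

for model in ("a", "b", "a", "c", "a"):
    pool.infer(model, request="req")

print(pool.swap_ins, list(pool.gpu))
```

    Three models share two GPU slots here, and an evicted model costs only a later swap-in because its host copy is never dropped. A real system would also have to account for swap latency and per-function SLOs, which this sketch ignores.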

    Data augmentation and intelligent fault diagnosis of planetary gearbox using ILoFGAN under extremely limited samples

    Though existing generative adversarial networks (GANs) have the potential for data augmentation and intelligent fault diagnosis of planetary gearboxes, it remains difficult to deal with extremely limited training samples and to effectively fuse representative and diverse information. To tackle these challenges, an improved local fusion generative adversarial network (ILoFGAN) is proposed. Time-domain waveforms are first transformed into time-frequency diagrams to highlight the fault characteristics. Subsequently, a local fusion module is used to fully utilize extremely limited samples and fuse the local features. Finally, a new generator embedded with multi-head attention modules is constructed to effectively improve the accuracy and flexibility of the feature fusion process. The proposed method is applied to the analysis of planetary gearbox vibration signals. The results show that the proposed method can generate a large number of samples with higher similarity and better diversity compared with the existing mainstream GANs using 6 training samples per type. The generated samples are used to augment the limited dataset, markedly improving the accuracy of the fault diagnosis task.

    IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

    The field of vision-and-language (VL) understanding has made unprecedented progress with end-to-end large pre-trained VL models (VLMs). However, they still fall short in zero-shot reasoning tasks that require multi-step inferencing. To achieve this goal, previous works resort to a divide-and-conquer pipeline. In this paper, we argue that previous efforts have several inherent shortcomings: 1) They rely on domain-specific sub-question decomposing models. 2) They force models to predict the final answer even if the sub-questions or sub-answers provide insufficient information. We address these limitations via IdealGPT, a framework that iteratively decomposes VL reasoning using large language models (LLMs). Specifically, IdealGPT utilizes an LLM to generate sub-questions, a VLM to provide corresponding sub-answers, and another LLM to reason to achieve the final answer. These three modules perform the divide-and-conquer procedure iteratively until the model is confident about the final answer to the main question. We evaluate IdealGPT on multiple challenging VL reasoning tasks under a zero-shot setting. In particular, our IdealGPT outperforms the best existing GPT-4-like models by an absolute 10% on VCR and 15% on SNLI-VE. Code is available at https://github.com/Hxyou/IdealGPT. Comment: 13 pages, 5 figures
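    The iterative divide-and-conquer loop described in this abstract (generate sub-questions, answer them, reason, repeat until confident) can be sketched with plain callables. This is not the released IdealGPT code: the function `iterative_reasoning`, its parameters, the `max_rounds` cap, and the stub models below are hypothetical stand-ins for the LLM and VLM calls.

```python
def iterative_reasoning(question, ask, answer, reason, max_rounds=3):
    """Loop over ask -> answer -> reason until the reasoner reports confidence.

    ask(question, evidence)    -> list of sub-questions (stand-in for an LLM)
    answer(sub_question)       -> sub-answer (stand-in for a VLM)
    reason(question, evidence) -> (final_answer, is_confident) (stand-in for an LLM)
    """
    evidence = []
    final_answer = None
    for round_index in range(1, max_rounds + 1):
        for sub_question in ask(question, evidence):
            evidence.append((sub_question, answer(sub_question)))
        final_answer, confident = reason(question, evidence)
        if confident:
            return final_answer, round_index
    # Not confident within the budget: return the last attempt.
    return final_answer, max_rounds


# Stub models, invented for illustration only.
def ask(question, evidence):
    return [f"sub-question {len(evidence) + 1}"]


def answer(sub_question):
    return f"answer to {sub_question}"


def reason(question, evidence):
    # Pretend the reasoner becomes confident once it has two facts.
    return f"answer from {len(evidence)} facts", len(evidence) >= 2


result = iterative_reasoning("What is happening in the image?", ask, answer, reason)
print(result)
```

    With these stubs the loop stops in the second round, once the reasoner has two sub-answers to work from. The `max_rounds` cap is a design choice of this sketch to guarantee termination.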