341 research outputs found

MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models

Authors: Menze Bjoern, Royer Corentin, Sekuboyina Anjany
Publication date: 16/02/2024

We introduce MultiMedEval, an open-source toolkit for fair and reproducible evaluation of large medical vision-language models (VLMs). MultiMedEval comprehensively assesses model performance on a broad array of six multi-modal tasks, conducted over 23 datasets and spanning over 11 medical domains. The tasks and performance metrics were chosen for their widespread adoption in the community and their diversity, ensuring a thorough evaluation of each model's overall generalizability. We open-source a Python toolkit (github.com/corentin-ryr/MultiMedEval) with a simple interface and setup process, enabling the evaluation of any VLM in just a few lines of code. Our goal is to simplify the intricate landscape of VLM evaluation, thus promoting fair and uniform benchmarking of future models. Comment: Under review at MIDL 202

arXiv.org e-Print Archive

Deep Quality Estimation: Creating Surrogate Models for Human Quality Ratings

Authors: Kofler Florian, Menze Bjoern, et al.
Publication date: 17/05/2022

Human ratings are abstract representations of segmentation quality. To approximate human quality ratings on scarce expert data, we train surrogate quality estimation models. We evaluate on a complex multi-class segmentation problem, specifically glioma segmentation following the BraTS annotation protocol. The training data features quality ratings from 15 expert neuroradiologists on a scale ranging from 1 to 6 stars for various computer-generated and manual 3D annotations. Even though the networks operate on 2D images and with scarce training data, we can approximate segmentation quality within a margin of error comparable to human intra-rater reliability. Segmentation quality prediction has broad applications. While an understanding of segmentation quality is imperative for successful clinical translation of automatic segmentation quality algorithms, it can also play an essential role in training new segmentation models. Due to the split-second inference times, it can be directly applied within a loss function or as a fully automatic dataset curation mechanism in a federated learning setting.

ZORA

A Dempster-Shafer approach to trustworthy AI with application to fetal brain MRI segmentation

Authors: Fidon Lucas, Menze Bjoern, et al.
Publication date: 09/07/2022

Deep learning models for medical image segmentation can fail unexpectedly and spectacularly for pathological cases and for images acquired at centers other than those of the training images, producing labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of deep learning models for medical image segmentation. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a trustworthy AI theoretical framework and a practical system that can augment any backbone AI system using a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labels predicted by the backbone AI that violate expert knowledge and relies on a fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of a state-of-the-art backbone AI for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.

ZORA
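
As a rough illustration of the fail-safe idea in the abstract above, the following sketch applies Dempster's rule of combination to a single voxel. It is not the authors' implementation: the label names, the `label_voxel` helper, and the choice to encode expert knowledge as a mass of 1 on a set of admissible labels are all assumptions made for the example.

```python
def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of labels to its mass (masses sum to 1).
    Returns None under total conflict, when nothing is left to normalize.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            overlap = set_a & set_b
            if overlap:
                combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to the empty set
    if conflict >= 1.0 - 1e-12:
        return None
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


def label_voxel(backbone, expert, fallback_label):
    """Hypothetical helper: fuse both sources, use the fallback on total conflict."""
    fused = combine_dempster(backbone, expert)
    if fused is None:
        return fallback_label
    best_set = max(fused, key=fused.get)
    return sorted(best_set)[0]  # singleton when the backbone masses are singletons


# Assumed example: backbone probabilities for one voxel (hypothetical labels).
backbone = {
    frozenset({"white_matter"}): 0.7,
    frozenset({"ventricles"}): 0.2,
    frozenset({"cerebellum"}): 0.1,
}
# Assumed expert knowledge: only these two labels are plausible at this location.
expert = {frozenset({"ventricles", "cerebellum"}): 1.0}
# Assumed expert knowledge that contradicts every backbone label.
impossible = {frozenset({"background"}): 1.0}

fused = combine_dempster(backbone, expert)
label = label_voxel(backbone, expert, "atlas_label")
fallback = label_voxel(backbone, impossible, "atlas_label")
```

In this toy case the backbone's preferred label is ruled out by the expert source, so its mass is discarded and the remaining labels are renormalized; when every backbone label is ruled out, the hypothetical fallback label is returned instead.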

Multitemporal Fusion for the Detection of Static Spatial Patterns in Multispectral Satellite Images--with Application to Archaeological Survey

Authors: Menze Bjoern H., Ur Jason Alik
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/06/2014

We evaluate and further develop a multitemporal fusion strategy that we use to detect the location of ancient settlement sites in the Near East and to map their distribution, a spatial pattern that remains static over time. For each ASTER image that has been acquired in our survey area in north-eastern Syria, we use a pattern classification strategy to map locations with a multispectral signal similar to the one from (few) known archaeological sites nearby. We obtain maps indicating the presence of anthrosol – soils that formed in the location of ancient settlements and that have a distinct spectral pattern under certain environmental conditions – and find that pooling the probability maps from all available time points reduces the variance of the spatial anthrosol pattern significantly. Removing biased classification maps – i.e. those that rank last when comparing the probability maps with the (limited) ground truth we have – reduces the overall prediction error even further, and we estimate optimal weights for each image using a non-negative least squares regression strategy. The ranking and pooling strategy we propose in this study shows a significant improvement over the plain averaging of anthrosol probability maps that we used in an earlier attempt to map archaeological sites in a 20,000 km² area in northern Mesopotamia, and we expect it to work well in other surveying tasks that aim at mapping static surface patterns with limited ground truth in long series of multispectral images.

Harvard University - DASH
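
The weighting step mentioned in the abstract above (non-negative least squares over per-image probability maps) can be sketched as follows. This is a toy under stated assumptions, not the paper's procedure: the three four-pixel maps and the ground truth are invented, and the solver is a plain projected gradient descent written for this example rather than a standard NNLS routine.

```python
def nnls_projected_gradient(maps, truth, n_iters=20000):
    """Minimize ||sum_k w_k * maps[k] - truth||^2 subject to w_k >= 0."""
    n_maps, n_pix = len(maps), len(truth)
    # 1 / trace(A^T A) is a safe step size (a lower bound on 1 / lambda_max).
    step = 1.0 / sum(v * v for m in maps for v in m)
    weights = [1.0 / n_maps] * n_maps  # start from plain averaging
    for _ in range(n_iters):
        pooled = [sum(w * m[i] for w, m in zip(weights, maps)) for i in range(n_pix)]
        resid = [p - t for p, t in zip(pooled, truth)]
        grads = [sum(m[i] * resid[i] for i in range(n_pix)) for m in maps]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grads)]
    return weights


def squared_error(weights, maps, truth):
    """Sum of squared differences between the pooled map and the ground truth."""
    pooled = [sum(w * m[i] for w, m in zip(weights, maps)) for i in range(len(truth))]
    return sum((p - t) ** 2 for p, t in zip(pooled, truth))


# Invented data: per-pixel probabilities from three acquisition dates.
maps = [
    [0.9, 0.1, 0.8, 0.2],  # agrees well with the ground truth
    [0.6, 0.4, 0.7, 0.5],  # weakly informative
    [0.2, 0.8, 0.1, 0.9],  # biased: roughly inverted
]
truth = [1.0, 0.0, 1.0, 0.0]  # limited ground truth (site / no site)

weights = nnls_projected_gradient(maps, truth)
err_weighted = squared_error(weights, maps, truth)
err_average = squared_error([1.0 / 3.0] * 3, maps, truth)
```

Because plain averaging is itself a feasible set of non-negative weights, the fitted weights can never do worse than it on the ground truth used for fitting; in this toy case the inverted map ends up with a weight of zero.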

blob loss: instance imbalance aware loss functions for semantic segmentation

Authors: Kofler Florian, Menze Bjoern, et al.
Publication date: 14/07/2022

Deep convolutional neural networks have proven to be remarkably effective in semantic segmentation tasks. Most popular loss functions were introduced targeting improved volumetric scores, such as the Sørensen-Dice coefficient (DSC). By design, DSC can tackle class imbalance; however, it does not recognize instance imbalance within a class. As a result, a large foreground instance can dominate minor instances and still produce a satisfactory DSC. Nevertheless, missing instances leads to poor detection performance. This represents a critical issue in applications such as disease progression monitoring. For example, it is imperative to locate and surveil small-scale lesions in the follow-up of multiple sclerosis patients. We propose a novel family of loss functions, nicknamed blob loss, primarily aimed at maximizing instance-level detection metrics, such as F1 score and sensitivity. Blob loss is designed for semantic segmentation problems in which the instances are the connected components within a class. We extensively evaluate a DSC-based blob loss in five complex 3D semantic segmentation tasks featuring pronounced instance heterogeneity in terms of texture and morphology. Compared to soft Dice loss, we achieve a 5 percent improvement for MS lesions, a 3 percent improvement for liver tumor, and an average 2 percent improvement for microscopy segmentation tasks in terms of F1 score.

ZORA
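
To make the instance-imbalance problem from the abstract above concrete, here is a minimal 2D sketch of a per-instance Dice term. It is an assumption-laden simplification rather than the published blob loss: instances are 4-connected components of the label mask, pixels belonging to other instances are simply ignored when scoring one instance, and the function names are made up for this example.

```python
from collections import deque


def connected_components(mask):
    """Label the 4-connected foreground components of a 2D binary mask."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on flat lists of predictions and binary targets."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def blob_dice_loss(pred, target):
    """Average the Dice loss over instances, masking out the other instances."""
    labels, count = connected_components(target)
    if count == 0:
        return 0.0
    losses = []
    for k in range(1, count + 1):
        p_flat, t_flat = [], []
        for pred_row, label_row in zip(pred, labels):
            for p, lab in zip(pred_row, label_row):
                if lab not in (0, k):
                    continue  # pixel belongs to another instance: ignore it
                p_flat.append(p)
                t_flat.append(1.0 if lab == k else 0.0)
        losses.append(soft_dice_loss(p_flat, t_flat))
    return sum(losses) / count


# One large instance (8 pixels) and one small instance (1 pixel).
target = [[1, 1, 1, 1, 0, 0],
          [1, 1, 1, 1, 0, 1]]
# The prediction finds the large instance perfectly and misses the small one.
pred = [[1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]]

flat = lambda grid: [v for row in grid for v in row]
global_loss = soft_dice_loss(flat(pred), flat(target))
blob_loss = blob_dice_loss(pred, target)
```

The global Dice loss stays small because the large instance dominates the overlap, while the per-instance average is heavily penalized for the missed small instance.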

Semi-Implicit Neural Solver for Time-dependent Partial Differential Equations

Authors: Menze Bjoern, Shit Suprosanna, et al.
Publication date: 21/09/2021

Fast and accurate solutions of time-dependent partial differential equations (PDEs) are of pivotal interest to many research fields, including physics, engineering, and biology. Generally, implicit/semi-implicit schemes are preferred over explicit ones to improve stability and correctness. However, existing semi-implicit methods are usually iterative and employ a general-purpose solver, which may be sub-optimal for a specific class of PDEs. In this paper, we propose a neural solver to learn an optimal iterative scheme in a data-driven fashion for any class of PDEs. Specifically, we modify a single iteration of a semi-implicit solver using a deep neural network. We provide theoretical guarantees for the correctness and convergence of neural solvers analogous to conventional iterative solvers. In addition to the commonly used Dirichlet boundary condition, we adopt a diffuse domain approach to incorporate diverse types of boundary conditions, e.g., Neumann. We show that the proposed neural solver can go beyond linear PDEs and applies to a class of non-linear PDEs where the non-linear component is non-stiff. We demonstrate the efficacy of our method on 2D and 3D scenarios. To this end, we show that our model generalizes to parameter settings different from those seen in training and achieves faster convergence than semi-implicit schemes.

ZORA
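
For readers unfamiliar with the baseline the abstract above refers to, the sketch below shows one conventional semi-implicit time step solved with a hand-written Jacobi iteration. It is only the classical scheme, not the neural solver: the 1D reaction-diffusion equation, the grid, the Fisher-type reaction term, and the function names are assumptions chosen for the example.

```python
import math


def semi_implicit_step(u, dt, dx, alpha, reaction, n_iters=200):
    """One step of u_t = alpha * u_xx + reaction(u) with u = 0 at both ends.

    Diffusion is treated implicitly (backward Euler); the non-stiff reaction
    term is treated explicitly. The linear system is solved by Jacobi iteration.
    """
    r = alpha * dt / dx ** 2
    rhs = [ui + dt * reaction(ui) for ui in u]  # explicit part
    new = list(u)  # initial guess: previous time step
    for _ in range(n_iters):
        prev = new
        new = [0.0] * len(u)  # Dirichlet boundary values stay at zero
        for i in range(1, len(u) - 1):
            new[i] = (rhs[i] + r * (prev[i - 1] + prev[i + 1])) / (1.0 + 2.0 * r)
    return new


def implicit_residual(u_new, u_old, dt, dx, alpha, reaction):
    """Largest violation of the semi-implicit update equation at interior points."""
    r = alpha * dt / dx ** 2
    return max(
        abs((1.0 + 2.0 * r) * u_new[i] - r * (u_new[i - 1] + u_new[i + 1])
            - (u_old[i] + dt * reaction(u_old[i])))
        for i in range(1, len(u_new) - 1)
    )


def fisher(u):
    """Assumed non-stiff reaction term, u * (1 - u)."""
    return u * (1.0 - u)


# r = alpha * dt / dx^2 = 2, four times the explicit stability limit of 0.5.
dx, dt, alpha = 0.1, 0.02, 1.0
u0 = [math.sin(math.pi * i * dx) for i in range(11)]
u0[0] = u0[-1] = 0.0

u1 = semi_implicit_step(u0, dt, dx, alpha, fisher)
res = implicit_residual(u1, u0, dt, dx, alpha, fisher)

u = u1
for _ in range(9):  # keep stepping at the same large time step
    u = semi_implicit_step(u, dt, dx, alpha, fisher)
```

The inner Jacobi loop is the kind of general-purpose iteration the abstract describes as potentially sub-optimal; the paper's contribution, which is not reproduced here, is to modify a single such iteration with a learned network.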