43 research outputs found
Understanding metric-related pitfalls in image analysis validation
Validation metrics are key for the reliable tracking of scientific progress
and for bridging the current chasm between artificial intelligence (AI)
research and its translation into practice. However, increasing evidence shows
that particularly in image analysis, metrics are often chosen inadequately in
relation to the underlying research problem. This could be attributed to a lack
of accessibility of metric-related knowledge: While taking into account the
individual strengths, weaknesses, and limitations of validation metrics is a
critical prerequisite to making educated choices, the relevant knowledge is
currently scattered and poorly accessible to individual researchers. Based on a
multi-stage Delphi process conducted by a multidisciplinary expert consortium
as well as extensive community feedback, the present work provides the first
reliable and comprehensive common point of access to information on pitfalls
related to validation metrics in image analysis. Focusing on biomedical image
analysis but with the potential of transfer to other fields, the addressed
pitfalls generalize across application domains and are categorized according to
a newly created, domain-agnostic taxonomy. To facilitate comprehension,
illustrations and specific examples accompany each pitfall. As a structured
body of information accessible to researchers of all levels of expertise, this
work enhances global comprehension of a key topic in image analysis validation.Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior
authors: Paul F. J\"ager, Lena Maier-Hei
Common Limitations of Image Processing Metrics:A Picture Story
While the importance of automatic image analysis is continuously increasing,
recent meta-research revealed major flaws with respect to algorithm validation.
Performance metrics are particularly key for meaningful, objective, and
transparent performance assessment and validation of the used automatic
algorithms, but relatively little attention has been given to the practical
pitfalls when using specific metrics for a given image analysis task. These are
typically related to (1) the disregard of inherent metric properties, such as
the behaviour in the presence of class imbalance or small target structures,
(2) the disregard of inherent data set properties, such as the non-independence
of the test cases, and (3) the disregard of the actual biomedical domain
interest that the metrics should reflect. This living dynamically document has
the purpose to illustrate important limitations of performance metrics commonly
applied in the field of image analysis. In this context, it focuses on
biomedical image analysis problems that can be phrased as image-level
classification, semantic segmentation, instance segmentation, or object
detection task. The current version is based on a Delphi process on metrics
conducted by an international consortium of image analysis experts from more
than 60 institutions worldwide.Comment: This is a dynamic paper on limitations of commonly used metrics. The
current version discusses metrics for image-level classification, semantic
segmentation, object detection and instance segmentation. For missing use
cases, comments or questions, please contact [email protected] or
[email protected]. Substantial contributions to this document will be
acknowledged with a co-authorshi
Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks
Hyperoxemia and excess oxygen use in early acute respiratory distress syndrome : Insights from the LUNG SAFE study
Publisher Copyright: © 2020 The Author(s). Copyright: Copyright 2020 Elsevier B.V., All rights reserved.Background: Concerns exist regarding the prevalence and impact of unnecessary oxygen use in patients with acute respiratory distress syndrome (ARDS). We examined this issue in patients with ARDS enrolled in the Large observational study to UNderstand the Global impact of Severe Acute respiratory FailurE (LUNG SAFE) study. Methods: In this secondary analysis of the LUNG SAFE study, we wished to determine the prevalence and the outcomes associated with hyperoxemia on day 1, sustained hyperoxemia, and excessive oxygen use in patients with early ARDS. Patients who fulfilled criteria of ARDS on day 1 and day 2 of acute hypoxemic respiratory failure were categorized based on the presence of hyperoxemia (PaO2 > 100 mmHg) on day 1, sustained (i.e., present on day 1 and day 2) hyperoxemia, or excessive oxygen use (FIO2 ≥ 0.60 during hyperoxemia). Results: Of 2005 patients that met the inclusion criteria, 131 (6.5%) were hypoxemic (PaO2 < 55 mmHg), 607 (30%) had hyperoxemia on day 1, and 250 (12%) had sustained hyperoxemia. Excess FIO2 use occurred in 400 (66%) out of 607 patients with hyperoxemia. Excess FIO2 use decreased from day 1 to day 2 of ARDS, with most hyperoxemic patients on day 2 receiving relatively low FIO2. Multivariate analyses found no independent relationship between day 1 hyperoxemia, sustained hyperoxemia, or excess FIO2 use and adverse clinical outcomes. Mortality was 42% in patients with excess FIO2 use, compared to 39% in a propensity-matched sample of normoxemic (PaO2 55-100 mmHg) patients (P = 0.47). Conclusions: Hyperoxemia and excess oxygen use are both prevalent in early ARDS but are most often non-sustained. No relationship was found between hyperoxemia or excessive oxygen use and patient outcome in this cohort. Trial registration: LUNG-SAFE is registered with ClinicalTrials.gov, NCT02010073publishersversionPeer reviewe
The United States COVID-19 Forecast Hub dataset
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national, levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages
First Sagittarius A* Event Horizon Telescope results. II. EHT and multiwavelength observations, data processing, and calibration
We present Event Horizon Telescope (EHT) 1.3 mm measurements of the radio source located at the position of the supermassive black hole Sagittarius A* (Sgr A*), collected during the 2017 April 5–11 campaign. The observations were carried out with eight facilities at six locations across the globe. Novel calibration methods are employed to account for Sgr A*'s flux variability. The majority of the 1.3 mm emission arises from horizon scales, where intrinsic structural source variability is detected on timescales of minutes to hours. The effects of interstellar scattering on the image and its variability are found to be subdominant to intrinsic source structure. The calibrated visibility amplitudes, particularly the locations of the visibility minima, are broadly consistent with a blurred ring with a diameter of ∼50 μas, as determined in later works in this series. Contemporaneous multiwavelength monitoring of Sgr A* was performed at 22, 43, and 86 GHz and at near-infrared and X-ray wavelengths. Several X-ray flares from Sgr A* are detected by Chandra, one at low significance jointly with Swift on 2017 April 7 and the other at higher significance jointly with NuSTAR on 2017 April 11. The brighter April 11 flare is not observed simultaneously by the EHT but is followed by a significant increase in millimeter flux variability immediately after the X-ray outburst, indicating a likely connection in the emission physics near the event horizon. We compare Sgr A*'s broadband flux during the EHT campaign to its historical spectral energy distribution and find that both the quiescent emission and flare emission are consistent with its long-term behavior.http://iopscience.iop.org/2041-8205Physic
First Sagittarius A* Event Horizon Telescope Results. II. EHT and Multiwavelength Observations, Data Processing, and Calibration
We present Event Horizon Telescope (EHT) 1.3 mm measurements of the radio source located at the position of the supermassive black hole Sagittarius A* (Sgr A*), collected during the 2017 April 5–11 campaign. The observations were carried out with eight facilities at six locations across the globe. Novel calibration methods are employed to account for Sgr A*'s flux variability. The majority of the 1.3 mm emission arises from horizon scales, where intrinsic structural source variability is detected on timescales of minutes to hours. The effects of interstellar scattering on the image and its variability are found to be subdominant to intrinsic source structure. The calibrated visibility amplitudes, particularly the locations of the visibility minima, are broadly consistent with a blurred ring with a diameter of ∼50 μas, as determined in later works in this series. Contemporaneous multiwavelength monitoring of Sgr A* was performed at 22, 43, and 86 GHz and at near-infrared and X-ray wavelengths. Several X-ray flares from Sgr A* are detected by Chandra, one at low significance jointly with Swift on 2017 April 7 and the other at higher significance jointly with NuSTAR on 2017 April 11. The brighter April 11 flare is not observed simultaneously by the EHT but is followed by a significant increase in millimeter flux variability immediately after the X-ray outburst, indicating a likely connection in the emission physics near the event horizon. We compare Sgr A*’s broadband flux during the EHT campaign to its historical spectral energy distribution and find that both the quiescent emission and flare emission are consistent with its long-term behavior
Inhibition of Gsk3b Reduces Nfkb1 Signaling and Rescues Synaptic Activity to Improve the Rett Syndrome Phenotype in Mecp2-Knockout Mice
Summary: Rett syndrome (RTT) is the second leading cause of mental impairment in girls and is currently untreatable. RTT is caused, in more than 95% of cases, by loss-of-function mutations in the methyl CpG-binding protein 2 gene (MeCP2). We propose here a molecular target involved in RTT: the glycogen synthase kinase-3b (Gsk3b) pathway. Gsk3b activity is deregulated in Mecp2-knockout (KO) mice models, and SB216763, a specific inhibitor, is able to alleviate the clinical symptoms with consequences at the molecular and cellular levels. In vivo, inhibition of Gsk3b prolongs the lifespan of Mecp2-KO mice and reduces motor deficits. At the molecular level, SB216763 rescues dendritic networks and spine density, while inducing changes in the properties of excitatory synapses. Gsk3b inhibition can also decrease the nuclear activity of the Nfkb1 pathway and neuroinflammation. Altogether, our findings indicate that Mecp2 deficiency in the RTT mouse model is partially rescued following treatment with SB216763. : Rett syndrome (RTT) is a severe neurodevelopmental disorder characterized by loss-of-function mutations in the MeCP2 gene. Jorge-Torres et al. show that mice models of RTT display hyperactivation of Gsk3b kinase and neuroinflammation. Inhibition of the Gsk3b pathway partially rescues the phenotype and affects neuronal morphology and synaptic activity. Keywords: Rett syndrome, Gsk3b, Nfkb1, neurons, mice models, SB216763, neuroinflammation, Mecp