Is this model reliable for everyone? Testing for strong calibration
In a well-calibrated risk prediction model, the average predicted probability
is close to the true event rate for any given subgroup. Such models are
reliable across heterogeneous populations and satisfy strong notions of
algorithmic fairness. However, the task of auditing a model for strong
calibration is well-known to be difficult -- particularly for machine learning
(ML) algorithms -- due to the sheer number of potential subgroups. As such,
common practice is to only assess calibration with respect to a few predefined
subgroups. Recent developments in goodness-of-fit testing offer potential
solutions but are not designed for settings with weak signal or where the
poorly calibrated subgroup is small, as they either overly subdivide the data
or fail to divide the data at all. We introduce a new testing procedure based
on the following insight: if we can reorder observations by their expected
residuals, there should be a change in the association between the predicted
and observed residuals along this sequence if a poorly calibrated subgroup
exists. This lets us reframe the problem of calibration testing into one of
changepoint detection, for which powerful methods already exist. We begin by
introducing a sample-splitting procedure where a portion of the data is used to
train a suite of candidate models for predicting the residual, and the
remaining data are used to perform a score-based cumulative sum (CUSUM) test.
To further improve power, we then extend this adaptive CUSUM test to
incorporate cross-validation, while maintaining Type I error control under
minimal assumptions. Compared to existing methods, the proposed procedure
consistently achieved higher power in simulation studies and more than doubled
the power when auditing a mortality risk prediction model.
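The core reframing above — sort by expected residual, then look for a changepoint — can be sketched as follows. This is an illustrative simplification under assumed names and a toy data-generating process, not the paper's actual score-based procedure (which adds cross-validation and formal Type I error control); the cubic residual model stands in for the paper's suite of candidate models.

```python
import numpy as np

rng = np.random.default_rng(0)

def cusum_statistic(ordered_residuals):
    """Maximum absolute standardized partial sum of residuals.

    Under good calibration the residuals have mean zero in every
    subgroup, so the cumulative sum stays near zero; a poorly
    calibrated subgroup pushed to one end of the ordering shows
    up as a drift in the partial sums.
    """
    r = np.asarray(ordered_residuals, dtype=float)
    s = np.cumsum(r - r.mean())
    return np.max(np.abs(s)) / (r.std(ddof=1) * np.sqrt(len(r)))

# Toy audit: p is the audited model's predicted event probability.
n = 2000
x = rng.uniform(0.0, 1.0, n)
p = np.clip(0.3 + 0.4 * x, 0.01, 0.99)
true_p = np.where(x > 0.8, p + 0.15, p)  # miscalibrated subgroup: x > 0.8
y = rng.binomial(1, true_p)
residuals = y - p

# Step 1 (sample splitting): use one half to learn E[residual | x].
half = n // 2
coef = np.polyfit(x[:half], residuals[:half], 3)

# Step 2: order the held-out half by its predicted residual, then
# compute the CUSUM statistic along that ordering.
order = np.argsort(np.polyval(coef, x[half:]))
stat = cusum_statistic(residuals[half:][order])
```

In practice `stat` would be compared against a reference distribution, e.g. a permutation null obtained by randomly reordering the held-out residuals; a large value indicates a drift in the partial sums and hence a poorly calibrated subgroup.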
Net Benefit of Diagnostic Tests for Multistate Diseases: an Indicator Variables Approach
A limitation of common measures of diagnostic test performance, such as sensitivity and specificity, is that they do not consider the relative importance of false negative and false positive test results, which are likely to have different clinical consequences. Therefore, the use of classification or prediction measures alone to compare diagnostic tests or biomarkers can be inconclusive for clinicians. Comparing tests on net benefit can be more conclusive because the clinical consequences of misdiagnoses are considered. The literature has suggested evaluating binary diagnostic tests based on net benefit, but has not considered diagnostic tests that classify more than two disease states, e.g., stroke subtype (large-artery atherosclerosis, cardioembolism, small-vessel occlusion, stroke of other determined etiology, stroke of undetermined etiology), skin lesion subtype, breast cancer subtypes (benign, mass, calcification, architectural distortion, etc.), METAVIR liver fibrosis stage (F0-F4), histopathological classification of cervical intraepithelial neoplasia (CIN), prostate Gleason grade, and brain injury (intracranial hemorrhage, mass effect, midline shift, cranial fracture). Other diseases have more than two stages, such as Alzheimer's disease (dementia due to Alzheimer's disease, mild cognitive impairment (MCI) due to Alzheimer's disease, and preclinical presymptomatic Alzheimer's disease). In diseases with more than two states, the benefits and risks may vary between states. This paper extends the net-benefit approach for evaluating binary diagnostic tests to multi-state clinical conditions, ruling in or ruling out a clinical condition based on the adverse consequences of workup delay (due to a false negative test result) and unnecessary workup (due to a false positive test result). We demonstrate our approach with numerical examples and real data.
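The standard binary net benefit that this work generalizes can be computed directly, and a naive one-vs-rest extension to multiple states illustrates why state-specific thresholds matter. The `multistate_net_benefit` helper below is a hypothetical simplification for illustration, not the paper's indicator-variables construction.

```python
import numpy as np

def net_benefit(y_true, y_pred, threshold):
    """Standard binary net benefit at probability threshold p_t:
    NB = TP/N - (FP/N) * p_t / (1 - p_t).
    The odds p_t / (1 - p_t) weigh the harm of unnecessary workup
    (false positive) against the benefit of treating a true positive.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n = y_true.size
    tp = np.count_nonzero(y_true & y_pred)
    fp = np.count_nonzero(~y_true & y_pred)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def multistate_net_benefit(states, predicted, thresholds):
    """One-vs-rest net benefit per disease state (an illustrative
    simplification, not the paper's indicator-variables approach).
    thresholds[k] encodes the state-specific harm tradeoff, since in
    diseases with more than two states the benefits and risks may
    vary between states.
    """
    states = np.asarray(states)
    predicted = np.asarray(predicted)
    return {k: net_benefit(states == k, predicted == k, t)
            for k, t in thresholds.items()}

# Example: three disease states coded 0, 1, 2.
states = np.array([0, 1, 2, 0])
predicted = np.array([0, 1, 1, 2])
nb = multistate_net_benefit(states, predicted, {0: 0.5, 1: 0.5, 2: 0.5})
```

Varying `thresholds[k]` per state is what lets the comparison reflect that, say, a missed hemorrhage and a missed low-grade lesion carry very different consequences.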
Desirability of Outcome Ranking (DOOR) and Response Adjusted for Duration of Antibiotic Risk (RADAR)
Clinical trials that compare strategies to optimize antibiotic use are of critical importance but are limited by competing risks that distort outcome interpretation, the complexities of noninferiority trials, large sample sizes, and inadequate evaluation of benefits and harms at the patient level. The Antibacterial Resistance Leadership Group strives to overcome these challenges through innovative trial design. Response adjusted for duration of antibiotic risk (RADAR) is a novel methodology utilizing a superiority design and a 2-step process: (1) categorizing patients into an overall clinical outcome (based on benefits and harms), and (2) ranking patients with respect to a desirability of outcome ranking (DOOR). DOORs are constructed by assigning higher ranks to patients with (1) better overall clinical outcomes and (2) shorter durations of antibiotic use for similar overall clinical outcomes. DOOR distributions are then compared between antibiotic use strategies, and the probability that a randomly selected patient will have a better DOOR if assigned to the new strategy is estimated. DOOR/RADAR represents a new paradigm for assessing the risks and benefits of new strategies to optimize antibiotic use.
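The DOOR probability described above — the chance that a randomly selected patient does better on the new strategy — can be estimated by pairwise comparison, counting ties as one half (a Mann-Whitney-style estimand). The tuple encoding below, where lower values are more desirable, is a hypothetical choice for illustration; it mirrors the DOOR construction by ranking on overall clinical outcome first and breaking ties by shorter antibiotic duration.

```python
def door_probability(new, old):
    """Estimate P(a randomly selected patient on the new strategy has
    a better DOOR than one on the old strategy), with ties counted
    as 1/2.

    Each patient is a tuple (outcome_rank, antibiotic_days), with
    LOWER more desirable on both components (an assumed encoding).
    Python's tuple comparison then orders by overall clinical outcome
    first and by duration of antibiotic use second.
    """
    wins = ties = 0
    for a in new:
        for b in old:
            if a < b:        # better outcome, or same outcome and fewer days
                wins += 1
            elif a == b:
                ties += 1
    return (wins + 0.5 * ties) / (len(new) * len(old))

# Example: outcome rank 1 = best overall clinical outcome.
new_strategy = [(1, 3), (1, 5), (2, 4)]
old_strategy = [(1, 5), (2, 6), (3, 4)]
prob = door_probability(new_strategy, old_strategy)
```

A value of 0.5 indicates no difference between strategies; values above 0.5 favor the new strategy.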