Sample Size Analysis for Machine Learning Clinical Validation Studies
Background: Before new machine learning (ML) algorithms are integrated into clinical practice, they must undergo validation. Validation studies require sample size estimates. Unlike hypothesis-testing studies that seek a p-value, the goal of validating a predictive model is to obtain estimates of model performance. There is no standard tool for determining sample size estimates for clinical validation studies of machine learning models.
Methods: Our open-source method, Sample Size Analysis for Machine Learning (SSAML), is described and tested on three previously published models: brain age to predict mortality (Cox proportional hazards), COVID hospitalization risk prediction (ordinal regression), and seizure risk forecasting (deep learning).
Results: Minimum sample sizes were obtained in each dataset using standardized criteria.
Discussion: SSAML provides a formal expectation of precision and accuracy at a desired confidence level. SSAML is open-source and agnostic to data type and ML model. It can be used for clinical validation studies of ML models.
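The abstract does not spell out the SSAML procedure, but the underlying idea of a bootstrap-based precision target can be sketched. The hypothetical Python example below (synthetic data and an illustrative 0.05 half-width target; not the SSAML implementation) subsamples a validation set at increasing sizes and reports the smallest size at which a bootstrap confidence interval for AUC reaches the desired precision.

# Hypothetical sketch: minimum validation-set size at which a performance
# metric (here AUC) is estimated with a desired precision. Illustrates the
# general bootstrap idea only, not the SSAML code itself.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic "model outputs": scores that weakly separate two classes.
N = 5000
y = rng.integers(0, 2, N)
scores = rng.normal(loc=y * 0.8, scale=1.0)

def ci_halfwidth(n, n_boot=500, alpha=0.05):
    """Bootstrap half-width of the AUC confidence interval at sample size n."""
    idx = rng.choice(N, n, replace=False)
    aucs = []
    for _ in range(n_boot):
        b = rng.choice(idx, n, replace=True)
        if len(np.unique(y[b])) < 2:   # both classes needed to compute AUC
            continue
        aucs.append(roc_auc_score(y[b], scores[b]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return (hi - lo) / 2

# Smallest n whose 95% CI half-width falls below the target precision.
target = 0.05
for n in [100, 200, 400, 800, 1600]:
    hw = ci_halfwidth(n)
    print(f"n={n:5d}  CI half-width={hw:.3f}")
    if hw < target:
        print(f"minimum sample size ~{n} for +/-{target} precision")
        break

The same loop applies to any performance metric; SSAML's published criteria and model-specific metrics would replace the AUC and tolerance assumed here.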
A big data approach to the development of mixed‐effects models for seizure count data
Objective: Our objective was to develop a generalized linear mixed model for predicting seizure count that is useful in the design and analysis of clinical trials. This model may also benefit the design and interpretation of seizure-recording paradigms. Most existing seizure count models do not include children, and there is currently no consensus regarding the most suitable model that can be applied to children and adults. Therefore, an additional objective was to develop a model that accounts for both adult and pediatric epilepsy.
Methods: Using data from SeizureTracker.com, a patient-reported seizure diary tool with >1.2 million recorded seizures across 8 years, we evaluated the appropriateness of Poisson, negative binomial, zero-inflated negative binomial, and modified negative binomial models for seizure count data based on minimization of the Bayesian information criterion. Generalized linear mixed-effects models were used to account for demographic and etiologic covariates and for autocorrelation structure. Holdout cross-validation was used to evaluate predictive accuracy in simulating seizure frequencies.
Results: For both adults and children, we found that a negative binomial model with autocorrelation over 1 day was optimal. Using holdout cross-validation, the proposed model was found to provide accurate simulation of seizure counts for patients with up to four seizures per day.
Significance: The optimal model can be used to generate more realistic simulated patient data with very few input parameters. The availability of a parsimonious, realistic virtual patient model can be of great utility in simulations of phase II/III clinical trials, epilepsy monitoring units, outpatient biosensors, and mobile health (mHealth) applications.
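As a minimal sketch of the BIC-based model selection described above (simulated overdispersed counts and intercept-only fixed effects; no mixed effects or autocorrelation terms, which the published model includes), one can compare a Poisson fit against a negative binomial fit with statsmodels:

# Hypothetical sketch: compare count-data models for daily seizure counts
# by BIC, mirroring the model-selection step described above (simplified).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulate overdispersed daily seizure counts (negative binomial ground truth).
n_days = 2000
counts = rng.negative_binomial(n=2, p=0.5, size=n_days)
X = np.ones((n_days, 1))   # intercept-only design matrix

poisson_fit = sm.Poisson(counts, X).fit(disp=0)
negbin_fit = sm.NegativeBinomial(counts, X).fit(disp=0)

print(f"Poisson BIC:           {poisson_fit.bic:.1f}")
print(f"Negative binomial BIC: {negbin_fit.bic:.1f}")
# Lower BIC wins; with overdispersed data the negative binomial is preferred.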
Does accounting for seizure frequency variability increase clinical trial power?
Objective: Seizure frequency variability is associated with placebo responses in randomized controlled trials (RCT). Increased variability can result in drug misclassification and, hence, decreased statistical power. We investigated a new method that directly incorporated variability into RCT analysis, Z(V).
Methods: Two models were assessed: the traditional 50%-responder rate (RR50), and the variability-corrected score, Z(V). Each predicted seizure frequency upper and lower limits using prior seizures. Accuracy was defined as percentage of time-intervals when the observed seizure frequencies were within the predicted limits. First, we tested the Z(V) method on three datasets (SeizureTracker: n = 3016, Human Epilepsy Project: n = 107, and NeuroVista: n = 15). An additional independent SeizureTracker validation dataset was used to generate a set of 200 simulated trials each for 5 different sample sizes (total N = 100 to 500 by 100), assuming 20% dropout and 30% drug efficacy. "Power" was determined as the percentage of trials successfully distinguishing placebo from drug (p < 0.05).
Results: Prediction accuracy across datasets was 91-100% for Z(V) and 42-80% for RR50. Simulated RCT analysis with Z(V) achieved >90% power at N = 100 per arm, while RR50 required N = 200 per arm.
Significance: Z(V) may increase the statistical power of an RCT relative to the traditional RR50.
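The published Z(V) formula is not given in the abstract; purely as an illustration of a variability-corrected prediction interval, the hypothetical sketch below derives upper and lower seizure-count limits for the next observation period from the mean and standard deviation of prior counts, then checks whether the observed count falls within them:

# Hypothetical sketch: variability-based prediction limits for the next
# observation period, in the spirit of Z(V) (the published formula may differ).
import numpy as np

def predict_limits(prior_counts, z=1.96):
    """Upper/lower limits for the next count from prior mean and variability."""
    mu = np.mean(prior_counts)
    sd = np.std(prior_counts, ddof=1)
    return max(0.0, mu - z * sd), mu + z * sd

# Example: six months of prior monthly seizure counts, then an observed month.
prior = np.array([4, 7, 5, 9, 6, 5])
lo, hi = predict_limits(prior)
observed = 8
print(f"predicted range: [{lo:.1f}, {hi:.1f}]  observed: {observed}")
print("within limits" if lo <= observed <= hi else "outside limits")

Accuracy in the study's sense would then be the fraction of time-intervals for which the observed seizure frequency lands inside such limits.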
Guidelines for Conducting Ethical Artificial Intelligence Research in Neurology: A Systematic Approach for Clinicians and Researchers.
Preemptive recognition of the ethical implications of study design and algorithm choices in artificial intelligence (AI) research is an important but challenging process. AI applications have begun to transition from a promising future to clinical reality in neurology. Because the clinical management of neurological disease is often concerned with discrete, often unpredictable, and highly consequential events linked to multimodal data streams over long timescales, forthcoming advances in AI have great potential to transform care for patients. However, critical ethical questions have been raised by the implementation of the first AI applications in clinical practice. Clearly, AI will have far-reaching potential to promote, but also to endanger, ethical clinical practice. This article employs an anticipatory ethics approach to scrutinize how researchers in neurology can methodically identify the ethical ramifications of design choices early in the research and development process, with the goal of preempting unintended consequences that may violate principles of ethical clinical care. First, we discuss the use of a systematic framework for researchers to identify the ethical ramifications of various study design and algorithm choices. Second, using epilepsy as a paradigmatic example, anticipatory clinical scenarios that illustrate unintended ethical consequences are discussed, and the failure points in each scenario are evaluated. Third, we provide practical recommendations for understanding and addressing ethical ramifications early in the methods development stage. Awareness of the ethical implications that study design and algorithm choices may unintentionally introduce into AI is crucial to ensuring that the incorporation of AI into neurology care leads to patient benefit rather than harm.
Individualizing the definition of seizure clusters based on temporal clustering analysis.
OBJECTIVE
Seizure clusters are often encountered in people with poorly controlled epilepsy. Detection of seizure clusters is currently based on simple clinical rules, such as two seizures separated by four or fewer hours or multiple seizures in 24 h. Current definitions fail to distinguish between statistically significant clusters and those that may result from natural variation in the person's seizures. The ability to systematically define when a seizure cluster is significant for the individual carries major implications for treatment. However, there is no uniform consensus on how to define seizure clusters. This study proposes a principled statistical approach to defining seizure clusters that addresses these issues.
METHODS
A total of 533,968 clinical seizures from 1,748 people with epilepsy in the Seizure Tracker™ seizure diary database were used for algorithm development. We propose an algorithm for automated, individualized seizure cluster identification that combines cumulative sum change-point analysis with bootstrapping and aberration detection, providing a new approach to personalized seizure cluster identification at user-specified levels of clinical significance. We developed a standalone user interface to make the proposed algorithm accessible for real-time seizure cluster identification (ClusterCalc™). The clinical impact of systematizing cluster identification is demonstrated by comparing empirically defined clusters to those identified by routine seizure cluster definitions. We also demonstrate use of the Hurst exponent as a standardized measure of seizure clustering for comparison of seizure clustering burden within or across patients.
RESULTS
Seizure clustering was present in 26.7% (95% CI, 24.5-28.7%) of people with epilepsy. Empirical tables are provided to standardize inter- and intra-patient comparisons of seizure cluster tendency. Using the proposed algorithm, we found that 37.7-59.4% of seizures identified as clusters under routine definitions had a high probability of occurring by chance. Several clusters identified by the algorithm were missed by conventional definitions. The utility of the ClusterCalc algorithm for individualized seizure cluster detection is demonstrated.
SIGNIFICANCE
This study proposes a principled statistical approach to individualized seizure cluster identification and demonstrates its potential for real-time clinical use through ClusterCalc. The approach accounts for individual variations in baseline seizure frequency and evaluates statistical significance. This new definition has the potential to improve individualized epilepsy treatment by systematizing the identification of unrecognized seizure clusters and preventing unnecessary intervention for random events previously considered clusters.
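Of the components named in the Methods, the Hurst exponent is the most self-contained to illustrate. The sketch below (a textbook rescaled-range estimator applied to simulated daily counts; the change-point and bootstrapping machinery of ClusterCalc is not shown) returns a value near 0.5 for a memoryless series, with larger values indicating long-range clustering:

# Hypothetical sketch: rescaled-range (R/S) estimate of the Hurst exponent
# for a daily seizure-count series, as one standardized clustering measure.
# H near 0.5 suggests no long-range clustering; H > 0.5 suggests clustering.
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent by regressing log(R/S) on log(window)."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())   # cumulative deviation from mean
            r = dev.max() - dev.min()       # range of cumulative deviations
            s = w.std(ddof=1)
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)  # slope is the Hurst estimate
    return slope

rng = np.random.default_rng(2)
counts = rng.poisson(1.0, size=512)          # memoryless baseline series
print(f"Hurst estimate (random counts): {hurst_rs(counts):.2f}")  # expected ~0.5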