
    Sample Size Analysis for Machine Learning Clinical Validation Studies

    Background: Before a new machine learning (ML) algorithm is integrated into clinical practice, it must undergo validation, and validation studies require sample size estimates. Unlike hypothesis-testing studies that seek a p-value, the goal of validating a predictive model is to obtain estimates of model performance. There is no standard tool for determining sample size estimates for clinical validation studies of ML models. Methods: Our open-source method, Sample Size Analysis for Machine Learning (SSAML), was described and tested on three previously published models: brain age to predict mortality (Cox proportional hazards), COVID hospitalization risk prediction (ordinal regression), and seizure risk forecasting (deep learning). Results: Minimum sample sizes were obtained for each dataset using standardized criteria. Discussion: SSAML provides a formal expectation of precision and accuracy at a desired confidence level. It is open source and agnostic to data type and ML model, and can be used for clinical validation studies of ML models.
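
    The SSAML code is open source; as a rough illustration of the underlying idea only, the Python sketch below searches for the smallest sample size at which a bootstrapped performance estimate meets a preset precision target. The function name, candidate sample sizes, tolerance, and the use of CI half-width as the precision criterion are assumptions for illustration, not SSAML's actual criteria.

    import numpy as np

    def min_sample_size(y_true, y_score, metric, tol=0.05, conf=0.95,
                        candidate_ns=(100, 200, 400, 800),
                        n_boot=1000, seed=0):
        """Smallest n whose bootstrap CI half-width for `metric` is <= tol.

        Illustrative sketch only; the real SSAML tool defines its own
        precision and accuracy criteria and supports multiple model types.
        """
        rng = np.random.default_rng(seed)
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        lo_q, hi_q = (1 - conf) / 2, 1 - (1 - conf) / 2
        for n in candidate_ns:
            if n > len(y_true):
                break  # not enough held-out data to emulate this size
            stats = []
            for _ in range(n_boot):
                # Resample n cases with replacement and score the model.
                idx = rng.choice(len(y_true), size=n, replace=True)
                stats.append(metric(y_true[idx], y_score[idx]))
            lo, hi = np.quantile(stats, [lo_q, hi_q])
            if (hi - lo) / 2 <= tol:
                return n
        return None  # no candidate size met the precision target

    For example, passing metric=lambda yt, ys: np.mean((ys > 0.5) == yt) would size a validation study around binary accuracy at a 0.5 threshold.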

    Is seizure frequency variance a predictable quantity?


    Does accounting for seizure frequency variability increase clinical trial power?

    Objective: Seizure frequency variability is associated with placebo responses in randomized controlled trials (RCTs). Increased variability can result in drug misclassification and, hence, decreased statistical power. We investigated a new method, Z(V), that directly incorporates variability into RCT analysis. Methods: Two models were assessed: the traditional 50%-responder rate (RR50) and the variability-corrected score, Z(V). Each predicted upper and lower limits on seizure frequency using prior seizures. Accuracy was defined as the percentage of time intervals in which observed seizure frequencies fell within the predicted limits. First, we tested the Z(V) method on three datasets (SeizureTracker: n = 3016, Human Epilepsy Project: n = 107, and NeuroVista: n = 15). An additional independent SeizureTracker validation dataset was used to generate 200 simulated trials for each of 5 sample sizes (total N = 100 to 500 by 100), assuming 20% dropout and 30% drug efficacy. "Power" was defined as the percentage of trials successfully distinguishing placebo from drug (p < 0.05). Results: Prediction accuracy across datasets was 91-100% for Z(V) versus 42-80% for RR50. In simulated RCT analysis, Z(V) achieved >90% power at N = 100 per arm, whereas RR50 required N = 200 per arm. Significance: Z(V) may increase the statistical power of an RCT relative to the traditional RR50.
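
    The abstract defines accuracy as the percentage of time intervals in which observed seizure frequencies fall within predicted limits. As a minimal sketch of one variability-based prediction interval, assuming a mean ± z·SD bound on baseline diary counts (the published Z(V) score itself is defined in the paper and may differ):

    import numpy as np

    def variability_limits(baseline_counts, z=1.96):
        """Upper/lower limits for future seizure counts from the mean and
        SD of a baseline diary window (an illustrative bound, not the
        published Z(V) definition)."""
        mu = np.mean(baseline_counts)
        sd = np.std(baseline_counts, ddof=1)
        return max(0.0, mu - z * sd), mu + z * sd

    def limit_accuracy(baseline_counts, followup_counts, z=1.96):
        """Fraction of follow-up intervals inside the predicted limits,
        matching the accuracy definition used in the abstract."""
        lo, hi = variability_limits(baseline_counts, z)
        followup = np.asarray(followup_counts)
        return np.mean((followup >= lo) & (followup <= hi))

    # Illustrative monthly counts: six baseline months, four follow-up months.
    print(limit_accuracy([4, 6, 3, 7, 5, 6], [5, 8, 2, 6]))  # -> 0.75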

    Guidelines for Conducting Ethical Artificial Intelligence Research in Neurology: A Systematic Approach for Clinicians and Researchers.

    Preemptive recognition of the ethical implications of study design and algorithm choices in artificial intelligence (AI) research is an important but challenging process. AI applications have begun to transition from a promising future to clinical reality in neurology. Because the clinical management of neurology often concerns discrete, unpredictable, and highly consequential events linked to multimodal data streams over long timescales, forthcoming advances in AI have great potential to transform care for patients. However, critical ethical questions have been raised by the implementation of the first AI applications in clinical practice. Clearly, AI will have far-reaching potential to promote, but also to endanger, ethical clinical practice. This article employs an anticipatory ethics approach to scrutinize how researchers in neurology can methodically identify the ethical ramifications of design choices early in the research and development process, with the goal of preempting unintended consequences that may violate principles of ethical clinical care. First, we discuss a systematic framework by which researchers can identify the ethical ramifications of various study design and algorithm choices. Second, using epilepsy as a paradigmatic example, we discuss anticipatory clinical scenarios that illustrate unintended ethical consequences and evaluate the failure points in each scenario. Third, we provide practical recommendations for understanding and addressing ethical ramifications early in the methods development stage. Awareness of the ethical implications of study design and algorithm choices is crucial to ensuring that the incorporation of AI into neurology care leads to patient benefit rather than harm.

    Individualizing the definition of seizure clusters based on temporal clustering analysis.

    OBJECTIVE Seizure clusters are often encountered in people with poorly controlled epilepsy. Detection of seizure clusters is currently based on simple clinical rules, such as two seizures separated by four or fewer hours or multiple seizures in 24 h. Current definitions fail to distinguish between statistically significant clusters and those that may result from natural variation in the person's seizures. The ability to systematically determine when a seizure cluster is significant for the individual carries major implications for treatment; however, there is no uniform consensus on how to define seizure clusters. This study proposes a principled statistical approach to defining seizure clusters that addresses these issues. METHODS A total of 533,968 clinical seizures from 1,748 people with epilepsy in the Seizure Tracker™ seizure diary database were used for algorithm development. We propose an algorithm for automated, individualized seizure cluster identification that combines cumulative sum (CUSUM) change-point analysis with bootstrapping and aberration detection, providing a new approach to personalized seizure cluster identification at user-specified levels of clinical significance. We developed a standalone user interface (ClusterCalc™) to make the proposed algorithm accessible for real-time seizure cluster identification. The clinical impact of systematizing cluster identification is demonstrated by comparing empirically defined clusters to those identified by routine seizure cluster definitions. We also demonstrate use of the Hurst exponent as a standardized measure of seizure clustering for comparing clustering burden within and across patients. RESULTS Seizure clustering was present in 26.7% (95% CI, 24.5-28.7%) of people with epilepsy. Empirical tables are provided for standardizing inter- and intra-patient comparisons of seizure cluster tendency. Using the proposed algorithm, we found that 37.7-59.4% of seizures identified as clusters under routine definitions had a high probability of occurring by chance; conversely, several clusters identified by the algorithm were missed by conventional definitions. The utility of the ClusterCalc algorithm for individualized seizure cluster detection is also demonstrated. SIGNIFICANCE This study proposes a principled statistical approach to individualized seizure cluster identification and demonstrates its potential for real-time clinical use through ClusterCalc. The approach accounts for individual variation in baseline seizure frequency and evaluates statistical significance. This new definition has the potential to improve individualized epilepsy treatment by systematizing the identification of unrecognized seizure clusters and preventing unnecessary intervention for random events previously considered clusters.
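
    The exact ClusterCalc procedure is given in the paper; the sketch below illustrates only the general CUSUM-with-bootstrapping idea on daily seizure counts, where permuting the diary supplies a cluster-free null distribution. The function name and the permutation null are assumptions for illustration.

    import numpy as np

    def cusum_cluster_test(daily_counts, n_boot=1000, seed=0):
        """Locate the strongest change point in a seizure diary and
        bootstrap its significance (a sketch, not the ClusterCalc
        implementation). Returns (change_point_index, p_value)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(daily_counts, dtype=float)

        def cusum_range(v):
            # Range of the cumulative sum of deviations from the mean.
            s = np.cumsum(v - v.mean())
            return s.max() - s.min()

        s = np.cumsum(x - x.mean())
        change_point = int(np.abs(s).argmax())
        observed = s.max() - s.min()
        # Null distribution: permuting the diary destroys temporal clustering
        # while preserving the individual's overall seizure frequency.
        exceed = sum(cusum_range(rng.permutation(x)) >= observed
                     for _ in range(n_boot))
        return change_point, exceed / n_boot  # small p => significant clustering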

    Prospective validation study of an epilepsy seizure risk system for outpatient evaluation

    Objective: We conducted clinical testing of an automated Bayesian machine learning algorithm (Epilepsy Seizure Assessment Tool [EpiSAT]) for outpatient seizure risk assessment using seizure counting data, and validated its performance against specialized epilepsy clinician experts. Methods: We conducted a prospective longitudinal study of EpiSAT performance against 24 specialized clinician experts at three tertiary referral epilepsy centers in the United States. Accuracy, interrater reliability, and intrarater reliability of EpiSAT for correctly identifying changes in seizure risk (improvement, worsening, or no change) were evaluated using 120 seizures from four synthetic seizure diaries (seizure risk known) and 120 seizures from four real seizure diaries (seizure risk unknown). The proportion of observed agreement between EpiSAT and clinicians was evaluated to assess the compatibility of EpiSAT with the clinical decision patterns of epilepsy experts. Results: EpiSAT exhibited substantial observed agreement (75.4%) with clinicians for assessing seizure risk. The mean accuracy of epilepsy providers for correctly assessing seizure risk was 74.7%. EpiSAT accurately identified seizure risk in 87.5% of seizure diary entries, corresponding to a significant improvement of 17.4% (P = .002). Clinicians exhibited low-to-moderate interrater reliability for seizure risk assessment (Krippendorff's α = 0.46) with good intrarater reliability across a 4- to 12-week evaluation period (Scott's π = 0.89). Significance: These results validate the ability of EpiSAT to yield objective clinical recommendations on seizure risk that follow decision patterns similar to those of specialized epilepsy providers, but with improved accuracy and reproducibility. This algorithm may serve as a useful clinical decision support system for the quantitative analysis of clinical seizure frequency in epilepsy practice.
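
    The EpiSAT model itself is not specified in this abstract. As a generic illustration of how seizure-count data can be turned into the three-way risk call described above (improvement, worsening, or no change), the conjugate Gamma-Poisson comparison below is an assumption for illustration, not the EpiSAT algorithm.

    import numpy as np

    def risk_change(before_counts, after_counts, prior_shape=1.0,
                    prior_rate=1.0, threshold=0.95, n_draws=10000, seed=0):
        """Three-way seizure-risk call from two diary windows using
        conjugate Gamma-Poisson posteriors over the daily seizure rate.
        A generic Bayesian sketch; not the EpiSAT model."""
        rng = np.random.default_rng(seed)

        def posterior(counts):
            counts = np.asarray(counts)
            # Gamma(shape, rate) prior + Poisson counts -> Gamma posterior.
            return rng.gamma(prior_shape + counts.sum(),
                             1.0 / (prior_rate + counts.size), size=n_draws)

        p_worse = np.mean(posterior(after_counts) > posterior(before_counts))
        if p_worse >= threshold:
            return "worsening", p_worse
        if p_worse <= 1 - threshold:
            return "improving", p_worse
        return "no change", p_worse

    # Two weeks of daily counts before and after a medication change.
    print(risk_change([0, 1, 0, 0, 1], [2, 3, 1, 2, 4]))  # likely "worsening"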