12 research outputs found

    Interpretable Multi-Task Deep Neural Networks for Dynamic Predictions of Postoperative Complications

    Accurate prediction of postoperative complications can inform shared decisions between patients and surgeons regarding the appropriateness of surgery, preoperative risk-reduction strategies, and postoperative resource use. Traditional predictive analytic tools are hindered by suboptimal performance and usability. We hypothesized that novel deep learning techniques would outperform logistic regression models in predicting postoperative complications. In a single-center longitudinal cohort of 43,943 adult patients undergoing 52,529 major inpatient surgeries, deep learning yielded greater discrimination than logistic regression for all nine complications. Predictive performance was strongest when leveraging the full spectrum of preoperative and intraoperative physiologic time-series electronic health record data. A single multi-task deep learning model yielded greater performance than separate models trained on individual complications. Integrated gradients interpretability mechanisms demonstrated the substantial importance of missing data. Interpretable, multi-task deep neural networks made accurate, patient-level predictions that have the potential to augment surgical decision-making.
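
    The abstract above names integrated gradients as its interpretability mechanism. As a rough illustration only, the following sketch applies the technique to a toy logistic model rather than the paper's deep network; the weights, the input, the all-zero baseline, and every function name are hypothetical choices made for this example. The path integral is approximated with a midpoint Riemann sum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Toy risk model: logistic regression over a feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(weights, bias, x):
    """Analytic gradient of the logistic output with respect to each input."""
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(weights, bias, x, baseline, steps=200):
    """Approximate integrated gradients along the straight path from
    `baseline` to `x`, using a midpoint Riemann sum with `steps` points."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = gradient(weights, bias, point)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Hypothetical model parameters and input (assumptions for illustration).
WEIGHTS = [0.8, -0.5, 1.2]
BIAS = -0.3
x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(WEIGHTS, BIAS, x, baseline)
gap = predict(WEIGHTS, BIAS, x) - predict(WEIGHTS, BIAS, baseline)
```

    A useful sanity check is the completeness property: the attributions should sum approximately to `gap`, the difference between the prediction at the input and at the baseline.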

    Surgical resident experience with common bile duct exploration and assessment of performance and autonomy with formative feedback

    Background: Common bile duct exploration (CBDE) is safe and effective for managing choledocholithiasis, but most US general surgeons have limited experience with CBDE and are uncomfortable performing this procedure in practice. Surgical trainee exposure to CBDE is limited, and their learning curve for achieving autonomous, practice-ready performance has not been previously described. This study tests the hypothesis that receipt of one or more prior CBDE operative performance assessments, combined with formative feedback, is associated with greater resident operative performance and autonomy. Methods: Resident and attending assessments of resident operative performance and autonomy were obtained for 189 laparoscopic or open CBDEs performed at 28 institutions. Performance and autonomy were graded along validated ordinal scales. Cases in which the resident had one or more prior CBDE case evaluations (n = 48) were compared with cases in which the resident had no prior evaluations (n = 141). Results: Compared with cases in which the resident had no prior CBDE case evaluations, cases with a prior evaluation had greater proportions of practice-ready or exceptional performance ratings according to both residents (27% vs. 11%, p = .009) and attendings (58% vs. 19%, p < .001) and had greater proportions of passive help or supervision only autonomy ratings according to both residents (17% vs. 4%, p = .009) and attendings (69% vs. 32%, p < .01). Conclusions: Residents with at least one prior CBDE evaluation and formative feedback demonstrated better operative performance and received greater autonomy than residents without prior evaluations, underscoring the propensity of feedback to help residents achieve autonomous, practice-ready performance for rare operations.

    Dynamic predictions of postoperative complications from explainable, uncertainty-aware, and multi-task deep neural networks

    Accurate prediction of postoperative complications can inform shared decisions regarding prognosis, preoperative risk-reduction, and postoperative resource use. We hypothesized that multi-task deep learning models would outperform conventional machine learning models in predicting postoperative complications, and that integrating high-resolution intraoperative physiological time series would result in more granular and personalized health representations that would improve prognostication compared to preoperative predictions. In a longitudinal cohort study of 56,242 patients undergoing 67,481 inpatient surgical procedures at a university medical center, we compared deep learning models with random forests and XGBoost for predicting nine common postoperative complications using preoperative, intraoperative, and perioperative patient data. Our study indicated several significant results across experimental settings that suggest the utility of deep learning for capturing more precise representations of patient health for augmented surgical decision support. Multi-task learning improved efficiency by reducing computational resources without compromising predictive performance. Integrated gradients interpretability mechanisms identified potentially modifiable risk factors for each complication. Monte Carlo dropout methods provided a quantitative measure of prediction uncertainty that has the potential to enhance clinical trust. Multi-task learning, interpretability mechanisms, and uncertainty metrics demonstrated potential to facilitate effective clinical implementation.
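
    The abstract above mentions Monte Carlo dropout as a source of prediction uncertainty. A minimal sketch of the idea, assuming a tiny fixed-weight network invented for this example (the weights, input, dropout rate, and function names are all hypothetical and are not taken from the study): dropout stays active at inference time, the same input is passed through the network many times, and the spread of the outputs serves as an uncertainty estimate.

```python
import math
import random
import statistics


def forward_with_dropout(x, w_hidden, w_out, rate, rng):
    """One stochastic forward pass: ReLU hidden layer with inverted dropout,
    followed by a sigmoid output unit."""
    hidden = []
    for row in w_hidden:
        act = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU
        keep = rng.random() >= rate
        # Inverted dropout: rescale kept units so the expected value is unchanged.
        hidden.append(act / (1.0 - rate) if keep else 0.0)
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_predict(x, w_hidden, w_out, rate=0.5, passes=100, seed=0):
    """Return the mean prediction and its standard deviation over
    `passes` stochastic forward passes with dropout left on."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, w_hidden, w_out, rate, rng)
        for _ in range(passes)
    ]
    return statistics.mean(samples), statistics.pstdev(samples)


# Hypothetical weights and input (assumptions for illustration).
W_HIDDEN = [[0.5, 0.2], [-0.3, 0.8], [0.9, -0.1]]
W_OUT = [1.0, -1.5, 0.7]
x = [0.4, 1.0]

mean_risk, spread = mc_dropout_predict(x, W_HIDDEN, W_OUT)
```

    A larger `spread` indicates that the prediction depends heavily on which units were dropped, which is the signal a clinician-facing tool could surface as lower confidence.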

    Ideal algorithms in healthcare: Explainable, dynamic, precise, autonomous, fair, and reproducible

    Established guidelines describe minimum requirements for reporting algorithms in healthcare; it is equally important to objectify the characteristics of ideal algorithms that confer maximum potential benefits to patients, clinicians, and investigators. We propose a framework for ideal algorithms, including 6 desiderata: explainable (convey the relative importance of features in determining outputs), dynamic (capture temporal changes in physiologic signals and clinical events), precise (use high-resolution, multimodal data and aptly complex architecture), autonomous (learn with minimal supervision and execute without human input), fair (evaluate and mitigate implicit bias and social inequity), and reproducible (validated externally and prospectively and shared with academic communities). We present an ideal algorithms checklist and apply it to highly cited algorithms. Strategies and tools such as the predictive, descriptive, relevant (PDR) framework, the Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence (SPIRIT-AI) extension, sparse regression methods, and minimizing concept drift can help healthcare algorithms achieve these objectives, toward ideal algorithms in healthcare.

    Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals

    Objective: Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 2019 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations. Materials and methods: We leverage an FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data was pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model with FedAvg and 3 personalized FL variations (FedProx, FedBN, and FedAMP). Results: We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation as compared to personalized FL algorithms. FedAvg did have improved generalizability compared to personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. Conclusion: FedAvg can significantly improve the generalization of the model compared to personalized FL algorithms, but at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internally and externally validated algorithms.
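
    The Federated Averaging (FedAvg) step referred to in the abstract above can be sketched roughly as follows. This shows only the server-side aggregation, in which each client's locally trained parameters are averaged with a weight proportional to that client's number of training samples. It is not the Clara Train SDK API, and the parameter vectors, sample counts, and function name are hypothetical values chosen for the example; the personalized variants (FedProx, FedBN, FedAMP) are not shown.

```python
def fedavg(client_weights, client_sizes):
    """Aggregate client parameter vectors into one global vector,
    weighting each client by its share of the total training samples."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Three simulated clients (assumed values), each holding a 2-parameter model.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 300, 600]

global_model = fedavg(clients, sizes)
```

    In a full training loop the server would send `global_model` back to every client, each client would train locally for a few steps, and the aggregation would repeat for several rounds. No patient data leave a site; only model parameters are exchanged.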