8 research outputs found

    Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction

    Deep learning (DL) based predictive models built on electronic health records (EHR) deliver impressive performance in many clinical tasks. However, large training cohorts are often required to achieve high accuracy, which hinders the adoption of DL-based models where training data are limited. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. Pre-training BERT on a very large corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. We propose Med-BERT, which adapts the BERT framework for pre-training contextualized embedding models on structured diagnosis data from an EHR dataset of 28,490,650 patients. Fine-tuning experiments are conducted on two disease-prediction tasks: (1) prediction of heart failure in patients with diabetes and (2) prediction of pancreatic cancer from two clinical databases. Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 2.02-7.12%. In particular, pre-trained Med-BERT substantially improves performance on tasks with very small fine-tuning training sets (300-500 samples), boosting the AUC by more than 20%, or equivalently matching the AUC of a training set 10 times larger. We believe that Med-BERT will benefit disease-prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence aided healthcare. Comment: L.R., X.Y., and Z.X. share first authorship of this work.

    Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies.

    OBJECTIVE: Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations affect the performance of predictive models, especially in the context of machine learning and deep learning. MATERIALS AND METHODS: We projected the input diagnosis data in the Cerner HealthFacts database to the Unified Medical Language System (UMLS) and 5 other terminologies (CCS, CCSR, ICD-9, ICD-10, and PheWAS) and evaluated the prediction performance of these terminologies on 2 different tasks: risk prediction of heart failure in diabetes patients and risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network. RESULTS: For logistic regression, UMLS delivered the best area under the receiver operating characteristic curve (AUROC) in both the heart failure in diabetes patients (81.15%) and pancreatic cancer (80.53%) tasks. For the recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%) and was second (AUROC 85.55%) only to PheWAS (AUROC 85.87%) for prediction of heart failure in diabetes patients. DISCUSSION/CONCLUSION: In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performance. In particular, UMLS is consistently one of the best-performing terminologies. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.
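
    Several abstracts in this listing report results as AUROC. As a reminder of what that number measures, here is a minimal sketch that computes it as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as half. The function name `auroc` and the toy inputs are illustrative assumptions and do not come from any of the listed papers.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 outcomes; scores: predicted risks, same order.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Count pairs ranked correctly; a tie contributes half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four hypothetical patients, two with the outcome.
toy_auc = auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

    The quadratic loop keeps the definition visible; for large cohorts a rank-based implementation would be the usual choice.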

    Image registration with Fourier-based image correlation

    Image registration is an important technique in many computer vision applications, such as image fusion, object tracking, face recognition, and change detection. Registration of multi-date images relies on four components: the feature (primitive) space, the similarity measure, the search strategy, and the optimization strategy. Each component plays a fundamental role in estimating the best spatial transformation and directly affects the robustness and accuracy of these methods. This paper discusses classical and recent image registration methods, including their fundamental principles. The review provides a comprehensive reference for researchers working on image registration with Fourier-based image correlation by describing Fourier-based image correlation methods, describing existing subpixel techniques in the frequency domain, and summarizing comparative studies of subpixel techniques. Keywords: sub-pixel registration, matching, phase correlation, Fourier transform

    Automatic Sub-Pixel Co-Registration of Remote Sensing Images Using Phase Correlation and Harris Detector

    In this paper, we propose a new approach for sub-pixel co-registration based on Fourier phase correlation combined with the Harris detector. Because the standard phase correlation method achieves only pixel-level accuracy, an additional step is required to reach sub-pixel matching precision. We first applied the Harris corner detector to extract corners from both the reference and sensed images. Then, we identified their corresponding points using phase correlation between the image pairs. To achieve sub-pixel registration accuracy, two optimization algorithms were used. The effectiveness of the proposed method was tested with very high-resolution (VHR) remote sensing images, including Pleiades satellite images and aerial imagery. Compared with the speeded-up robust features (SURF)-based method, phase correlation with the Blackman window function produced 91% more matches with high reliability. Moreover, the optimization analysis revealed that the Nelder–Mead algorithm performs better than the two-point step size gradient algorithm in both localization accuracy and computation time. The proposed approach achieves an accuracy better than 0.5 pixels and outperforms the SURF-based method. It can achieve sub-pixel accuracy in the presence of noise and produces large numbers of correct matching points.
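
    The pixel-level limitation mentioned in this abstract is easy to see in a bare-bones version of phase correlation. The sketch below is not the authors' pipeline: it omits the Harris detector, the Blackman window, and the sub-pixel optimization, and it assumes a purely translational, circular shift between two small grayscale images given as lists of rows. The helper names (`dft`, `dft2`, `estimate_shift`) are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * m * k / n)
            for k, v in enumerate(values))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(image, inverse=False):
    """2-D DFT computed separably: first along rows, then along columns."""
    rows = [dft(row, inverse) for row in image]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def estimate_shift(reference, sensed):
    """Return the integer (dy, dx) translation that maps reference onto sensed."""
    f_ref = dft2(reference)
    f_sen = dft2(sensed)

    # Normalized cross-power spectrum: keep only the phase difference.
    spectrum = []
    for row_ref, row_sen in zip(f_ref, f_sen):
        out_row = []
        for a, b in zip(row_ref, row_sen):
            cross = b * a.conjugate()
            mag = abs(cross)
            out_row.append(cross / mag if mag > 1e-12 else 0j)
        spectrum.append(out_row)

    # The inverse transform peaks at the translation between the images.
    surface = dft2(spectrum, inverse=True)
    height, width = len(surface), len(surface[0])
    _, dy, dx = max(
        (surface[y][x].real, y, x) for y in range(height) for x in range(width)
    )

    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    if dy > height // 2:
        dy -= height
    if dx > width // 2:
        dx -= width
    return dy, dx
```

    Because the peak is read off an integer grid, the estimate cannot be finer than one pixel, which is why a refinement stage such as the optimization described in the abstract is needed for sub-pixel accuracy.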

    Time-sensitive clinical concept embeddings learned from large electronic health records

    Background: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing methods do not consider temporal dependencies along the longitudinal sequence of a patient's records, which may lead to incorrect selection of contexts. Methods: To address this issue, we extended three popular concept embedding learning methods, word2vec, positive pointwise mutual information (PPMI), and FastText, to consider time-sensitive information. We then trained them on a large electronic health records (EHR) database containing about 50 million patients to generate concept embeddings, and evaluated them with intrinsic evaluations focusing on concept similarity and with an extrinsic evaluation assessing the use of the generated concept embeddings in predicting disease onset. Results: Our experiments show that embeddings learned from information within one visit (time window zero) improve performance on the concept similarity measure, and that the FastText algorithm usually performed better than the other two algorithms. For the predictive modeling task, the optimal result was achieved by word2vec embeddings with a 30-day sliding window. Conclusions: Considering time constraints is important in training clinical concept embeddings. We expect they can benefit a series of downstream applications.
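
    To make the idea of a time window concrete, the following sketch builds (target, context) training pairs from one patient's record, pairing two concepts only when they occur within a given number of days of each other. It is an illustration under assumptions, not the authors' implementation: the function name `context_pairs`, the representation of a record as (day, code) tuples, the use of same-day events as a stand-in for "one visit", and the example codes are all hypothetical.

```python
def context_pairs(events, window_days):
    """Build symmetric (target, context) pairs limited by a time window.

    events: list of (day_offset, concept_code) tuples for one patient.
    window_days: maximum gap in days between paired concepts;
                 0 keeps only concepts recorded on the same day.
    """
    ordered = sorted(events)
    pairs = []
    for i, (day_i, code_i) in enumerate(ordered):
        for day_j, code_j in ordered[i + 1:]:
            if day_j - day_i > window_days:
                break  # events are sorted, so later ones are even further away
            pairs.append((code_i, code_j))
            pairs.append((code_j, code_i))
    return pairs


# Hypothetical record: day offsets with illustrative diagnosis codes.
record = [(0, "E11"), (0, "I10"), (12, "I50"), (90, "N18")]
same_day_pairs = context_pairs(record, window_days=0)
thirty_day_pairs = context_pairs(record, window_days=30)
```

    Pairs produced this way could then be fed to any skip-gram style trainer; widening the window adds cross-visit contexts, while a window of zero keeps only co-occurrences within a single day.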

    Vasopressor treatment and mortality following nontraumatic subarachnoid hemorrhage: a nationwide electronic health record analysis

    OBJECTIVE: Subarachnoid hemorrhage (SAH) is a devastating cerebrovascular condition, not only because of the effect of the initial hemorrhage, but also because of the complication of delayed cerebral ischemia (DCI). While hypertension facilitated by vasopressors is often initiated to prevent DCI, it is not known which vasopressor is most effective in improving outcomes. The objective of this study was to determine associations between initial vasopressor choice and mortality in patients with nontraumatic SAH. METHODS: The authors conducted a retrospective cohort study using a large, national electronic medical record data set from 2000-2014 to identify patients with a new diagnosis of nontraumatic SAH (based on ICD-9 codes) who were treated with the vasopressors dopamine, phenylephrine, or norepinephrine. The relationship between the initial choice of vasopressor therapy and the primary outcome, defined as in-hospital death or discharge to hospice care, was examined. RESULTS: In total, 2634 patients with nontraumatic SAH who were treated with a vasopressor were identified. In this cohort, the average age was 56.5 years, 63.9% were female, and 36.5% of patients developed the primary outcome. The incidence of the primary outcome was higher in those initially treated with either norepinephrine (47.6%) or dopamine (50.6%) than in those treated with phenylephrine (24.5%). After adjusting for possible confounders using propensity score methods, the adjusted OR of the primary outcome was higher with dopamine (OR 2.19, 95% CI 1.70-2.81) and norepinephrine (OR 2.24, 95% CI 1.80-2.80) compared with phenylephrine. Sensitivity analyses using different variable selection procedures, causal inference models, and machine-learning methods confirmed the main findings. CONCLUSIONS: In patients with nontraumatic SAH, phenylephrine was significantly associated with reduced mortality compared with dopamine or norepinephrine. Prospective randomized clinical studies are warranted to confirm this finding.

    Dynamic Prognosis Prediction for Patients on DAPT After Drug‐Eluting Stent Implantation: Model Development and Validation

    Background: The rapid evolution of artificial intelligence (AI) in conjunction with recent updates in dual antiplatelet therapy (DAPT) management guidelines emphasizes the necessity for innovative models to predict ischemic or bleeding events after drug‐eluting stent implantation. Leveraging AI for dynamic prediction has the potential to revolutionize risk stratification and provide personalized decision support for DAPT management. Methods and Results: We developed and validated a new AI‐based pipeline using retrospective data of drug‐eluting stent‐treated patients, sourced from the Cerner Health Facts data set (n=98 236) and Optum's de‐identified Clinformatics Data Mart Database (n=9978). The 36 months following drug‐eluting stent implantation were designated as our primary forecasting interval, further segmented into 6 sequential prediction windows. We evaluated 5 distinct AI algorithms for their precision in predicting ischemic and bleeding risks. Model discriminative accuracy was assessed using the area under the receiver operating characteristic curve, among other metrics. The weighted light gradient boosting machine stood out as the preeminent model, thus earning its place as our AI‐DAPT model. The AI‐DAPT demonstrated peak accuracy in the 30 to 36 months window, charting an area under the receiver operating characteristic curve of 90% [95% CI, 88%–92%] for ischemia and 84% [95% CI, 82%–87%] for bleeding predictions. Conclusions: Our AI‐DAPT excels in formulating iterative, refined dynamic predictions by assimilating ongoing updates from patients' clinical profiles, holding value as a novel smart clinical tool to facilitate optimal DAPT duration management with high accuracy and adaptability.