9 research outputs found

    Evaluation of the informatician perspective: determining types of research papers preferred by clinicians

    Background: To deliver evidence-based medicine, clinicians often reference resources that are useful to their respective medical practices. Owing to their busy schedules, however, clinicians typically find it challenging to locate these relevant resources among the rapidly growing number of journals and articles currently being published. A literature-recommender system may provide a possible solution to this issue if the individual needs of clinicians can be identified and applied. Methods: We collected from the CiteULike website a sample of 96 clinicians and the 6,221 scientific articles that they read. We examined the journal distributions, publication types, reading times, and geographic locations. We then compared the distributions of MeSH terms associated with these articles with those of randomly sampled MEDLINE articles, using a two-sample Z-test with multiple comparison correction, in order to identify the important topics relevant to clinicians. Results: We determined that the sampled clinicians followed the latest literature in a timely manner and read papers that are considered landmarks in medical research history. They preferred to read scientific discoveries from human experiments instead of molecular-, cellular- or animal-model-based experiments. Furthermore, the country of publication may impact reading preferences, particularly for clinicians from Egypt, India, Norway, Senegal, and South Africa. Conclusion: These findings provide useful guidance for developing personalized literature-recommender systems for clinicians.
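    The MeSH-term comparison described in the Methods can be illustrated with a minimal sketch. It assumes the test is a pooled two-proportion Z-test applied per term and that the correction is Bonferroni; the abstract names neither choice, and the term counts and the size of the random MEDLINE sample below are invented for illustration only.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided two-sample Z-test for a difference in proportions (pooled variance)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant after Bonferroni correction (assumed method)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


CLINICIAN_TOTAL = 6221  # articles in the clinician sample (from the abstract)
RANDOM_TOTAL = 6221     # size of the random MEDLINE sample (hypothetical)

# Hypothetical counts: term -> (articles in clinician sample, articles in random sample)
term_counts = {
    "Humans": (5000, 4000),
    "Mice": (200, 600),
    "Neoplasms": (800, 790),
}

terms = list(term_counts)
p_values = [
    two_proportion_z_test(a, CLINICIAN_TOTAL, b, RANDOM_TOTAL)[1]
    for a, b in term_counts.values()
]
significant = dict(zip(terms, bonferroni(p_values)))

for term in terms:
    print(term, "significant" if significant[term] else "not significant")
```

    With thousands of MeSH terms tested at once, Bonferroni is conservative; a false-discovery-rate procedure would be a reasonable alternative under the same setup.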

    A Machine Learning Approach to Real‐World Time to Treatment Discontinuation Prediction

    Real‐world time to treatment discontinuation (rwTTD) is an important endpoint measurement of drug efficacy evaluated using real‐world observational data. rwTTD, represented as a set of metrics calculated from a population‐wise curve, cannot be predicted by existing machine learning approaches. Herein, a methodology that enables predicting rwTTD is developed. First, the robust performance of the model in predicting rwTTD across populations of similar or distinct properties is demonstrated with simulated data, using a variety of commonly used base learners in machine learning. Then, the robust performance of the approach both within‐cohort and cross‐disease is demonstrated using real‐world observational data of pembrolizumab for advanced lung cancer and head and neck cancer. This study establishes a generic pipeline for real‐world time on treatment prediction, which can be extended to any base machine learners and drugs. Currently, no machine learning approach has been established for predicting population‐wise rwTTD, even though it is an essential metric for reporting real‐world drug efficacy. Therefore, we believe our study opens a new investigation area of rwTTD prediction, and provides an innovative approach to probe this problem and other problems involving population‐wise predictions. An interactive preprint version of the article can be found at: https://doi.org/10.22541/au.166065465.59798123/v1
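    To make "a set of metrics calculated from a population‐wise curve" concrete, the sketch below assumes the curve is a Kaplan-Meier estimate of the fraction of patients still on treatment, with the median and a landmark probability as the summary metrics. This is a common way to summarize time-to-event data and not necessarily the authors' pipeline; the patient durations are invented.

```python
def kaplan_meier(durations, discontinued):
    """Return [(time, estimated fraction still on treatment)] at each event time.

    durations: time on treatment per patient.
    discontinued: True if discontinuation was observed, False if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, e in zip(durations, discontinued) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # events plus censored at t
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_time(curve):
    """First time at which the curve drops to 0.5 or below, or None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def probability_at(curve, landmark):
    """Estimated fraction still on treatment at the landmark time."""
    prob = 1.0
    for t, s in curve:
        if t > landmark:
            break
        prob = s
    return prob


# Hypothetical cohort: months on treatment and whether discontinuation was observed.
months = [2, 3, 3, 5, 8, 8, 12, 12]
observed = [True, True, False, True, True, True, False, False]

curve = kaplan_meier(months, observed)
print("median time on treatment:", median_time(curve))
print("fraction on treatment at 6 months:", round(probability_at(curve, 6), 3))
```

    Because these metrics are properties of the whole cohort rather than of any one patient, an ordinary per-patient regressor has no direct target to fit, which is the gap the abstract describes.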

    Serendipity—A Machine-Learning Application for Mining Serendipitous Drug Usage From Social Media


    Additional file 1: Table S1. of Evaluation of the informatician perspective: determining types of research papers preferred by clinicians

    Country of residence distributions of the clinicians. Table S2. Publication type distributions of the articles read by the clinicians. Table S3. Times of the journals read by the clinicians in each medical specialty group. Table S4. List of major MeSH terms having significantly different frequencies between clinicians’ reading libraries and a random sample. Table S5. List of minor MeSH terms having significantly different frequencies between clinicians’ reading libraries and a random sample. (XLS 313 kb)

    Comparison of Machine Learning Algorithms for Predicting Hospital Readmissions and Worsening Heart Failure Events in Patients With Heart Failure With Reduced Ejection Fraction: Modeling Study

    Background: Heart failure (HF) is highly prevalent in the United States. Approximately one-third to one-half of HF cases are categorized as HF with reduced ejection fraction (HFrEF). Patients with HFrEF are at risk of worsening HF, have a high risk of adverse outcomes, and experience higher health care use and costs. Therefore, it is crucial to identify patients with HFrEF who are at high risk of subsequent events after HF hospitalization. Objective: Machine learning (ML) has been used to predict HF-related outcomes. The objective of this study was to compare different ML prediction models and feature construction methods to predict 30-, 90-, and 365-day hospital readmissions and worsening HF events (WHFEs). Methods: We used the Veradigm PINNACLE outpatient registry linked to Symphony Health’s Integrated Dataverse data from July 1, 2013, to September 30, 2017. Adults with a confirmed diagnosis of HFrEF and HF-related hospitalization were included. WHFEs were defined as HF-related hospitalizations or outpatient intravenous diuretic use within 1 year of the first HF hospitalization. We used different approaches to construct ML features from clinical codes, including frequencies of clinical classification software (CCS) categories, Bidirectional Encoder Representations From Transformers (BERT) trained with CCS sequences (BERT + CCS), BERT trained on raw clinical codes (BERT + raw), and prespecified features based on clinical knowledge. A multilayer perceptron neural network, extreme gradient boosting (XGBoost), random forest, and logistic regression prediction models were applied and compared. Results: A total of 30,687 adult patients with HFrEF were included in the analysis; 11.41% (3184/27,917) of adults experienced a hospital readmission within 30 days of their first HF hospitalization, and nearly half (9231/21,562, 42.81%) of the patients experienced at least 1 WHFE within 1 year after HF hospitalization. The prediction models and feature combinations with the best area under the receiver operating characteristic curve (AUC) for each outcome were XGBoost with CCS frequency (AUC=0.595) for 30-day readmission, random forest with CCS frequency (AUC=0.630) for 90-day readmission, XGBoost with CCS frequency (AUC=0.649) for 365-day readmission, and XGBoost with CCS frequency (AUC=0.640) for WHFEs. Our ML models could discriminate between readmission and WHFE among patients with HFrEF. Our model performance was mediocre, especially for the 30-day readmission events, most likely owing to limitations of the data, including an imbalance between positive and negative cases and high missing rates of many clinical variables and outcome definitions. Conclusions: We predicted readmissions and WHFEs after HF hospitalizations in patients with HFrEF. Features identified by data-driven approaches may be comparable with those identified by clinical domain knowledge. Future work may be warranted to validate and improve the models using more longitudinal electronic health records that are complete, are comprehensive, and have a longer follow-up time.
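    The AUC values reported above can be read as the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it, so 0.5 is chance level and 0.595 is only modestly better. A minimal sketch of that pairwise definition follows; the labels and scores are invented and unrelated to the study's data or models.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


# Hypothetical example: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 0, 1, 0, 0]
risk_scores = [0.9, 0.8, 0.4, 0.3, 0.4]

print("AUC:", roc_auc(labels, risk_scores))
```

    This quadratic pairwise form is meant for clarity on small inputs; a library routine based on sorting would be the usual choice for cohorts of the size reported in the abstract.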

    Improving preeclampsia risk prediction by modeling pregnancy trajectories from routinely collected electronic medical record data

    Preeclampsia is a heterogeneous and complex disease associated with rising morbidity and mortality in pregnant women and newborns in the US. Early recognition of patients at risk is a pressing clinical need to reduce the risk of adverse outcomes. We assessed whether information routinely collected in electronic medical records (EMR) could enhance the prediction of preeclampsia risk beyond what is achieved in standard of care assessments. We developed a digital phenotyping algorithm to curate 108,557 pregnancies from EMRs across the Mount Sinai Health System, accurately reconstructing pregnancy journeys and normalizing these journeys across different hospital EMR systems. We then applied machine learning approaches to a training dataset (N = 60,879) to construct predictive models of preeclampsia across three major pregnancy time periods (ante-, intra-, and postpartum). The resulting models predicted preeclampsia with high accuracy across the different pregnancy periods, with areas under the receiver operating characteristic curves (AUC) of 0.92, 0.82, and 0.89 at 37 gestational weeks, intrapartum, and postpartum, respectively. We observed comparable performance in two independent patient cohorts. While our machine learning approach identified known risk factors of preeclampsia (such as blood pressure, weight, and maternal age), it also identified other potential risk factors, such as complete blood count related characteristics for the antepartum period. Our model not only has utility for earlier identification of patients at risk for preeclampsia, but, given that the prediction accuracy exceeds what is currently achieved in clinical practice, it also provides a path for promoting personalized precision therapeutic strategies for patients at risk.