
    Predicting Pancreatic Cancer Using Support Vector Machine

    This report presents an approach to predicting pancreatic cancer with a Support Vector Machine (SVM) classification algorithm. The research objective is to predict pancreatic cancer from genomic data alone, from clinical data alone, and from a combination of genomic and clinical data. We used a real genomic dataset of 22,763 samples with 154 features per sample, and created a synthetic clinical dataset of 400 samples with 7 features per sample to assess prediction from clinical data alone. To test the combination hypothesis, we merged the synthetic clinical data with a subset of features from the real genomic data. Accuracy, precision, and recall were 80.77%, 20%, and 4% with genomic data alone; 93.33%, 95%, and 30% with synthetic clinical data alone; and 90.83%, 10%, and 5% with the combined real genomic and synthetic clinical data. Combining the datasets decreased accuracy because the genomic features are only weakly correlated with the outcome. We therefore conclude that combining genomic and clinical data does not improve pancreatic cancer prediction accuracy; a dataset with more significant genomic features might enable more accurate prediction.
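
    A minimal sketch of the kind of SVM pipeline the report describes, using scikit-learn. The random feature matrix, the 80/20 split, and the RBF kernel are illustrative assumptions, not details taken from the report.

```python
# Minimal sketch of an SVM classification pipeline like the one described
# above. X and y are random placeholders; the split and kernel are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 154))          # stand-in for genomic features
y = rng.integers(0, 2, size=400)         # stand-in for cancer labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_train)   # SVMs are sensitive to feature scale
clf = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)

y_pred = clf.predict(scaler.transform(X_test))
print(f"accuracy:  {accuracy_score(y_test, y_pred):.2%}")
print(f"precision: {precision_score(y_test, y_pred, zero_division=0):.2%}")
print(f"recall:    {recall_score(y_test, y_pred, zero_division=0):.2%}")
```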

    Predicting the risk of cancer in adults using supervised machine learning: a scoping review

    OBJECTIVES: The purpose of this scoping review is to: (1) identify existing supervised machine learning (ML) approaches to the prediction of cancer in asymptomatic adults; (2) compare the performance of ML models with each other and (3) identify potential gaps in research. DESIGN: Scoping review using the population, concept and context approach. SEARCH STRATEGY: The PubMed search engine was used from inception to 10 November 2020 to identify literature meeting the following inclusion criteria: (1) a general adult (≥18 years) population, either sex, asymptomatic (population); (2) any study using ML techniques to derive predictive models for future cancer risk using clinical and/or demographic and/or basic laboratory data (concept) and (3) original research articles conducted in all settings in any region of the world (context). RESULTS: The search returned 627 unique articles, of which 580 were excluded because they did not meet the inclusion criteria, were duplicates or were related to benign neoplasms. Full-text reviews were conducted for 47 articles, and a final set of 10 articles was included in this scoping review. These 10 very heterogeneous studies used ML to predict future cancer risk in asymptomatic individuals. All studies reported area under the receiver operating characteristic curve (AUC) values as metrics of model performance, but no study reported measures of model calibration. CONCLUSIONS: Research gaps that must be addressed in order to deliver validated ML-based models that assist clinical decision-making include: (1) establishing model generalisability through validation in independent cohorts, including those from low-income and middle-income countries; (2) establishing models for all cancer types; (3) thorough comparison of ML models with the best available clinical tools to ensure transparency of their potential clinical utility; (4) reporting of model calibration performance and (5) comparison of different methods on the same cohort to reveal important information about model generalisability and performance.
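
    The review's central gap, discrimination reported without calibration, is easy to illustrate. The sketch below computes both an AUC and a calibration curve for a hypothetical risk model on a synthetic rare-outcome cohort; the model and cohort are assumptions for illustration only.

```python
# Sketch contrasting the two kinds of performance reporting discussed above:
# discrimination (AUC), which all reviewed studies reported, versus
# calibration, which none did. Everything here is synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=0)   # rare-outcome cohort
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

risk = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

print("AUC:", roc_auc_score(y_te, risk))     # discrimination only
frac_pos, mean_pred = calibration_curve(y_te, risk, n_bins=10)
for p, o in zip(mean_pred, frac_pos):        # calibration: predicted vs observed
    print(f"predicted {p:.2f} -> observed {o:.2f}")
```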

    Secondary use of Structured Electronic Health Records Data: From Observational Studies to Deep Learning-based Predictive Modeling

    With the wide adoption of electronic health records (EHRs), researchers, as well as large healthcare organizations, governmental institutions, insurance companies, and pharmaceutical companies, have been interested in leveraging this rich clinical data source to extract clinical evidence and develop predictive algorithms. Large vendors have been able to compile structured EHR data from sites all over the United States, de-identify these data, and make them available to data science researchers in a more usable format. For this dissertation, we leveraged one of the earliest and largest secondary EHR data sources and conducted three studies of increasing scope. In the first study, which was of limited scope, we conducted a retrospective observational study to compare the effect of three drugs on a specific population of approximately 3,000 patients. Using a novel statistical method, we found evidence that selecting phenylephrine as the primary vasopressor to induce hypertension for the management of nontraumatic subarachnoid hemorrhage is associated with better outcomes than selecting norepinephrine or dopamine. In the second study, we widened our scope, using a cohort of more than 100,000 patients to train generalizable models for the risk prediction of specific clinical events, such as heart failure in diabetes patients or pancreatic cancer. In this study, we found that recurrent neural network-based predictive models trained on expressive terminologies, which preserve a high level of granularity, are associated with better prediction performance than other baseline methods, such as logistic regression. Finally, we widened our scope again to train Med-BERT, a foundation model, on the diagnosis data of more than 20 million patients. Med-BERT was found to improve prediction performance on downstream tasks with small sample sizes, which would otherwise limit a model's ability to learn good representations. In conclusion, we found that we can extract useful information and train helpful deep learning-based predictive models. Given the limitations of secondary EHR data, and considering that the data were originally collected for administrative rather than research purposes, however, the findings need clinical validation. Therefore, clinical trials are warranted to further validate any new evidence extracted from such data sources before clinical practice guidelines are updated. The implementability of the developed predictive models, which are in an early development phase, also warrants further evaluation.
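
    A hypothetical sketch of the kind of recurrent model the dissertation describes: an embedding over coded diagnoses feeding a GRU, with a sigmoid head for binary risk. The vocabulary size, dimensions, and padding scheme are assumptions, not details of the actual models or of Med-BERT.

```python
# Toy version of an RNN risk model over sequences of diagnosis codes, as
# described above. All hyperparameters and the 0-as-padding convention are
# illustrative assumptions.
import torch
import torch.nn as nn

class DiagnosisRNN(nn.Module):
    def __init__(self, n_codes: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(n_codes, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, codes: torch.Tensor) -> torch.Tensor:
        # codes: (batch, events) of integer diagnosis-code IDs, 0 = padding
        _, h = self.gru(self.embed(codes))
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)

model = DiagnosisRNN(n_codes=10_000)
batch = torch.randint(1, 10_000, (8, 50))   # 8 patients, 50 coded events each
risk = model(batch)                         # per-patient predicted risk
print(risk.shape)                           # torch.Size([8])
```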

    Predicting in-hospital death from derived EHR trajectory features

    A patient's medical history can provide insight into their immediate clinical future. While most studies propose to predict survival from vital signs and hospital tests within one episode of care, in this study we carry out selective feature engineering on longitudinal historical medical records to develop a dataset of derived features. We then train multiple machine learning models for the binary prediction of whether an episode of care will culminate in death among patients suspected of bloodstream infection. Classifier performance is evaluated and compared, and the feature importances driving model output are explored. The findings indicate that the logistic regression model achieved the best performance for predicting death in the next hospital episode, with an accuracy of 98% and a near-perfect area under the receiver operating characteristic curve. Exploring the feature importances reveals that the time to and severity of the last episode, together with a previous history of sepsis episodes, were the most critical features.
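
    A minimal sketch of the winning workflow described above, assuming scikit-learn: fit a logistic regression on derived trajectory features and rank standardised coefficients as a simple importance measure. The feature names and data are illustrative, not the study's actual set.

```python
# Fit a logistic regression on (synthetic stand-ins for) derived trajectory
# features and rank features by standardised coefficient magnitude.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

features = ["days_since_last_episode", "last_episode_severity",
            "n_prior_sepsis_episodes", "age"]   # hypothetical derived features
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, len(features)))      # stand-in for the derived dataset
y = rng.integers(0, 2, size=1000)               # 1 = death in next episode

Xs = StandardScaler().fit_transform(X)          # makes coefficients comparable
clf = LogisticRegression().fit(Xs, y)

# Standardised coefficient magnitude as a simple importance proxy
for name, coef in sorted(zip(features, clf.coef_[0]), key=lambda t: -abs(t[1])):
    print(f"{name:26s} {coef:+.3f}")
```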

    A prognostic Bayesian network that makes personalized predictions of poor prognostic outcome post resection of pancreatic ductal adenocarcinoma

    Background The narrative surrounding the management of potentially resectable pancreatic cancer is complex. Surgical resection is the only potentially curative treatment; however, resection rates are low, the risks of operative morbidity and mortality are high, and survival outcomes remain poor. The aim of this study was to create a prognostic Bayesian network that pre-operatively makes personalized predictions of a post-resection survival time of 12 months or less and also performs post-operative prognostic updating. Methods A Bayesian network was created by synthesizing data from PubMed post-resection survival analysis studies through a two-stage weighting process. Input variables included inflammatory markers, tumour factors, tumour markers, patient factors and, if applicable, response to neoadjuvant treatment for pre-operative predictions. Prognostic updating was performed by including post-operative input variables such as pathology results and adjuvant therapy. Results 77 studies (n = 31,214) were used to create the Bayesian network, which was validated against a prospectively maintained tertiary referral centre database (n = 387). For pre-operative predictions, an area under the curve (AUC) of 0.7 (P = 0.001; 95% CI 0.589–0.801) was achieved, accepting up to 4 missing data-points in the dataset. For prognostic updating, an AUC of 0.8 (P < 0.001; 95% CI 0.710–0.870) was achieved when validated against a dataset with up to 6 missing pre-operative and no missing post-operative data-points. This dropped to an AUC of 0.7 (P < 0.001; 95% CI 0.667–0.818) when the post-operative validation dataset had up to 2 missing data-points. Conclusion This Bayesian network is currently unique in the way it utilizes PubMed and patient-level data to translate the existing empirical evidence surrounding potentially resectable pancreatic cancer into personalized prognostic predictions. We believe such a tool is vital for facilitating better shared decision-making in clinical practice and could be further developed into a vehicle for delivering personalized precision medicine in the future.
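
    The pre-operative prediction versus prognostic-updating idea can be illustrated with a toy discrete network in pgmpy (assuming a recent version where the class is named BayesianNetwork). The two-parent structure and all probabilities below are invented for illustration and bear no relation to the study's actual 77-study synthesis.

```python
# Toy sketch of prognostic updating with a Bayesian network: a pre-operative
# marker and a post-operative pathology result both inform P(poor outcome).
# Structure and probabilities are invented; nothing here is from the study.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

net = BayesianNetwork([("marker", "outcome"), ("pathology", "outcome")])
net.add_cpds(
    TabularCPD("marker", 2, [[0.7], [0.3]]),       # 0 = low, 1 = high
    TabularCPD("pathology", 2, [[0.6], [0.4]]),    # 0 = clear, 1 = involved
    TabularCPD("outcome", 2,                       # 1 = death within 12 months
               [[0.9, 0.7, 0.6, 0.3],              # P(outcome=0 | parents)
                [0.1, 0.3, 0.4, 0.7]],             # P(outcome=1 | parents)
               evidence=["marker", "pathology"], evidence_card=[2, 2]),
)
infer = VariableElimination(net)

# Pre-operative prediction: pathology is unobserved and marginalised out,
# which is how such a network tolerates missing data-points.
print(infer.query(["outcome"], evidence={"marker": 1}))

# Post-operative prognostic updating: the pathology result is added as evidence.
print(infer.query(["outcome"], evidence={"marker": 1, "pathology": 1}))
```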

    Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress

    Objective: To perform a review of recent research in clinical data reuse or secondary use, and to envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, which were reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realizing its potential for high-quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.

    Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities

    Recent advancements in AI applications to healthcare have shown remarkable promise in surpassing human performance in diagnosis and disease prognosis. With the increasing complexity of AI models, however, concerns have grown regarding their opacity, potential biases, and the need for interpretability. To ensure trust and reliability in AI systems, especially in clinical risk prediction models, explainability becomes crucial. Explainability usually refers to an AI system's ability to provide a robust interpretation of its decision-making logic, or of the decisions themselves, to human stakeholders. In clinical risk prediction, other aspects of explainability, such as fairness, bias, trust, and transparency, also represent important concepts beyond interpretability alone. In this review, we address the relationship between these concepts, as they are often used together or interchangeably. The review also discusses recent progress in developing explainable models for clinical risk prediction, highlighting the importance of quantitative and clinical evaluation and validation across the modalities common in clinical practice. It emphasizes the need for external validation and for combining diverse interpretability methods to enhance trust and fairness. Adopting rigorous testing, such as using synthetic datasets with known generative factors, can further improve the reliability of explainability methods. Open access and code-sharing resources are essential for transparency and reproducibility, enabling the growth and trustworthiness of explainability research. While challenges exist, an end-to-end approach to explainability in clinical risk prediction, incorporating stakeholders from clinicians to developers, is essential for success.
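
    The rigorous-testing recommendation is concrete enough to sketch: build a synthetic dataset whose informative features are known by construction, then check whether an explainability method recovers them. The choice of model and of permutation importance here is an illustrative assumption, not a method from the survey.

```python
# Test an explainability method against known generative factors: features
# 0-2 are informative by construction, so a trustworthy explanation should
# rank them at the top.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False keeps the 3 informative features in columns 0-2.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

ranking = result.importances_mean.argsort()[::-1]
print("top-ranked features:", ranking[:3])   # ideally a permutation of 0, 1, 2
```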