
    Predicting Pancreatic Cancer Using Support Vector Machine

    This report presents an approach to predicting pancreatic cancer using the Support Vector Machine classification algorithm. The research objective of this project is to predict pancreatic cancer from genomic data alone, clinical data alone, and a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. In our results, the prediction accuracy, precision, and recall with genomic data alone were 80.77%, 20%, and 4%, respectively. With synthetic clinical data alone they were 93.33%, 95%, and 30%, and with the combination of real genomic and synthetic clinical data they were 90.83%, 10%, and 5%. The combination of real genomic and synthetic clinical data decreased the accuracy relative to clinical data alone because the genomic data is weakly correlated. We therefore conclude that combining genomic and clinical data does not improve pancreatic cancer prediction accuracy. A dataset with more significant genomic features might help to predict pancreatic cancer more accurately.
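
The combination of high accuracy with very low recall reported in this abstract is typical of imbalanced classification. As a minimal sketch, the following computes the three metrics from confusion-matrix counts. The function name and the counts are hypothetical, chosen only to illustrate the pattern; they are not taken from the report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts (not from the report): 25 positives among 130 samples.
accuracy, precision, recall = classification_metrics(tp=1, fp=4, fn=24, tn=101)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

With these counts, most samples are negatives that the classifier labels correctly, so accuracy stays high even though almost all positives are missed. This is why accuracy alone can be a misleading summary for a rare-disease classifier.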

    Modular Clinical Decision Support Networks (MoDN): Updatable, interpretable, and portable predictions for evolving clinical environments

    Clinical Decision Support Systems (CDSS) have the potential to improve and standardise care with probabilistic guidance. However, many CDSS deploy static, generic rule-based logic, resulting in inequitably distributed accuracy and inconsistent performance in evolving clinical environments. Data-driven models could resolve this issue by updating predictions according to the data collected. However, the size of data required necessitates collaborative learning from analogous CDSSs, which are often imperfectly interoperable (IIO) or unshareable. We propose Modular Clinical Decision Support Networks (MoDN) which allow flexible, privacy-preserving learning across IIO datasets, as well as being robust to the systematic missingness common to CDSS-derived data, while providing interpretable, continuous predictive feedback to the clinician. MoDN is a novel decision tree composed of feature-specific neural network modules that can be combined in any number or combination to make any number or combination of diagnostic predictions, updatable at each step of a consultation. The model is validated on a real-world CDSS-derived dataset, comprising 3,192 paediatric outpatients in Tanzania. MoDN significantly outperforms 'monolithic' baseline models (which take all features at once at the end of a consultation) with a mean macro F1 score across all diagnoses of 0.749 vs 0.651 for logistic regression and 0.620 for multilayer perceptron (p < 0.001). To test collaborative learning between IIO datasets, we create subsets with various percentages of feature overlap and port a MoDN model trained on one subset to another. Even with only 60% common features, fine-tuning a MoDN model on the new dataset or just making a composite model with MoDN modules matched the ideal scenario of sharing data in a perfectly interoperable setting. MoDN integrates into consultation logic by providing interpretable continuous feedback on the predictive potential of each question in a CDSS questionnaire. The modular design allows it to compartmentalise training updates to specific features and collaboratively learn between IIO datasets without sharing any data.
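
The macro F1 score cited in this abstract can be illustrated with a short sketch. It assumes the standard definition, the unweighted mean of per-class F1 scores, and uses a hypothetical function name and toy labels. How the authors aggregate the score across diagnoses is not specified here, so this is not their evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally to the mean, a model that performs poorly on a rare diagnosis is penalised as much as one that performs poorly on a common diagnosis.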

    Mining Oncology Data: Knowledge Discovery in Clinical Performance of Cancer Patients

    Our goal in this research is twofold: to develop clinical performance databases of cancer patients, and to conduct data mining and machine learning studies on collected patient records. We use these studies to develop models for predicting cancer patient medical outcomes. The clinical database is developed in conjunction with surgeons and oncologists at UMass Memorial Hospital. Aspects of the database design and representation of patient narrative are discussed here. Current predictive model design in medical literature is dominated by linear and logistic regression techniques. We seek to show that novel machine learning methods can perform as well as or better than these traditional techniques. Our machine learning focus for this thesis is on pancreatic cancer patients. Classification and regression prediction targets include patient survival, wellbeing scores, and disease characteristics. Information research in oncology is often constrained by type variation, missing attributes, high dimensionality, skewed class distribution, and small data sets. We compensate for these difficulties using preprocessing, meta-learning, and other algorithmic methods during data analysis. The predictive accuracy and regression error of various machine learning models are presented as results, as are t-tests comparing these to the accuracy of traditional regression methods. In most cases, it is shown that the novel machine learning prediction methods offer comparable or superior performance. We conclude with an analysis of results and discussion of future research possibilities.

    A Machine Learning Decision Support System (DSS) for Neuroendocrine Tumor Patients Treated with Somatostatin Analog (SSA) Therapy

    The application of machine learning (ML) techniques could facilitate the identification of predictive biomarkers of somatostatin analog (SSA) efficacy in patients with neuroendocrine tumors (NETs). We collected data from 74 patients with a pancreatic or gastrointestinal NET who received SSA as first-line therapy. We developed three classification models to predict whether the patient would experience progressive disease (PD) after 12 or 18 months based on clinicopathological factors at baseline. The dataset included 70 samples and 15 features. We initially developed three classification models with accuracy ranging from 55% to 70%. We then compared ten different ML algorithms. In all but one case, the performance of the Multinomial Naive Bayes algorithm (80%) was the highest. The support vector machine classifier (SVC) had a higher performance for the recall metric of the progression-free outcome (97% vs. 94%). Overall, for the first time, we documented that the factors that mainly influenced progression-free survival (PFS) included age, the number of metastatic sites and the primary site. In addition, the following factors were also isolated as important: adverse events G3-G4, sex, Ki67, metastatic site (liver), functioning NET, the primary site and the stage. In patients with advanced NETs, ML provides a predictive model that could potentially be used to differentiate prognostic groups and to identify patients for whom SSA therapy as a single agent may not be sufficient to achieve a long-lasting PFS.

    Conditional Tabular Generative Adversarial Net for Enhancing Ensemble Classifiers in Sepsis Diagnosis

    Antibiotic-resistant bacteria have proliferated at an alarming rate as a result of the extensive use of antibiotics and the paucity of new medication research. The possibility that an antibiotic-resistant bacterial infection would progress to sepsis is one of the major collateral problems affecting people with this condition. In England, 31,000 lives were lost to sepsis, with costs of about two billion pounds annually. This research aims to develop and evaluate several classification approaches to improve sepsis prediction and reduce the tendency of underdiagnosis in computer-aided predictive tools. This research employs medical data sets for patients diagnosed with sepsis; it analyses the efficacy of ensemble machine learning techniques compared to non-ensemble machine learning techniques and the significance of data balancing and Conditional Tabular Generative Adversarial Nets for data augmentation in producing reliable diagnosis. The average F score obtained by the non-ensemble models trained in this paper is 0.83, compared to the ensemble techniques' average of 0.94. Non-ensemble techniques, such as Decision Tree, achieved an F score of 0.90, an AUC of 0.90 and an accuracy of 90%. Histogram-based Gradient Boosting Classification Tree achieved an F score of 0.96, an AUC of 0.96 and an accuracy of 95%, surpassing the other models tested. Additionally, when compared to the current state-of-the-art sepsis prediction models, the models developed in this study demonstrated higher average performance in all metrics, indicating reduced bias and improved robustness through data balancing and Conditional Tabular Generative Adversarial Nets for data augmentation. The study revealed that data balancing and augmentation on the ensemble machine learning algorithms boost the efficacy of clinical predictive models and can help clinics decide which data types are most important when examining patients and diagnosing sepsis early through an intelligent human-machine interface.
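
The data balancing step mentioned in this abstract can be sketched with plain random oversampling. This is a deliberately simpler stand-in, not the authors' method: a Conditional Tabular GAN learns a generative model and synthesises new rows, whereas the hypothetical helper below only duplicates existing minority-class rows until every class matches the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    if not rows:
        return [], []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data for illustration only: three negative rows, one positive row.
rows = [[0.1], [0.2], [0.3], [0.9]]
labels = [0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Balancing of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.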