239 research outputs found

    Applications of Machine Learning in Cancer Prediction and Prognosis

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

    The risk of re-intervention after endovascular aortic aneurysm repair

    This thesis studies survival analysis techniques that deal with censoring in order to produce tools that predict the risk of re-intervention after endovascular aortic aneurysm repair (EVAR). Censoring means that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high level of censoring in the two EVAR datasets collected. Therefore, this thesis presents a new solution to high censoring by modifying an approach that was incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLC) are preferred. A few methods have adopted MLC to perform survival FS, but they cannot be used with high censoring. This thesis proposes two FS methods that use MLC to evaluate features. Both use the new solution to deal with censoring, and both combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second combines support vector machines, neural networks, and K-nearest-neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It presents a new hybrid FS process that uses the MCS as a wrapper method and merges it with the iterated feature ranking filter method to further reduce the features. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the log-rank test's p-values, sensitivity, and concordance. This indicates that the proposed techniques are more powerful in correctly predicting the risk of re-intervention. Consequently, they enable doctors to set an appropriate future observation plan for each patient.
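
The simple and weighted majority voting mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `majority_vote`, the labels, and the weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def majority_vote(predictions, weights=None):
    """Combine the class labels that several classifiers give one patient.

    predictions: list of labels, one per classifier.
    weights: optional list of non-negative weights, one per classifier.
             When omitted, every classifier counts equally (simple voting).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")

    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Ties go to the label that sorts first, purely to keep the result deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical outputs of an SVM, a neural network and a k-NN classifier.
votes = ["low", "high", "low"]
simple = majority_vote(votes)                              # two votes against one
weighted = majority_vote(votes, weights=[0.2, 0.9, 0.3])   # the trusted model wins
```

With equal weights the two "low" votes win; with the illustrative weights above, the single heavily weighted "high" vote outweighs them.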

    Stage-Specific Predictive Models for Cancer Survivability

    Survivability of cancer strongly depends on the stage of the cancer. In most previous work, machine learning survivability prediction models for a particular cancer were trained and evaluated together on all stages of the cancer. In this work, we trained and evaluated survivability prediction models for five major cancers, both together on all stages and separately for every stage. We named these models joint and stage-specific models, respectively. The results for the cancers we investigated reveal that the best model to predict survivability for one specific stage is the model built specifically for that stage. Additionally, we saw that for every stage of cancer, the most important features for predicting survivability differed from those of the other stages. By evaluating the models separately on different stages, we found that their performance differed across stages. We also found that evaluating the models together on all stages, as was done in the past, is misleading because it overestimates performance.
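
A small sketch can show why pooled evaluation may hide weak performance on a single stage. The data below are invented for illustration and are not from the paper; `accuracy_by_stage` is a hypothetical helper name.

```python
from collections import defaultdict


def accuracy_by_stage(stages, y_true, y_pred):
    """Return {stage: accuracy} computed separately within each stage."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for stage, truth, guess in zip(stages, y_true, y_pred):
        counts[stage] += 1
        hits[stage] += int(truth == guess)
    return {stage: hits[stage] / counts[stage] for stage in counts}


# Invented toy cohort: 8 stage I patients (all survive, all predicted to survive)
# and 2 stage IV patients (one dies, but the model predicts survival for both).
stages = ["I"] * 8 + ["IV"] * 2
y_true = [1] * 8 + [0, 1]
y_pred = [1] * 8 + [1, 1]

per_stage = accuracy_by_stage(stages, y_true, y_pred)
pooled = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy cohort the pooled accuracy is dominated by the large, easy stage I group, while the stage IV accuracy is much lower, which is the kind of gap a joint evaluation conceals.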

    Data Mining Technique for Breast Cancer Prediction using Fuzzy Theory

    To find a reliable approach to breast cancer prediction, this article applies data mining methods to patient clinical data and compares the resulting models' ability to predict the occurrence of breast cancer. Four data mining methods are used: the support vector machine (SVM), the artificial neural network (ANN), the naive Bayes classifier, and the AdaBoost tree. Furthermore, because the feature space has a significant impact on the efficacy and efficiency of the learning process, it is examined extensively in this work. A hybrid that combines principal component analysis (PCA), or a PCA-like technique that compresses the feature space, with the data mining algorithms is recommended; this hybrid is intended to assess the effect of feature space reduction. The Wisconsin Breast Cancer Database (1991) and Wisconsin Diagnostic Breast Cancer (1995), two frequently used test data sets, are used to assess the effectiveness of these algorithms. Each model's test error is calculated with 10-fold cross-validation. The findings show the trade-offs among these methods and provide a thorough assessment of the models. In practical applications, it is anticipated that the feature identification results would help both doctors and patients in preventing breast cancer.
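
The 10-fold cross-validation used to estimate test error can be sketched in plain Python. This is an illustration under stated assumptions rather than the article's code: the helper names, the toy one-dimensional data, and the threshold classifier are all hypothetical stand-ins for the real features and the SVM, ANN, naive Bayes and AdaBoost models.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validated_error(X, y, fit, predict, k=10):
    """Average the misclassification rate over the k held-out folds."""
    errors = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in test)
        errors.append(wrong / len(test))
    return sum(errors) / k


# Toy stand-in classifier: threshold halfway between the two class means.
def fit_threshold(X_train, y_train):
    zeros = [x for x, label in zip(X_train, y_train) if label == 0]
    ones = [x for x, label in zip(X_train, y_train) if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Invented, well-separated one-dimensional data (not the Wisconsin data sets).
X = list(range(10)) + list(range(100, 110))
y = [0] * 10 + [1] * 10
cv_error = cross_validated_error(X, y, fit_threshold, predict_threshold, k=10)
```

Any feature-space reduction such as PCA would need to be fitted inside `fit`, on the training folds only, so that the held-out fold does not leak into the reduction step.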

    Machine Learning na previsão de Cancro Colorretal em função de alterações metabólicas

    In today's world, the amount of information available across sectors is steadily increasing. This is the case in healthcare, where the collection and processing of biomedical data aim to improve decisions about the treatment to apply to a patient, using tools based on Machine Learning. Machine Learning is an area of Artificial Intelligence in which applying algorithms to a dataset makes it possible to predict results or even discover relationships that would be unnoticeable at first glance. This project's main objective is to study several Machine Learning algorithms and techniques to identify whether the acylcarnitine profile may constitute a new biochemical marker for the prediction and prognosis of Colorectal Cancer. In the course of the work, different algorithms and data preprocessing techniques were tested. Three different experiments were carried out to validate the predictions of the models built for different scenarios, namely: predicting whether the patient has Colorectal Cancer, predicting which disease the patient has (Colorectal Cancer or other metabolic diseases), and predicting whether the patient has any disease at all. In a first analysis, the developed models showed good results in Colorectal Cancer screening. The best results were obtained by the Random Forest and Gradient Boosting algorithms, together with data balancing and feature selection techniques, namely Random Oversampling, Synthetic Oversampling and Recursive Feature Selection.

    Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention

    Background: The feature selection (FS) process is essential in the medical area as it reduces the effort and time physicians spend measuring unnecessary features. Choosing useful variables is difficult in the presence of censoring, which is the distinguishing characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazard model; machine learning techniques (MLT) are preferred but not commonly used because of censoring. Techniques that have been proposed to adopt MLT for FS with survival data cannot be used with a high level of censoring. The researcher's previous publications proposed a technique to deal with the high level of censoring and used existing FS techniques to reduce dataset dimension. In this paper, a new FS technique is proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. Methods: An FS technique based on artificial neural network (ANN) MLT is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 from two vascular centers in order to produce a final stable model. Almost 91% of the patients in them are censored. The proposed approach used a wrapper FS method with ANN to select a reduced subset of features that predict the risk of EVAR re-intervention after 5 years for patients from two different centers located in the United Kingdom, so that it can potentially be applied to cross-center predictions. The proposed model is compared with two popular FS techniques used with Cox's model: the Akaike and Bayesian information criteria (AIC, BIC). Results: The final model outperforms other methods in distinguishing the high- and low-risk groups, with a concordance index and estimated AUC better than those of Cox's models based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted. Conclusion: The proposed approach will save the time and effort physicians spend collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide on patients' future observation plans.
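
The concordance index reported in this abstract can be sketched for right-censored data as follows. This is a minimal version of Harrell's C, assuming that a higher risk score means an earlier expected re-intervention and skipping pairs with tied follow-up times; it is not the paper's code, and the example values are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time per patient.
    events:      1 if the event (re-intervention) was observed, 0 if censored.
    risk_scores: model output; higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier, observed member of a pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event before j's last follow-up
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: four patients, two of them censored.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
c_perfect = concordance_index(times, events, [0.9, 0.4, 0.5, 0.1])
c_reversed = concordance_index(times, events, [0.1, 0.5, 0.4, 0.9])
c_constant = concordance_index(times, events, [0.3, 0.3, 0.3, 0.3])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative scores, and heavy censoring (such as the roughly 91% reported here) sharply reduces the number of comparable pairs.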

    An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques

    Ovarian cancer is the third most common gynecologic cancer worldwide. Advanced ovarian cancer patients bear a significant mortality rate. Survival estimation is essential for clinicians and patients to better understand and tolerate future outcomes. The present study investigates different survival predictors available for cancer prognosis using data mining techniques. A dataset of 140 advanced ovarian cancer patients, containing data from different data profiles (clinical, treatment, and overall life quality), was collected and used to predict cancer patients’ survival. Attributes from each data profile were processed accordingly. Clinical data were prepared by handling missing values and outliers. Treatment data covering varying time periods were created using sequence mining techniques to identify the treatments given to the patients. Lastly, different comorbidities were combined into a single factor by computing the Charlson Comorbidity Index for each patient. After appropriate preprocessing, the integrated dataset was classified using appropriate machine learning algorithms. The proposed integrated model gave the highest accuracy, 76.4%, using an ensemble technique with sequential pattern mining that included time intervals of 2 months between treatments. Thus, the treatment sequences and, most importantly, life quality attributes contribute significantly to the survival prediction of cancer patients.

    Prognosis for Degenerative Cervical Myelopathy: A Computer Learning Approach on the AOspine Database

    Description of the problem: Degenerative cervical myelopathy (DCM) is a particular age-related condition affecting about 600 adults per million worldwide [1]. It results from a spontaneous compression of the spinal cord caused by a bony outgrowth of a cervical vertebra or of one of the intervertebral discs. In both cases, the spinal cord is compressed and the patient begins to lose sensation and motor control. This disease has a very strong socio-economic impact: patients gradually lose the use of their limbs, which prevents them from living and working as the compression increases [2] [3]. The disease is mainly diagnosed through clinical evaluations of motor control and magnetic resonance imaging. The problem nevertheless remains complex, because the compression can be asymptomatic, and mild compressions are still difficult to detect with current imaging methods. Some patients may wait many months before receiving a diagnosis [4]. Fortunately, a method exists to limit further loss of motor control: decompressive surgery. This surgery can take several forms and is decided case by case. There is not yet a consensus on the details and approaches of such operations. However, the operation carries many risks (e.g., C5 palsy) [3]. The spinal cord is a sensitive area, and the surgeon must work very close to this vital nerve highway. Many complications can follow. Although the possible benefit seems to exceed the risks, the reality is quite different, because only 40 % of operations have a real impact on patients' recovery. Postoperative prognosis is a complicated task, and the operation is therefore chosen by default. The question remains whether this prognosis can be established more precisely. For now, only a few studies address the quantitative analysis of patients' clinical data to obtain a postoperative prognosis. One avenue of exploration would be to use artificial intelligence methods to create a model capable of helping surgeons in their decision-making. This involves exploiting clinical and MRI data. In previous years, only two studies have appeared that exploit similar data for an identical purpose. These studies mainly focus on the analysis of clinical data [5] [6].
    Objectives: The objective of this study is to create a model, or exploit an existing one, in order to verify whether artificial intelligence could bring solutions to this problem. Among the sub-objectives of this work, the aim is notably to verify the following hypotheses:
    • Using MRI jointly with clinical data yields better results.
    • It is possible to establish a postoperative prognostic model for degenerative cervical myelopathy.
    Methods and materials: We have at our disposal a database of 759 subjects, for whom we have 135 clinical data points as well as, for some of the subjects, T2 and T1 MRI images with axial and sagittal views. This clinical information gathers various medical and general data on the patient such as age, sex, etc. These data were acquired prospectively during a previous study: AOSpine [7]. They come from different centers and therefore offer good heterogeneity in image contrast, properly simulating real-world data. This is important for evaluating the model's generalization ability. These patients all suffer from DCM and were operated on in the various centers. Among these data, we have different scores evaluating their motor control and sensation (MJOa, SF6D, ...). These clinical scores were established before the operation and at 6, 12 and 24 months afterwards. The difference between the preoperative score and the postoperative score serves as the target. The most important score appears to be the one established on the modified Japanese Orthopedic Association (MJOa) scale. The difference between the preoperative and postoperative scores can be classified into 2 categories according to the minimal significant difference, which is 2 points [8]. An increase of 2 in the score would therefore reflect an improvement in the patients' condition after surgery. The data also contain information on the operation undergone by the patient, which is a priori not available in the context of preoperative prognosis. However, it could be useful for studying the impact of these data on the performance of our predictive model. The clinical data were first analyzed manually to try to extract only the important data and, initially, to remove the postoperative data. This made it possible to establish a baseline score with a classical "random forest" model provided in the Python package "scikit-learn". Several machine learning models were then tested to establish the maximum score attainable with these data. We also added the operative data to our models in order to evaluate their impact on model performance. Subsequently, different artificial neural network models, mainly convolutional, were created to perform automatic analysis of the MRI images. These images were not the original images but the result of preprocessing with the "spinal cord toolbox" [9]. Different models were established: the first used only sagittal T2 MRI, the next used sagittal T2 and T1 MRI, and the last worked with sagittal T1 and T2 data as well as the clinical data. The part of the network processing the clinical data was also tested alone to check its performance against the machine learning model established in the previous step.
    The pipeline gives a prediction accuracy of 72.5 % (an improvement of 8 % over the baseline) with an area under the curve (AUC) of 0.73 for the model based solely on clinical data. However, this depends heavily on the quantity of data available. Deep learning models tend to overfit or underfit the data, showing a lack of generalizability, which could be explained by the small number of MRI scans available. Adding the extracted data seems to give the model greater capacity, since the improvement over the baseline reaches 8 % with an accuracy of 65.2 % and an AUC of 0.69, with fewer subjects than the first model.
    ABSTRACT
    Description of the problem: Spinal injuries may impair patients’ motor control, as the spinal cord is a nervous highway connecting the brain and the limbs. Some of these injuries arise from accidents; others occur progressively, such as Degenerative Cervical Myelopathy (DCM). Cervical myelopathy is caused by the compression of the spinal cord by an outgrowth of a vertebral body or intervertebral disc, yielding symptoms such as sensorimotor dysfunction or pain.
    Cervical myelopathy is degenerative, which implies that it only gets worse as time goes by and the compression increases. This condition is a cause of surgery among 40 adults per million yearly [1]. It is diagnosed through clinical motor skill testing and magnetic resonance imaging (MRI). However, diagnosis is still a complex problem, as the compression can be asymptomatic or, in some cases, not easily visible in MR images at its early stages. The diagnosis can take months, if not years, for some patients [4]. The current treatment is decompressive surgery, which aims at removing the object causing the compression, or at least part of it. The goal is to avoid further compression of the spinal cord and alleviate the existing compression. This operation can take various forms (anterior or posterior approach, bone graft, bone fusion) and carries many risks (e.g., C5 palsy) [3], because it involves the spinal cord, which is highly sensitive and important to the body. The details of each surgery are determined case by case by the surgeon, as no consensus on the aspects of such an operation exists [10]. This makes predicting the outcome of the surgery complicated. Even though the benefits seem to outweigh the risks, the operation is only successful in 35 % [11] of the cases. This success rate is based on the improvement of the patients’ sensorimotor skills. Because the outcome is complex to predict, surgery is often seen as a default option; however, whether the outcome of the surgery can be predicted remains an open question. To the best of my knowledge, only a handful of studies use quantitative clinical data analysis to predict post-surgery prognosis. One avenue is the use of data science and artificial intelligence to design a model that can establish this prognosis and assist surgeons in the decision process, using AOSpine [7] clinical data.
    The first study exploring this concept was published in 2019 [5], and, since then, only two studies have been conducted to improve on the methodology [6]. These studies mainly focus on the analysis of clinical data.
    Objectives: The primary goal of this study is to develop a model based on artificial intelligence (AI) that can be used in patient prognosis and treatment of DCM. The sub-objectives of the study are:
    • To establish the efficacy of using MRI in conjunction with clinical data in providing better results in treatment and prognosis.
    • To establish a postoperative prognostic model for DCM.
    Material and method: The database consists of 769 subjects, most of whom have T2 and T1 MRI images with axial and sagittal views and 135 clinical data points. These data were acquired prospectively during the AOSpine study. They came from numerous centers and offer heterogeneity in image contrast, making the setting close to a real-life scenario. Patients were included in this study if they suffered from DCM and had undergone surgery. Among these data, we have different clinical evaluations of their sensorimotor capacities according to various scales (MJOa, SF6D, ...). These scores were established before the operation as well as 6, 12, and 24 months after the surgery. The goal is to predict the difference between pre-surgery and post-surgery scores. We evaluated which score is the most relevant among the three measures at 6, 12, and 24 months. This target can be classified into two categories according to the minimum clinically significant difference [8], which is two points. The Modified Japanese Orthopedic Association (MJOA) scale has been used in this study to record and gauge the results. An increase of 2 points or more would, therefore, reflect an improvement in the condition of the patient after the surgery.
    The data also contain information from the surgery performed on the patient, which is not available for a preoperative prognosis. However, this may be useful to study the impact of these data on the performance of our predictive model. Three approaches were tested in order to improve on the current prognosis. The first exploits clinical data, the second leverages deep learning to use images as well as clinical data as input, and the third exploits features extracted from Magnetic Resonance (MR) images. Clinical data were processed through machine learning, after a pre-processing step aimed at removing the non-relevant features in order to get the best results possible. To process available MRI data jointly with clinical data, two different strategies were implemented. The first strategy is geared towards deep learning and artificial neural networks, which were fed pre-processed sagittal images and clinical data. The models used were a ResNet and a custom model that uses both T1w and T2w MR images. The custom model is based on three different modules. Two of them are similar and were used to process and encode features from the images; they were created with convolutional filters and inception modules. The second strategy relies on feature extraction from the axial images through a semi-automatic processing pipeline. These features were then added to the existing clinical ones to improve the ability of the model to generalize to unseen cases. All models were tested for accuracy on unseen data. Data were split between training, validation, and testing (80%, 10%, and 10%, respectively). Results show an accuracy of 72.5 % (8 % improvement over the baseline) with an area under the curve (AUC) of 0.75 for the model based solely on clinical data. This is, however, heavily dependent on the quantity of available data.
    Deep learning models tend to overfit or underfit the data, showing a lack of generalizability, which could be explained by the small number of available MRI scans. The added extracted features seem to provide the model with valuable insight, as the improvement over the baseline reaches 8 % with an accuracy of 65.2 % and an AUC of 0.68 with fewer subjects than the first model.
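
The target definition and data split described in this abstract can be sketched as follows. The two-point threshold and the 80/10/10 proportions come from the abstract; everything else (function names, the shuffling seed, the integer subject IDs) is a hypothetical illustration and not the thesis's pipeline.

```python
import random

MCID = 2  # minimum clinically significant MJOA difference stated in the abstract


def improved(pre_score, post_score, threshold=MCID):
    """Binary target: 1 if the postoperative gain reaches the threshold, else 0."""
    return int(post_score - pre_score >= threshold)


def split_80_10_10(items, seed=0):
    """Shuffle once, then cut into training, validation and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical subject IDs standing in for the real cohort.
subjects = list(range(100))
train, val, test = split_80_10_10(subjects)
labels = [improved(12, 14), improved(12, 13)]
```

Splitting by subject, as here, keeps all images and clinical records of one patient in a single subset, which avoids leaking a patient's data between training and testing.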

    Advanced analytics to predict survivability of breast cancer patients

    Cancer is a major burden of disease worldwide. Amongst women, breast cancer is the most common cancer and the primary cause of death, followed by heart diseases. With increasing breast cancer cases and technological improvements, cancer care institutions and registries have collected large volumes of data in various formats. Unfortunately, these repositories are not easily accessible and the stored formats are difficult to analyze. We propose an end-to-end process through which such data can be cleansed, integrated and presented in the form of interactive dashboards. This provides a comprehensible view of over forty years of data consisting of over one million records, with provisions to slice the data along several dimensions. Additionally, we propose developing a breast cancer predictive model that predicts survival months for diagnosed patients. The experiments conducted for this research compare the outcomes of different modeling techniques and assess the impact of retraining the predictive model.