1,555 research outputs found

    Prognostic modelling of breast cancer patients: a benchmark of predictive models with external validation

    Dissertation submitted for the degree of Doctor in Electrical and Computer Engineering (Digital and Perceptional Systems) at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. There are several clinical prognostic models in the medical field. Prior to clinical use, outcome models of longitudinal cohort data need to undergo a multi-centre evaluation of their predictive accuracy. This thesis evaluates the possible gain in predictive accuracy in multi-centre evaluation of a flexible model with Bayesian regularisation, PLANN-ARD, using a reference data set for breast cancer, which comprises 4016 records from patients diagnosed during 1989-93 and reported by the BCCA, Canada, with follow-up of 10 years. The method is compared with the widely used Cox regression model. Both methods were fitted to routinely acquired data from 743 patients diagnosed during 1990-94 at the Christie Hospital, UK, with follow-up of 5 years following surgery. Methodological advances developed to support the external validation of this neural network with clinical data include: imputation of missing data in both the training and validation data sets; and a prognostic index for stratification of patients into risk groups that can be extended to non-linear models. Predictive accuracy was measured empirically with a standard discrimination index, Ctd, and with a calibration measure, using the Hosmer-Lemeshow test statistic. Cox regression and the PLANN-ARD model were found to have similar discrimination, but the neural network showed marginally better predictive accuracy over the 5-year follow-up period. In addition, the regularised neural network has the substantial advantage of being suited to making predictions of hazard rates and survival for individual patients. Four different approaches to stratifying patients into risk groups are also proposed, each with a different foundation.
While the four methodologies were found to broadly agree, there are important differences between them. Rule sets were extracted and compared for two of the stratification methods, the log-rank bootstrap and direct application of regression trees, using two rule extraction methodologies, OSRE and CART, respectively. In addition, widely used clinical breast cancer prognostic indexes such as the NPI, TNM and St. Gallen consensus rules were compared with the proposed prognostic models expressed as regression trees, concluding that the suggested approaches may enhance current practice. Finally, a Web-based clinical decision support system is proposed for clinical oncologists and for breast cancer patients making prognostic assessments, which is tailored to the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the NPI, Cox regression modelling and PLANN-ARD. For a given patient, all three models yield a generally consistent but not identical set of prognostic indices that can be analysed together in order to obtain a consensus and so achieve a more robust prognostic assessment of the expected patient outcome.
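
Of the three models named in this abstract, the Nottingham Prognostic Index (NPI) is simple enough to compute by hand: it is commonly given as 0.2 × tumour size in cm + lymph node stage (1 to 3) + histological grade (1 to 3). The sketch below is illustrative only. The three-group cut-offs at 3.4 and 5.4 are the conventional ones and are an assumption here, since the abstract does not state them, and the function names are hypothetical.

```python
def npi(size_cm, node_stage, grade):
    """Nottingham Prognostic Index from tumour size (cm), node stage and grade."""
    if node_stage not in (1, 2, 3) or grade not in (1, 2, 3):
        raise ValueError("node_stage and grade must each be 1, 2 or 3")
    return 0.2 * size_cm + node_stage + grade


def npi_group(score):
    """Map an NPI score to a risk group (assumed conventional cut-offs)."""
    score = round(score, 2)  # avoid floating-point noise at the boundaries
    if score <= 3.4:
        return "good"
    if score <= 5.4:
        return "moderate"
    return "poor"


# A 2 cm, node-negative (stage 1), grade 2 tumour
example_score = npi(2.0, 1, 2)
print(round(example_score, 2), npi_group(example_score))
```

The Cox and PLANN-ARD models require fitted parameters and cannot be reproduced from the abstract alone.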

    Breast cancer data analysis for survivability studies and prediction

    © 2017 Elsevier B.V. Background: Breast cancer is the most common cancer affecting females worldwide. Breast cancer survivability prediction is a challenging and complex research task. Existing approaches use statistical methods or supervised machine learning to assess or predict the survival prospects of patients. Objective: The main objective of this paper is to develop a robust data analytical model which can assist in (i) a better understanding of breast cancer survivability in the presence of missing data, (ii) providing better insights into factors associated with patient survivability, and (iii) establishing cohorts of patients that share similar properties. Methods: Unsupervised data mining methods, viz. the self-organising map (SOM) and density-based spatial clustering of applications with noise (DBSCAN), are used to create patient cohort clusters. These clusters, with associated patterns, were used to train a multilayer perceptron (MLP) model for improved patient survivability analysis. A large dataset available from the SEER program is used in this study to identify patterns associated with the survivability of breast cancer patients. Information gain was computed for the purpose of variable selection. All of these methods are data-driven and require little (if any) input from users or experts. Results: SOM consolidated patients into cohorts of patients with similar properties. From this, DBSCAN identified and extracted nine cohorts (clusters). Patients in each of the nine clusters were found to have different survival times. The separation of patients into clusters improved the overall survival prediction accuracy based on MLP and revealed intricate conditions that affect the accuracy of a prediction. Conclusions: A new, entirely data-driven approach based on unsupervised learning methods improves understanding and helps identify patterns associated with the survivability of patients.
The results of the analysis can be used to segment the historical patient data into clusters or subsets that share common variable values and survivability. The survivability prediction accuracy of an MLP is improved by using identified patient cohorts as opposed to using raw historical data. Analysis of variable values in each cohort provides better insights into the survivability of a particular subgroup of breast cancer patients.
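
The abstract states that information gain was computed for variable selection. As a minimal sketch of that quantity for a categorical variable, the code below computes the reduction in label entropy after splitting on the variable. The toy values are invented for illustration and are not SEER data; the paper's actual preprocessing is not described here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Invented toy example: 1 = survived, 0 = did not survive
survived = [1, 1, 0, 0]
informative = ["early", "early", "late", "late"]   # separates the classes
uninformative = ["x", "y", "x", "y"]               # tells us nothing
print(information_gain(informative, survived))
print(information_gain(uninformative, survived))
```

Variables would then be ranked by this score, with higher values indicating a stronger association with the outcome.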

    A Strategy for Stepwise Regression Procedures in Survival Analysis with Missing Covariates

    The selection of variables used to predict a time-to-event outcome is a common and important issue when analyzing survival data. This is an essential step in accurately assessing risk factors in medical and public health studies. Ignoring an important variable in a regression model may result in biased and inefficient estimates of the outcomes. Such bias can have major implications in public health studies because it may cause potential risk factors to be falsely declared as associated with an outcome, such as mortality, or conversely to be falsely declared as not associated with the outcome. Stepwise regression procedures are widely used for model selection. However, they have inherent limitations and can lead to unreasonable results when there are missing values in the potential covariates. In the first part of this dissertation, multiple imputation is used to deal with missing covariate information. We review two powerful imputation procedures, Multiple Imputation by Chained Equations (MICE) and estimation/multiple imputation for Mixed categorical and continuous data (MIX), which implement different multiple imputation methods. We compare the performance of these two procedures by assessing bias, efficiency and robustness in several simulation studies using time-to-event outcomes. Practical limitations and valuable features of the two procedures are also assessed. In the second part of the dissertation, we use imputation together with a criterion called the Brier Score to formulate an overall stepwise model selection strategy. The strategy has the advantage of enabling one to perform model selection and evaluate the predictive accuracy of a selected model at the same time, all while taking into account the missing values in the covariates. This comprehensive strategy is implemented by defining the Weighted Brier Score (WBS) using weighted survival functions.
We use simulations to assess this strategy and further demonstrate its use by analyzing survival data from the National Surgical Adjuvant Breast and Bowel Project (NSABP) Protocol B-06.
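
For readers unfamiliar with the Brier Score used above, the following is a minimal sketch of the plain, unweighted score at a single time horizon: the mean squared difference between each predicted survival probability and the observed status at that time. It assumes no censoring and complete covariates, so it is not the Weighted Brier Score defined in the dissertation; the function name and data are hypothetical.

```python
def brier_score(pred_survival, event_times, horizon):
    """Unweighted Brier score at `horizon`, assuming every event time is observed.

    pred_survival: predicted probability of surviving beyond `horizon`, per patient
    event_times:   observed time of the event, per patient (no censoring assumed)
    """
    total = 0.0
    for prob, time in zip(pred_survival, event_times):
        alive = 1.0 if time > horizon else 0.0   # observed status at the horizon
        total += (alive - prob) ** 2
    return total / len(pred_survival)


# Invented example: four patients, 5-year horizon
times = [2.0, 7.5, 4.0, 9.0]
perfect = [0.0, 1.0, 0.0, 1.0]
coin_flip = [0.5, 0.5, 0.5, 0.5]
print(brier_score(perfect, times, 5.0))    # lower is better
print(brier_score(coin_flip, times, 5.0))
```

Handling censored patients usually requires reweighting the squared errors, which is the part this sketch leaves out.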

    Stochastic Decision Modeling to Improve Breast Cancer Preventive Care

    Breast cancer is a leading cause of premature mortality among women in the United States. Breast cancer screening tests can help detect breast cancer in its early stages and thereby reduce the breast cancer mortality risk. However, due to the imperfect nature of screening tests, there is always some associated risk of overdiagnosis, false positives, and false negatives. Therefore, to improve breast cancer preventive care, this dissertation focuses on modeling breast cancer screening decisions. Breast cancer overdiagnosis is the first issue addressed in this dissertation. Although overdiagnosis is known to be the major risk inherent in mammography screening, there is currently no way to distinguish between overdiagnosed cancers and the ones that would cause problems over a patient’s lifetime. Overdiagnosis risk depends significantly on a patient’s compliance with screening recommendations. In Chapter 2, we use a stochastic framework to perform a harm-benefit analysis comparing the overdiagnosis risk with the benefits that breast cancer screening provides. In addition, we estimate the lifetime mortality risk of breast cancer while considering the overdiagnosis risk and the uncertainty in a patient’s adherence behavior. Our results show that, although the overdiagnosis rate is relatively high in breast cancer screening, the benefits of mammography screening outweigh the overdiagnosis risk. The second issue addressed in this dissertation is false negative results caused by the density of breast tissue. Breast density is known to increase breast cancer risk and decrease mammography screening sensitivity. Breast density notification laws require physicians to inform women with high breast density of these potential risks. The laws usually require healthcare providers to notify patients of the possibility of using more sensitive supplemental screening tests (e.g., ultrasound).
Since the enactment of the laws, there has been controversy over (i) their implementation, due to potential radiologist bias in the breast density classification of mammogram images, and (ii) the necessity of supplemental screening for all patients with high breast density. Breast density is a dynamic risk factor. Therefore, in the third chapter, we apply a hidden Markov model (HMM) to sparse, unbalanced longitudinal data to quantify the yearly progression of breast density based on Breast Imaging Reporting and Data System (BI-RADS) classifications. In Chapter 4, we use the results from the previous chapter to investigate the effectiveness of supplemental screening and the impact of radiologists’ bias on patients’ outcomes under the breast density notification law. We consider two patient outcomes: the conditional probability of eventually detecting breast cancer in early states given that the patient develops breast cancer in her lifetime, and the expected number of supplemental tests. Our results indicate that referring patients to a supplemental test solely on the basis of their breast density may not necessarily improve their health outcomes, and that other risk factors need to be considered when making such referrals. Additionally, the performance of average-skilled radiologists is shown to be comparable with that of a perfect radiologist.
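
The third chapter described above fits a hidden Markov model to yearly density classifications with gaps. As a rough illustration of the underlying idea, the sketch below runs the standard forward algorithm on a two-state model in which the hidden state is the true density category and the observation is the category reported by the radiologist, with `None` marking a year without a mammogram. Everything here is an assumption for illustration: the two-state simplification, the parameter values and the names are not taken from the dissertation.

```python
# Hypothetical parameters: state 0 = lower density, state 1 = higher density
INIT = [0.5, 0.5]                      # initial state probabilities
TRANS = [[0.9, 0.1], [0.2, 0.8]]       # yearly transition probabilities
EMIT = [[0.8, 0.2], [0.3, 0.7]]        # P(reported category | true category)


def forward_likelihood(init, trans, emit, observations):
    """Likelihood of a yearly observation sequence; None means no scan that year."""
    n = len(init)

    def emission(state, obs):
        # A missing year contributes no evidence, so its emission factor is 1.
        return 1.0 if obs is None else emit[state][obs]

    alpha = [init[i] * emission(i, observations[0]) for i in range(n)]
    for obs in observations[1:]:
        predicted = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        alpha = [predicted[j] * emission(j, obs) for j in range(n)]
    return sum(alpha)


# A patient scanned in years 1, 2 and 4, with year 3 missing
print(forward_likelihood(INIT, TRANS, EMIT, [1, 1, None, 0]))
```

Treating unobserved years as uninformative is one common way to handle sparse visit schedules; the transition and emission matrices would in practice be estimated from data, for example by expectation-maximisation.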

    A Comprehensive Evaluation Of Breast Cancer Risk Prediction Using UK Biobank Data

    Introduction: Both genetic and clinical risk factors play roles in breast cancer onset. Developing a comprehensive and accurate breast cancer prediction model will enable effective identification of individuals at high risk and facilitate personalized decisions on screening and prevention strategies. Objectives: Develop a risk prediction model for breast cancer that includes both genetic and clinical risk factors, and quantify the benefit of adding a polygenic risk score (PRS) to risk prediction. Methods: The analysis was based on data collected through the UK Biobank, comprising 13,851 breast cancer cases and 206,865 controls. The clinical factors considered include baseline demographics, lifestyle, family history, reproductive history, medication status and operation history. The PRS for breast cancer was constructed from 2,994,056 single nucleotide polymorphisms (SNPs) using the AnnoPred approach. The area under the curve (AUC) was used to evaluate the performance of different prediction models. The cumulative risk of breast cancer was compared between participants in different risk groups. Results: Breast cancer risk prediction based on the AnnoPred-derived PRS had a prediction accuracy (AUC=0.646, 95%CI 0.642-0.651) comparable with that based on all 19 clinical factors (AUC=0.657, 95%CI 0.652-0.662). Combining the PRS and clinical factors further improved the prediction accuracy (AUC=0.708, 95%CI 0.704-0.713). Based on the combined model, the estimated risk of developing breast cancer up to age 70 among individuals in the top 1% risk group (40.1%) was more than 28-fold higher than that in the bottom 1% risk group (1.4%). Conclusion: Breast cancer risk prediction based on genetic factors alone can achieve performance comparable to that using well-established risk factors, demonstrating the significant progress that has been made in breast cancer genetics.
It is important to include the PRS when deriving risk predictions for personalized breast cancer screening and prevention.
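
The AUC values reported above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes the AUC directly from that definition, counting ties as one half. The scores are invented; the study's own computation on 13,851 cases and 206,865 controls would in practice use a rank-based method rather than this quadratic loop.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented risk scores: three of the four case-control pairs are ordered correctly
print(auc([0.8, 0.3], [0.5, 0.1]))

# The reported fold difference between the top and bottom 1% risk groups
fold_difference = 40.1 / 1.4
print(round(fold_difference, 1))
```

The second calculation reproduces the abstract's "more than 28-fold" comparison from its own two percentages.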

    Diagnostic and prognostic correlates of preoperative FDG PET for breast cancer

    Purpose: To explore the preoperative utility of FDG PET for diagnosis and prognosis in a retrospective breast cancer case series. Methods: In this retrospective study, 104 patients who had undergone a preoperative FDG PET scan for primary breast cancer at the UZ Brussel during the period 2002-2008 were identified. Selection criteria were: histological confirmation, FDG PET performed prior to therapy, and breast surgery integrated into the primary therapy plan. Patterns of increased metabolism were recorded according to the involved locations: breast, ipsilateral axillary region, internal mammary chain, or distant organs. The end-point for the survival analysis using Cox proportional hazards was disease-free survival. The contribution of prognostic factors was evaluated using the Akaike information criterion and the Nagelkerke index. Results: PET positivity was associated with age, gender, tumour location, tumour size >2 cm, lymphovascular invasion, and oestrogen and progesterone receptor status. Among 63 patients with a negative axillary PET status, 56 (88.9%) had three or fewer involved nodes, whereas among 41 patients with a positive axillary PET status, 25 (61.0%) had more than three positive nodes (P < 0.0001). In the survival analysis of preoperative characteristics, PET axillary node positivity was the foremost statistically significant factor associated with decreased disease-free survival (hazard ratio 2.81, 95% CI 1.17-6.74). Conclusion: Preoperative PET axillary node positivity identified patients with a higher burden of nodal involvement, which might be important for treatment decisions in breast cancer patients.
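
The nodal counts quoted in the results can be arranged as a 2×2 table. The sketch below reproduces the two reported percentages and, assuming nodal counts were available for all 104 patients so that the remaining cells follow by subtraction, derives the implied sensitivity and specificity of axillary PET for detecting more than three involved nodes. Those two derived figures are not stated in the abstract and depend on that assumption.

```python
def two_by_two(pet_neg_total, pet_neg_low, pet_pos_total, pet_pos_high):
    """Summaries of axillary PET status versus nodal burden (low = three or fewer nodes)."""
    pet_neg_high = pet_neg_total - pet_neg_low    # PET negative, more than three nodes
    pet_pos_low = pet_pos_total - pet_pos_high    # PET positive, three or fewer nodes
    return {
        "pet_neg_with_low_burden_pct": 100 * pet_neg_low / pet_neg_total,
        "pet_pos_with_high_burden_pct": 100 * pet_pos_high / pet_pos_total,
        "sensitivity_pct": 100 * pet_pos_high / (pet_pos_high + pet_neg_high),
        "specificity_pct": 100 * pet_neg_low / (pet_neg_low + pet_pos_low),
    }


# Counts taken from the abstract: 56 of 63 PET-negative, 25 of 41 PET-positive
table = two_by_two(63, 56, 41, 25)
for name, value in table.items():
    print(name, round(value, 1))
```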

    Machine learning for the classification of breast cancer tumor: a comparative analysis

    The detection and diagnosis of breast cancer at an early stage is a challenging task. With the rise of emerging technologies such as data mining tools and machine learning algorithms, new prospects for automatic diagnosis have opened up in the medical field, making it possible to predict a disease at an early stage. Early detection of the disease may increase the survival rate of patients. The main purpose of the study was to classify breast tumors as benign or malignant using supervised machine learning algorithms, namely K-nearest neighbor (K-NN), multilayer perceptron (MLP), and random forest (RF), and to compare their performance in terms of accuracy, precision, F1 score, support, and AUC. The experimental results demonstrated that the MLP achieved a high prediction accuracy of 99.4%, followed by random forest (96.4%) and K-NN (76.3%). The diagnosis rates of the MLP, random forest and K-NN were 99.9%, 99.6%, and 73%, respectively. The study provides a clear idea of the accomplishments of classification algorithms in terms of their prediction ability, which can aid healthcare professionals in diagnosing chronic breast cancer efficiently.
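
Most of the metrics compared in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, treating malignant as the positive class. The counts are invented and do not come from the study; support is simply the number of true samples in a class rather than a performance measure.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support_positive": tp + fn,   # number of truly malignant samples
    }


# Invented counts for 100 tumours
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Accuracy alone can be misleading when benign and malignant cases are imbalanced, which is one reason to report precision, recall and F1 alongside it.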

    Adherence to the Dutch Breast Cancer Guidelines for Surveillance in Breast Cancer Survivors: Real-World Data from a Pooled Multicenter Analysis

    BACKGROUND: Regular follow-up after treatment for breast cancer is crucial to detect potential recurrences and second contralateral breast cancer at an early stage. However, information about follow-up patterns in the Netherlands is scarce. PATIENTS AND METHODS: Details concerning diagnostic procedures and policlinic visits in the first 5 years following a breast cancer diagnosis were gathered between 2009 and 2019 for 9916 patients from 4 large Dutch hospitals. This information was used to analyze the adherence of breast cancer surveillance to guidelines in the Netherlands. Multivariable logistic regression was used to relate the average number of a patient’s imaging procedures to their demographics, tumor–treatment characteristics, and individual locoregional recurrence risk (LRR), estimated by a risk-prediction tool called INFLUENCE. RESULTS: The average number of policlinic contacts per patient decreased from 4.4 in the first follow-up year to 2.0 in the fifth. In each of the 5 follow-up years, the share of patients without imaging procedures was relatively high, ranging between 31.4% and 33.6%. Observed deviations from the guidelines were highly significant (P < .001). Higher age, lower UICC stage, and having undergone radio- or chemotherapy were significantly associated with a higher chance of receiving an imaging procedure. The estimated average LRR was 3.5% in patients without any follow-up imaging compared with 2.3% in patients with the recommended number of 5 imaging procedures. CONCLUSION: Compared to guidelines, more policlinic visits were made, although at inadequate intervals, and fewer imaging procedures were performed. The frequency of imaging procedures did not correlate with the patients’ individual risk profiles for LRR.

    Project 1: Bladder and non-bladder urinary cancers: examining patterns and risk factors for second cancers using data from the New South Wales Central Cancer Registry (Australia). Project 2: Multiple Imputation to address a data artefact for the degree-of-spread variable in the NSW CCR for the period 1993-1998: Lung Cancer as a test case

    Doctoral thesis in Civil Engineering, in the field of Hydraulics, Water Resources and Environment, presented to the Department of Civil Engineering of the Faculty of Sciences and Technology of the University of Coimbra. Urban areas often lie within floodplains or low-lying coastal areas. These areas are very prone to inundation and the subsequent economic damage and potential public health issues.
This risk led designers and researchers to search for sophisticated modelling tools capable of modelling not only surface runoff and pipe network systems in isolation but also the interaction between these two systems. These tools are known as DD. DD models are able to replicate the complex interactions between the overland and the pipe network system. The aim of this Thesis is to develop and test a novel "Fully Coupled Urban Flood Model" capable of modelling the 2D surface flood runoff, the 1D pipe network flows and the complex interactions between the surface and subsurface systems (overland and pipe networks). The research presented in this Thesis starts by deriving analytical solutions to a simplification of the SWE, the LInE. These equations are a simplification of the GWM obtained by reducing one dimension (2D to 1D). The GWM are in turn an approximation of the SWE obtained by neglecting the convective acceleration terms. The analytical solutions obtained can be used as a benchmark for the validation of numerical models created for the LInE or GWM. A novel first-order unstructured numerical model is then presented for the GWM, based upon an approximation of the Godunov exact solver and making use of the Roe assumptions for the derivation of the numerical fluxes. This model is based upon a node-centred triangular discretization. A SWE model is created alongside it using the same scheme. The models are improved through the use of a WD scheme that is both locally and globally conservative. The models are coupled to two well-established pipe network models (SIPSON and SWMM). A comparison is drawn between the two overland flow models and a diffusive wave model (P-DWave), using SIPSON as the pipe network model. The use of the different pipe network models, SIPSON and SWMM, is also compared using the GWM as the surface model. The main conclusions drawn are that the GWM reproduces subcritical flow in a similar manner to the SWE and is able to handle localized supercritical flow, unlike other SWE simplifications.
The GWM simulation time is shorter than that of the SWE with the same numerical scheme. Local correction WD schemes and flux-restricting schemes show the best results for treating WD fronts. The novel "Fully Coupled Urban Flood Model", in either of its simplified overland flow versions (SIPSON/GWM or SWMM/GWM), has been shown to be an effective tool for modelling complex interactions between the surface and subsurface systems. Furthermore, it has proved to possess improved features when compared to other coupled models. The objectives proposed for this Thesis were fully achieved: a "Fully Coupled Urban Flood Model" capable of modelling the 2D surface flood runoff, the pipe network flows and the complex interactions between the surface and subsurface systems was created and validated. FCT - SFRH/BD/81869/201