
    Training artificial neural networks directly on the concordance index for censored data using genetic algorithms.

    OBJECTIVE: The concordance index (c-index) is the standard way of evaluating the performance of prognostic models in the presence of censored data. Constructing prognostic models using artificial neural networks (ANNs) is commonly done by training on error functions which are modified versions of the c-index. Our objective was to demonstrate the capability of training directly on the c-index and to evaluate our approach against the Cox proportional hazards model. METHOD: We constructed a prognostic model using an ensemble of ANNs trained with a genetic algorithm. The individual networks were trained on a non-linear artificial data set divided into a training and a test set, each of size 2000, in which 50% of the data was censored. The ANNs were also trained on a data set of 4042 patients treated for breast cancer across five different medical studies, with 2/3 used for training and 1/3 used as a test set. A Cox model was also constructed on the same data in both cases. The two models' c-indices on the test sets were then compared, and the ranking performance of the models is additionally presented visually using modified scatter plots. RESULTS: Cross-validation on the cancer training set did not indicate any non-linear effects between the covariates, so an ensemble of 30 ANNs with one hidden neuron was used. The ANN model had almost the same c-index as the Cox model (c-index=0.70 and 0.71, respectively) on the cancer test set. Both models identified similarly sized low-risk groups with at most 10% false positives (49 for the ANN model and 60 for the Cox model), but repeated bootstrap runs indicated that the difference was not significant. A significant difference was, however, seen on the non-linear synthetic data set, where the ANN ensemble achieved a c-index of 0.90 whereas the Cox model failed to distinguish itself from the random case (c-index=0.49). CONCLUSIONS: We have found empirical evidence that ensembles of ANN models can be optimized directly on the c-index. Comparison with a Cox model indicates that near-identical performance is achieved on a real cancer data set, while on a non-linear data set the ANN model is clearly superior.
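
    The c-index optimized here estimates the probability that, for a randomly chosen comparable pair of patients, the higher predicted risk is assigned to the patient who experiences the event earlier. A minimal Python sketch of Harrell's c-index for right-censored data is shown below; it is illustrative only, not the authors' implementation, and the variable names are assumptions.

        import numpy as np

        def c_index(times, events, risk):
            """Harrell's concordance index for right-censored data.

            A pair (i, j) is comparable when the patient with the shorter
            follow-up time actually experienced the event; the pair is
            concordant when that patient also received the higher risk score.
            """
            concordant, comparable = 0.0, 0
            n = len(times)
            for i in range(n):
                for j in range(n):
                    if times[i] < times[j] and events[i] == 1:
                        comparable += 1
                        if risk[i] > risk[j]:
                            concordant += 1.0
                        elif risk[i] == risk[j]:
                            concordant += 0.5
            return concordant / comparable

        # Toy example: higher risk should correspond to earlier events.
        times  = np.array([2.0, 5.0, 6.0, 8.0])
        events = np.array([1, 1, 0, 1])           # 0 = censored
        risk   = np.array([0.9, 0.6, 0.4, 0.1])
        print(c_index(times, events, risk))        # 1.0 for a perfect ranking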

    The risk of re-intervention after endovascular aortic aneurysm repair

    This thesis studies survival analysis techniques that deal with censoring in order to produce predictive tools for the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring indicates that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. This thesis therefore presents a new solution to high censoring by modifying an approach that was previously incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depend on Cox's model, although machine learning classifiers (MLCs) are preferred. Few methods have adopted MLCs to perform survival FS, and those that have cannot be used with high censoring. This thesis proposes two FS methods which use MLCs to evaluate features and use the new solution to deal with censoring. They combine factor analysis with a greedy stepwise FS search which allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second combines support vector machine, neural network, and K-nearest neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers; it introduces a new hybrid FS process that uses the MCS as a wrapper method and merges it with an iterated feature ranking filter method to further reduce the features. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in the log-rank test's p-values, sensitivity, and concordance, indicating that they are more powerful in correctly predicting the risk of re-intervention. Consequently, they enable doctors to set an appropriate future observation plan for patients.
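
    As a rough illustration of the wrapper idea described above, the sketch below performs greedy forward selection with a small neural network as the evaluating classifier, scoring each candidate feature subset by cross-validation. It is a simplified stand-in for the thesis methods (it omits the factor analysis step, the re-entry of eliminated features, and the censoring solution), and all names and parameters are hypothetical.

        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import cross_val_score

        def greedy_forward_selection(X, y, max_features=10):
            """Wrapper FS: at each step, add the feature whose inclusion
            gives the best cross-validated score for the classifier."""
            remaining = list(range(X.shape[1]))
            selected, best_score = [], -np.inf
            while remaining and len(selected) < max_features:
                scores = []
                for f in remaining:
                    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                                        random_state=0)
                    scores.append(cross_val_score(clf, X[:, selected + [f]],
                                                  y, cv=3).mean())
                best = int(np.argmax(scores))
                if scores[best] <= best_score:   # stop when no candidate improves
                    break
                best_score = scores[best]
                selected.append(remaining.pop(best))
            return selected, best_score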

    Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    Feature selection is essential in the medical area; however, its process becomes complicated in the presence of censoring, which is the unique characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, although machine learning classifiers are preferred. They are less employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that have employed machine learning classifiers, the partial logistic artificial neural network with automatic relevance determination is a well-known method that deals with censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased prediction results, especially in highly censored data, and other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in the p-values of the log-rank test, sensitivity, and the concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
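
    The combination rule at the heart of such a multiple classifier system can be sketched as follows: each base classifier casts a risk-group vote, and votes are weighted by a survival metric, for example each classifier's validation concordance index. This is a minimal illustration, not the published implementation; the votes and weights below are invented.

        import numpy as np

        def weighted_majority_vote(votes, weights):
            """Combine binary predictions (0 = low risk, 1 = high risk) from
            several classifiers, weighting each vote by that classifier's
            survival metric on validation data."""
            votes = np.asarray(votes, dtype=float)        # (n_classifiers, n_patients)
            weights = np.asarray(weights, dtype=float)[:, None]
            score = (weights * votes).sum(axis=0) / weights.sum()
            return (score >= 0.5).astype(int)             # weighted majority wins

        # Hypothetical votes from SVM, ANN and K-nearest neighbor classifiers,
        # weighted by hypothetical validation c-indices.
        votes = [[1, 0, 1, 0],     # SVM
                 [1, 1, 0, 0],     # ANN
                 [0, 1, 1, 0]]     # K-NN
        weights = [0.72, 0.68, 0.61]
        print(weighted_majority_vote(votes, weights))      # -> [1 1 1 0]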

    A New Fuzzy Modeling Framework for Integrated Risk Prognosis and Therapy of Bladder Cancer Patients

    This paper presents a new fuzzy modelling approach for analysing censored survival data and finding risk groups among patients diagnosed with bladder cancer. The proposed framework involves a new procedure for intrinsically integrating interval type-2 fuzzy logic and Cox modelling. The output of this synergistic framework is a risk score/prognostic index indicative of the patient's level of mortality risk. A threshold value is selected whereby patients with risk scores greater than this threshold are classed as high risk and the remainder as low risk. Unlike black-box modelling approaches, the paper shows that interpretability and transparency are maintained using the proposed fuzzy modelling framework.
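
    The risk stratification step described above reduces, in the end, to dichotomizing a prognostic index at a chosen threshold. A trivial sketch of that final step is given below; the index values and threshold are invented, and the fuzzy/Cox machinery that produces the index is not reproduced here.

        import numpy as np

        def stratify(prognostic_index, threshold):
            """Patients whose risk score exceeds the threshold are classed
            as high risk; the remainder are classed as low risk."""
            prognostic_index = np.asarray(prognostic_index, dtype=float)
            return np.where(prognostic_index > threshold, "high risk", "low risk")

        scores = [0.3, 1.2, -0.4, 0.9]          # hypothetical prognostic indices
        print(stratify(scores, threshold=0.8))   # ['low risk' 'high risk' 'low risk' 'high risk']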

    Deep learning cardiac motion analysis for human survival prediction

    Motion analysis is used in computer vision to understand the behaviour of moving objects in sequences of images. Optimising the interpretation of dynamic biological systems requires accurate and precise motion tracking as well as efficient representations of high-dimensional motion trajectories so that these can be used for prediction tasks. Here we use image sequences of the heart, acquired using cardiac magnetic resonance imaging, to create time-resolved three-dimensional segmentations using a fully convolutional network trained on anatomical shape priors. This dense motion model formed the input to a supervised denoising autoencoder (4Dsurvival), a hybrid network whose autoencoder learns a task-specific latent code representation trained on observed outcome data, yielding a latent representation optimised for survival prediction. To handle right-censored survival outcomes, our network used a Cox partial likelihood loss function. In a study of 302 patients the predictive accuracy (quantified by Harrell's C-index) was significantly higher (p < .0001) for our model (C=0.73, 95% CI: 0.68 - 0.78) than for the human benchmark (C=0.59, 95% CI: 0.53 - 0.65). This work demonstrates how a complex computer vision task using high-dimensional medical image data can efficiently predict human survival.
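
    The loss used to handle right-censoring is the negative Cox partial log-likelihood, computed from the network's predicted log-risk for each patient. The sketch below is a plain-NumPy illustration of that loss (Breslow-style, ignoring tied event times); it is not the 4Dsurvival code, and the toy inputs are invented.

        import numpy as np

        def neg_cox_partial_log_likelihood(log_risk, times, events):
            """Negative Cox partial log-likelihood for right-censored data.

            log_risk : predicted log-hazard per patient (higher = riskier)
            times    : follow-up times
            events   : 1 if the event was observed, 0 if censored
            Each observed event contributes its log-risk minus the
            log-sum-exp over its risk set (patients still under observation).
            """
            order = np.argsort(-times)                        # decreasing time
            log_risk, events = log_risk[order], events[order]
            log_risk_set = np.logaddexp.accumulate(log_risk)  # cumulative log-sum-exp
            ll = np.sum((log_risk - log_risk_set) * events)
            return -ll / max(events.sum(), 1)

        # Toy usage with hypothetical network outputs.
        times    = np.array([5.0, 3.0, 8.0, 2.0])
        events   = np.array([1, 0, 1, 1])
        log_risk = np.array([0.2, -0.1, -0.5, 0.9])
        print(neg_cox_partial_log_likelihood(log_risk, times, events))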

    Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention

    Background: The feature selection (FS) process is essential in the medical area as it reduces the effort and time needed for physicians to measure unnecessary features. Choosing useful variables is a difficult task in the presence of censoring, the unique characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazards model; machine learning techniques (MLT) are preferred but not commonly used due to censoring, and the techniques that have been proposed to adopt MLT for FS with survival data cannot be used with a high level of censoring. The researcher's previous publications proposed a technique to deal with the high level of censoring and used existing FS techniques to reduce the dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. Methods: An FS technique based on an artificial neural network (ANN) MLT is proposed to deal with highly censored endovascular aortic repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 from two vascular centers in order to produce a final stable model; they contain almost 91% censored patients. The proposed approach uses a wrapper FS method with an ANN to select a reduced subset of features that predict the risk of EVAR re-intervention after 5 years for patients from two different centers located in the United Kingdom, allowing it to be potentially applied to cross-center predictions. The proposed model is compared with two popular FS techniques, the Akaike and Bayesian information criteria (AIC, BIC), that are used with Cox's model. Results: The final model outperforms the other methods in distinguishing the high- and low-risk groups, with a concordance index and estimated AUC better than those of Cox's model based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted. Conclusion: The proposed approach will save the time and effort spent by physicians collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR, and this predictive model can help clinicians decide patients' future observation plan.

    A New Scalable, Portable, and Memory-Efficient Predictive Analytics Framework for Predicting Time-to-Event Outcomes in Healthcare

    Time-to-event outcomes are prevalent in medical research. To handle these outcomes, as well as censored observations, statistical and survival regression methods are widely used based on the assumption of linear association; however, clinicopathological features often exhibit nonlinear correlations. Machine learning (ML) algorithms have recently been adapted to effectively handle nonlinear correlations. One drawback of ML models is that they can model idiosyncratic features of a training dataset; due to this overlearning, ML models perform well on the training data but not as well on test data. The features that we choose indirectly influence the performance of ML prediction models, and with the expansion of big data in biomedical informatics, appropriate feature engineering and feature selection are vital to ML success. Also, an ensemble learning algorithm helps decrease bias and variance by combining the predictions of multiple models. In this study, we constructed a new scalable, portable, and memory-efficient predictive analytics framework, fitting four components (feature engineering, survival analysis, feature selection, and ensemble learning) together. Our framework first employs feature engineering techniques, such as binarization, discretization, transformation, and normalization, on the raw dataset. The normalized feature set was applied to a Cox survival regression, which produces features highly correlated with the outcome. The resultant feature set was deployed to the “eXtreme gradient boosting ensemble learning” (XGBoost) and Recursive Feature Elimination algorithms. XGBoost uses a gradient boosting decision tree algorithm in which new models are created sequentially to predict the residuals of prior models and are then added together to make the final prediction. In our experiments, we analyzed a cohort of cardiac surgery patients drawn from a multi-hospital academic health system. The model evaluated 72 perioperative variables that impact readmission within 30 days of discharge, derived 48 significant features, and demonstrated optimum predictive ability with feature sets ranging from 16 to 24. The area under the receiver operating characteristic curve observed for the feature set of 16 was 0.8816 and 0.9307 at the 35th and 151st iterations, respectively. Our model showed improved performance compared to state-of-the-art models and could be more useful for decision support in clinical settings.
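
    A compressed sketch of the "normalize, then select features with XGBoost wrapped in recursive feature elimination" portion of such a pipeline is shown below, using synthetic data in place of the perioperative dataset and a binary 30-day readmission label; the parameter values are illustrative, not those of the study.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.preprocessing import MinMaxScaler
        from sklearn.feature_selection import RFE
        from xgboost import XGBClassifier

        # Stand-in for the engineered and normalized perioperative feature matrix.
        X, y = make_classification(n_samples=500, n_features=72,
                                   n_informative=20, random_state=0)
        X = MinMaxScaler().fit_transform(X)

        # Gradient-boosted trees wrapped in recursive feature elimination.
        booster = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
        rfe = RFE(estimator=booster, n_features_to_select=16, step=4)
        rfe.fit(X, y)

        selected = np.flatnonzero(rfe.support_)   # indices of the retained features
        print(len(selected), "features kept:", selected)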

    Identifying prognostic gene-signatures using a network-based approach

    The main objective of this study is to develop a novel network-based methodology to identify prognostic signatures of genes that can predict recurrence in cancer. Feature selection algorithms have been widely used for the identification of gene signatures in genome-wide association studies, but most of them do not discover causal relationships between the features and must compromise between accuracy and complexity. Network-based techniques take the molecular interactions between pairs of genes into account and are thus a more efficient means of finding gene signatures, achieving better classification accuracy without compromising on complexity. Nevertheless, the network-based techniques currently in use each have limitations: correlation-based coexpression networks do not provide predictive structure or causal relations among the genes; Bayesian networks cannot model feedback loops; and Boolean networks can model small-scale molecular networks, but not at the genome scale. Prediction-logic-induced implication networks were therefore chosen to generate genome-wide coexpression networks, as they integrate formal logic and statistics and overcome the limitations of the other network-based techniques. The first part of the study includes the building of an implication network and the identification of a set of genes that could form a prognostic signature. The data used consisted of 442 samples taken from 4 different sources, split into a training set, UM/HLM (n=256), and two test sets, DFCI (n=82) and MSK (n=104). The training set was used for the generation of the implication network and, eventually, the identification of the prognostic signature; the test sets were used for validating the obtained signature. The implication networks were built using the gene expression data associated with two disease states (metastasis or non-metastasis), defined by the period and status of post-operative survival. The gene interactions that differentiated the two disease states, the differential components, were identified. The major cancer hallmarks (E2F, EGF, EGFR, KRAS, MET, RB1, and TP53) were considered, and the genes that interacted with all the major hallmarks were identified from the differential components to form a 31-gene prognostic signature. A software package with embedded C code was created in R to automate this process. Next, the signature was fitted in a Cox proportional hazards model, and the point nearest to perfect classification on the ROC curve was identified as the best scheme for patient stratification on the training set (log-rank p-value=1.97e-08) and the two test sets, DFCI (log-rank p-value=2.13e-05) and MSK (log-rank p-value=1.24e-04), in Kaplan-Meier analyses. Prognostic validation was carried out on the test sets using the Concordance Probability Estimate (CPE) and Gene Set Enrichment Analysis (GSEA). The accuracy of the signature evaluated with CPE reaches 0.71 on the DFCI test set (log-rank p-value=5.3e-08) and 0.70 on the MSK test set (log-rank p-value=2.1e-07). The hazard ratio of the 31-gene prognostic signature is 2.68 (95% CI: [1.88, 3.82]) on the DFCI dataset and 3.31 (95% CI: [2.11, 5.2]) on the MSK dataset. These results demonstrate that our 31-gene signature was significantly more accurate than previously published signatures on the same datasets. The false discovery rate (FDR) of the 31-gene signature, computed with GSEA, is 0.21, showing that it is comparable to other lung cancer prognostic signatures on the same datasets. Topological validation was performed on the test sets for the identified signature to validate the computationally derived molecular interactions. The interactions from the implication networks were compared with those from Bayesian networks implemented in Tetrad IV, and various curated databases and bioinformatics tools were used in the topological evaluation, including PRODISTIN, KEGG, PubMed, NCI-Nature pathways, MATISSE, STRING 8, Ingenuity Pathway Analysis, and Pathway Studio 6. The results showed that the implication networks generated all the curated interactions from the various tools and databases, whereas the Bayesian networks contained only a few of them. It can thus be concluded that implication networks are capable of generating many more gene or protein interactions than currently used network techniques such as Bayesian networks.
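
    The risk-group separation reported above (Kaplan-Meier analyses with log-rank p-values) can be illustrated with a bare-bones Kaplan-Meier estimator; the sketch below is generic, with invented data, and is not the R package described in the abstract.

        import numpy as np

        def kaplan_meier(times, events):
            """Return the distinct event times and the Kaplan-Meier survival
            estimate S(t) at each of them for right-censored data."""
            times = np.asarray(times, dtype=float)
            events = np.asarray(events, dtype=int)
            event_times = np.unique(times[events == 1])
            survival, s = [], 1.0
            for t in event_times:
                at_risk = np.sum(times >= t)              # patients still at risk at t
                d = np.sum((times == t) & (events == 1))  # events occurring at t
                s *= 1.0 - d / at_risk
                survival.append(s)
            return event_times, np.array(survival)

        # Hypothetical risk group: earlier events pull the curve down faster.
        times  = [2, 3, 3, 5, 8, 9]
        events = [1, 1, 0, 1, 0, 1]
        for t, s in zip(*kaplan_meier(times, events)):
            print(f"t = {t:.0f}  S(t) = {s:.2f}")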