9,519 research outputs found

    A rule-of-thumb for the variable bandwidth selection in kernel hazard rate estimation

    Get PDF
    In nonparametric curve estimation the decision about the type of smoothing parameter is critical for the practical performance. The nearest neighbor bandwidth as introduced by Gefeller and Dette 1992 for censored data in survival analysis is specified by one parameter, namely the number of nearest neighbors. Bandwidth selection in this setting is rarely investigated although not linked closely to the frequently studied fixed bandwidth. We introduce a selection algorithm in the hazard rate estimation context. The approach uses a newly developed link to the fixed bandwidth which identifies the variable bandwidth as additional smoothing step. The procedure gains further data-adaption after fixed bandwidth smoothing. Assessment by a Monte Carlo simulation and a clinical example demonstrate the practical relevance of the findings. --

    Gene Expression based Survival Prediction for Cancer Patients: A Topic Modeling Approach

    Full text link
    Cancer is one of the leading cause of death, worldwide. Many believe that genomic data will enable us to better predict the survival time of these patients, which will lead to better, more personalized treatment options and patient care. As standard survival prediction models have a hard time coping with the high-dimensionality of such gene expression (GE) data, many projects use some dimensionality reduction techniques to overcome this hurdle. We introduce a novel methodology, inspired by topic modeling from the natural language domain, to derive expressive features from the high-dimensional GE data. There, a document is represented as a mixture over a relatively small number of topics, where each topic corresponds to a distribution over the words; here, to accommodate the heterogeneity of a patient's cancer, we represent each patient (~document) as a mixture over cancer-topics, where each cancer-topic is a mixture over GE values (~words). This required some extensions to the standard LDA model eg: to accommodate the "real-valued" expression values - leading to our novel "discretized" Latent Dirichlet Allocation (dLDA) procedure. We initially focus on the METABRIC dataset, which describes breast cancer patients using the r=49,576 GE values, from microarrays. Our results show that our approach provides survival estimates that are more accurate than standard models, in terms of the standard Concordance measure. We then validate this approach by running it on the Pan-kidney (KIPAN) dataset, over r=15,529 GE values - here using the mRNAseq modality - and find that it again achieves excellent results. In both cases, we also show that the resulting model is calibrated, using the recent "D-calibrated" measure. These successes, in two different cancer types and expression modalities, demonstrates the generality, and the effectiveness, of this approach

    Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    Get PDF
    Feature selection is essential in medical area; however, its process becomes complicated with the presence of censoring which is the unique character of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring which prevents them from directly being used to survival data. Among the few work that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and perform feature selection for survival data. However, it depends on data replication to handle censoring which leads to unbalanced and biased prediction results especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on survival metric to construct a multiple classifier system. The new hybrid feature selection process uses multiple classifier system as a wrapper method and merges it with iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model such as Akaike and Bayesian information criterions and least absolute shrinkage and selector operator in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention enabling doctor in selecting patients' future follow-up plan
    corecore