Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.
Feature selection is essential in the medical domain; however, the process becomes complicated in the presence of censoring, a characteristic unique to survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, even though machine learning classifiers are often preferred. Such classifiers are less commonly employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that employ machine learning classifiers, the partial logistic artificial neural network with automatic relevance determination is a well-known method that handles censoring and performs feature selection for survival data. However, it relies on data replication to handle censoring, which leads to unbalanced and biased predictions, especially in highly censored data. Other methods cannot cope with high censoring. Therefore, this article proposes a new hybrid feature selection method that addresses high levels of censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers, using simple majority voting and a new weighted majority voting method based on a survival metric, to construct a multiple classifier system. The hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce the number of features. Two endovascular aortic repair datasets, collected from two centers and containing 91% censored patients, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed the individual classifiers and the variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of log-rank test p-values, sensitivity, and concordance index.
This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
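The two voting schemes in the abstract above can be illustrated with a minimal Python/NumPy sketch. It assumes that each classifier (SVM, neural network, KNN) outputs a binary label per patient, with 1 meaning predicted re-intervention risk. The function names and the weights passed to `weighted_majority_vote` are hypothetical placeholders: the paper derives its weights from a survival metric whose exact form is not given here, so this is not the authors' weighting rule.

```python
import numpy as np


def simple_majority_vote(predictions):
    """Return the label chosen by more than half of the classifiers.

    predictions: array of shape (n_classifiers, n_patients) holding 0/1 labels.
    """
    predictions = np.asarray(predictions)
    votes_for_risk = predictions.sum(axis=0)
    return (2 * votes_for_risk > predictions.shape[0]).astype(int)


def weighted_majority_vote(predictions, weights):
    """Return label 1 when the weighted share of votes for 1 exceeds 0.5.

    weights: one non-negative weight per classifier (hypothetical here; the
    paper derives its weights from a survival metric).
    """
    predictions = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalize so the shares sum to 1
    share_for_risk = w @ predictions   # weighted share of "risk" votes per patient
    return (share_for_risk > 0.5).astype(int)


# Rows: SVM, neural network, KNN; columns: four patients.
preds = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 1, 0]])
print(simple_majority_vote(preds))                       # [1 1 1 0]
print(weighted_majority_vote(preds, [0.6, 0.25, 0.15]))  # [1 0 1 0]
```

With three classifiers a simple majority can never tie. In the weighted version a single highly weighted classifier can overrule the other two, as the second patient in the example shows.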
Towards Flexible Time-to-event Modeling: Optimizing Neural Networks via Rank Regression
Time-to-event analysis, also known as survival analysis, aims to predict the
time of occurrence of an event, given a set of features. One of the major
challenges in this area is dealing with censored data, which can make learning
algorithms more complex. Traditional methods such as Cox's proportional hazards
model and the accelerated failure time (AFT) model have been popular in this
field, but they often require assumptions such as proportional hazards and
linearity. In particular, the AFT models often require pre-specified parametric
distributional assumptions. To improve predictive performance and alleviate
strict assumptions, there have been many deep learning approaches for
hazard-based models in recent years. However, representation learning for AFT
has not been widely explored in the neural network literature, despite its
simplicity and interpretability in comparison to hazard-focused methods. In
this work, we introduce the Deep AFT Rank-regression model for Time-to-event
prediction (DART). This model uses an objective function based on Gehan's rank
statistic, which is efficient and reliable for representation learning. On top
of eliminating the requirement to establish a baseline event time distribution,
DART retains the advantages of directly predicting event time in standard AFT
models. The proposed method is a semiparametric approach to AFT modeling that
does not impose any distributional assumptions on the survival time
distribution. This also eliminates the need for additional hyperparameters or
complex model architectures, unlike existing neural network-based AFT models.
Through quantitative analysis on various benchmark datasets, we have shown that
DART has significant potential for modeling high-throughput censored
time-to-event data. Comment: Accepted at ECAI 202
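Gehan's rank statistic, as commonly used for AFT models, leads to the objective L = (1/n^2) * sum_i sum_j delta_i * max(0, e_j - e_i), with residuals e_i = log T_i - f(x_i), event indicator delta_i, and f the model's predicted log time. The NumPy sketch below evaluates that loss for a batch of predictions. It assumes this standard form; DART's exact normalization and implementation may differ, and in practice the same expression would be written in an autodiff framework so that a network f can be trained on it.

```python
import numpy as np


def gehan_rank_loss(log_time, event, pred_log_time):
    """Gehan-type rank loss for an AFT model (assumed standard form).

    log_time:      observed log event or censoring times, shape (n,)
    event:         1 if the event was observed, 0 if censored
    pred_log_time: model predictions f(x_i) on the log-time scale
    """
    # AFT residuals e_i = log T_i - f(x_i)
    e = np.asarray(log_time, float) - np.asarray(pred_log_time, float)
    d = np.asarray(event, float)
    n = e.shape[0]
    # diff[i, j] = e_j - e_i; row i contributes only if patient i had the event
    diff = e[None, :] - e[:, None]
    return float((d[:, None] * np.maximum(diff, 0.0)).sum() / n**2)


log_t = np.log([2.0, 5.0, 9.0])
events = np.array([1, 1, 0])
print(gehan_rank_loss(log_t, events, log_t))        # perfect fit: 0.0
print(gehan_rank_loss(log_t, events, log_t[::-1]))  # reversed ordering: positive loss
```

Because the loss depends only on differences of residuals, adding a constant to every prediction leaves it unchanged. This is why no baseline event-time distribution or intercept has to be specified.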
TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis
A core challenge in survival analysis is to model the distribution of
censored time-to-event data, where the event of interest may be a death,
failure, or occurrence of a specific event. Previous studies have shown that
ranking and maximum likelihood estimation (MLE) loss functions are widely used
for survival analysis. However, ranking loss focuses only on the ranking of
survival times and does not consider the exact survival time values of the
samples. Furthermore, the MLE is unbounded and easily affected by
outliers (e.g., censored data), which may cause poor modeling performance.
To handle the complexities of the learning process and exploit valuable survival
time values, we propose a time-adaptive coordinate loss function, TripleSurv,
to achieve adaptive adjustments by introducing the differences in the survival
time between sample pairs into the ranking, which can encourage the model to
quantitatively rank the relative risk of pairs, ultimately enhancing the accuracy
of predictions. Most importantly, TripleSurv is proficient at quantifying
the relative risk between samples by ranking pairs, and it considers
the time interval as a trade-off to calibrate the robustness of the model over
the sample distribution. Our TripleSurv is evaluated on three real-world survival
datasets and a public synthetic dataset. The results show that our method
outperforms the state-of-the-art methods and exhibits good model performance
and robustness in modeling various sophisticated data distributions with
different censoring rates. Our code will be available upon acceptance. Comment: 9 pages, 6 figures
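The abstract does not state the TripleSurv formula, so the following is only a hypothetical illustration of the general idea it describes: a pairwise ranking loss in which the survival-time gap between two patients enters the objective. The function name, the parameter `alpha`, and the margin `alpha * (t_j - t_i)` are all assumptions, not the published loss. A pair (i, j) is treated as comparable only when patient i had the event before patient j's observed time, which is the usual way ranking losses handle censoring.

```python
import numpy as np


def time_adaptive_pairwise_loss(time, event, risk, alpha=1.0):
    """Hypothetical pairwise ranking loss with a time-gap-dependent margin.

    NOT the published TripleSurv loss; it only illustrates letting the
    survival-time difference of a pair shape the ranking penalty.
    time:  observed times; event: 1 = event observed, 0 = censored
    risk:  predicted risk scores (higher = earlier expected event)
    """
    t = np.asarray(time, float)
    d = np.asarray(event, bool)
    r = np.asarray(risk, float)
    # comparable[i, j]: i had the event strictly before j's observed time
    comparable = d[:, None] & (t[:, None] < t[None, :])
    # required risk gap grows with the survival-time gap t_j - t_i
    margin = alpha * (t[None, :] - t[:, None])
    hinge = np.maximum(0.0, margin - (r[:, None] - r[None, :]))
    n_pairs = int(comparable.sum())
    return float((hinge * comparable).sum() / max(n_pairs, 1))


times = np.array([1.0, 2.0, 3.0])
events = np.array([1, 1, 0])
print(time_adaptive_pairwise_loss(times, events, [5.0, 3.0, 0.0]))  # 0.0
print(time_adaptive_pairwise_loss(times, events, [0.0, 0.0, 0.0]))  # 1.3333333333333333 (= 4/3)
```

A larger `alpha` demands bigger risk gaps for pairs whose times are far apart, which is one way of making a ranking loss sensitive to actual survival-time values rather than only to their order.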
Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies
Lung cancer is among the most common cancers in the United States, in terms
of incidence and mortality. In 2009, it is estimated that more than 150,000
deaths will result from lung cancer alone. Genetic information is an extremely
valuable data source in characterizing the personal nature of cancer. Over the
past several years, investigators have conducted numerous association studies
where intensive genetic data is collected on relatively few patients compared
to the number of gene predictors, with one scientific goal being to identify
genetic features associated with cancer recurrence or survival. In this note,
we propose high-dimensional survival analysis through a new application of
boosting, a powerful tool in machine learning. Our approach is based on an
accelerated lifetime model and minimizing the sum of pairwise differences in
residuals. We apply our method to a recent microarray study of lung
adenocarcinoma and find that our ensemble is composed of 19 genes, while a
proportional hazards (PH) ensemble is composed of nine genes, a proper subset
of the 19-gene panel. In one of our simulation scenarios, we demonstrate that
PH boosting in a misspecified model tends to underfit and ignore
moderately-sized covariate effects, on average. Diagnostic analyses suggest
that the PH assumption is not satisfied in the microarray data and may explain,
in part, the discrepancy in the sets of active coefficients. Our simulation
studies and comparative data analyses demonstrate how statistical learning by
PH models alone is insufficient. Comment: Published in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS426
by the Institute of Mathematical Statistics (http://www.imstat.org).
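The boosting approach in the abstract above minimizes a sum of pairwise differences in accelerated-lifetime residuals. A minimal sketch is shown below, assuming the Gehan-type form L(f) = (1/n^2) * sum_i sum_j delta_i * max(0, e_j - e_i) with e_i = log T_i - f(x_i), and componentwise linear least-squares base learners that update one covariate per step, as in standard componentwise L2 boosting. The function names, step size `nu`, number of steps, and synthetic data are illustrative choices, not taken from the paper.

```python
import numpy as np


def gehan_negative_gradient(log_time, event, f):
    """Negative gradient of the Gehan loss with respect to f.

    Uses a 1/n scale instead of 1/n^2; the positive factor only rescales
    the step length, not its direction.
    """
    e = log_time - f                      # residuals on the log-time scale
    n = e.shape[0]
    smaller = e[None, :] < e[:, None]     # smaller[k, i] = e_i < e_k
    # events whose residual lies below e_k push f_k up ...
    below = (event[None, :] * smaller).sum(axis=1)
    # ... and, if k itself had the event, residuals above e_k push f_k down
    above = (e[None, :] > e[:, None]).sum(axis=1)
    return (below - event * above) / n


def componentwise_gehan_boost(X, log_time, event, n_steps=200, nu=0.1):
    """Componentwise least-squares boosting of a linear AFT predictor."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)               # centered: the loss ignores intercepts
    log_time = np.asarray(log_time, float)
    event = np.asarray(event, float)
    n, p = Xc.shape
    ss = (Xc ** 2).sum(axis=0)
    ss[ss == 0] = np.inf                  # constant columns get slope 0
    coef = np.zeros(p)
    f = np.zeros(n)
    for _ in range(n_steps):
        g = gehan_negative_gradient(log_time, event, f)
        slopes = Xc.T @ g / ss            # least-squares fit of g on each covariate
        rss = ((g[:, None] - Xc * slopes) ** 2).sum(axis=0)
        k = int(np.argmin(rss))           # covariate that best fits the gradient
        coef[k] += nu * slopes[k]
        f += nu * slopes[k] * Xc[:, k]
    return coef


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
log_t = 1.5 * X[:, 0] + 0.3 * rng.normal(size=200)
event = (rng.uniform(size=200) < 0.7).astype(float)  # synthetic censoring flags
print(np.round(componentwise_gehan_boost(X, log_t, event), 2))
```

Because only one covariate is updated per step, covariates that are never selected keep a coefficient of exactly zero. This is how a boosting ensemble ends up composed of a subset of the candidate genes.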