Tree-based methods for survival analysis and high-dimensional data

Abstract

Machine learning techniques have garnered significant popularity due to their capacity to handle high-dimensional data, and tree-based methods are among the most popular machine learning approaches. My dissertation aims to improve existing tree-based methods and to develop a statistical framework for understanding the proposed methods. It contains three topics: recursively imputed survival trees, reinforcement learning trees, and reinforcement learning trees for right-censored survival data. A central idea of my dissertation is to increase the chance that signaled variables are used as splitting rules during tree construction without losing randomness/diversity, so that a more accurate model can be built; the different methods achieve this in different ways. The recursively imputed survival tree recursively imputes censored observations and refits the survival tree model. This approach makes better use of the censored observations during tree construction, and it also changes the dynamics of splitting rule selection so that signaled variables are emphasized more in the refitted model. Reinforcement learning trees take a direct approach to emphasizing signaled variables: an embedded model is fitted at each internal node while searching for splitting rules, and the variable with the largest variable importance measure is used as the splitting variable. A new theoretical framework is proposed to establish consistency and the convergence rate of this approach. In the third topic, we further extend reinforcement learning trees to right-censored survival data, where the Brier score is utilized to calculate the variable importance measures. We also show a desirable property of the proposed method that helps correct the bias of variable importance measures when correlated variables are present in the model.
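
To make the splitting mechanism of reinforcement learning trees concrete, the sketch below illustrates only the core idea described above: at each internal node, an embedded model is fitted to the data reaching that node, and the variable with the largest importance measure becomes the splitting variable. This is a minimal illustration rather than the dissertation's implementation; the use of a random forest as the embedded model, the function name, and the toy data are all assumptions made for the example.

```python
# Minimal sketch of embedded-model split variable selection (illustrative only;
# not the dissertation's implementation). At a node, fit an embedded random
# forest on the node's data and return the variable with the largest importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def embedded_split_variable(X_node, y_node, random_state=0):
    """Index of the splitting variable chosen by embedded variable importance."""
    embedded = RandomForestRegressor(n_estimators=100, random_state=random_state)
    embedded.fit(X_node, y_node)
    return int(np.argmax(embedded.feature_importances_))

# Toy usage: one strong ("signaled") variable among noise variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(embedded_split_variable(X, y))  # expected to select column 3
```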
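For the survival extension, variable importance is computed from the Brier score. As background, and not necessarily the exact form used in the dissertation, a commonly used censored-data version of the Brier score at time $t$ weights observations by an inverse probability of censoring:

```latex
\[
\widehat{\mathrm{BS}}(t) = \frac{1}{n}\sum_{i=1}^{n}
\left[
\frac{\hat{S}(t \mid X_i)^{2}\,\mathbf{1}\{T_i \le t,\ \delta_i = 1\}}{\hat{G}(T_i)}
+ \frac{\bigl(1-\hat{S}(t \mid X_i)\bigr)^{2}\,\mathbf{1}\{T_i > t\}}{\hat{G}(t)}
\right]
\]
```

where $\hat{S}(t \mid X_i)$ is the predicted survival probability for subject $i$, $T_i$ is the observed time, $\delta_i$ is the event indicator, and $\hat{G}$ is an estimate (e.g., Kaplan-Meier) of the censoring survival function. A variable's importance can then be measured by how much this score worsens when the variable's values are perturbed.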
