    A two-step feature selection procedure for relevant markers of Squamous Cell Lung Carcinoma using different survival models

    Lung squamous cell carcinoma has a potentially very large number of gene expression markers, which yields high-dimensional data with a large number of features. Selecting the relevant markers for analysis is therefore essential. In this study, we aim to select a subset of prominent, statistically significant features from 31,918 gene expression features. We then analyze the selected features with the Cox Proportional Hazards (CPH) model to estimate how each marker affects a patient's survival. The subset is chosen in two steps. First, markers are selected with an L1-regularized (lasso) Cox PH model. Second, each retained marker is screened again by fitting a univariate Cox PH model and keeping it only if its Wald test p-value is below 0.05. For the final set of markers, we estimate hazard ratios and confidence intervals using both maximum likelihood estimation (MLE) and a Bayesian approach, with the CPH model and, as an alternative, the Accelerated Failure Time (AFT) model. A forest plot gives a graphical summary of the meta-analysis performed in the study. The proposed procedure finds a suitable subset of markers out of the large number of available variables, and the validity of the selected features is confirmed with survival models.
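The two-step screen can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not name a solver, so step 1 here uses proximal gradient descent (ISTA with backtracking) on the L1-penalized Cox partial likelihood. Ties are handled with the Breslow approximation, the penalty `lam=0.1` is an arbitrary illustrative value, and the data are simulated so that only features 0 and 1 affect survival. All function names are hypothetical. Step 2 refits each surviving feature in a univariate Cox model by Newton-Raphson and keeps it if the two-sided Wald p-value is below 0.05. The hazard ratio is exp(beta) with a Wald 95% interval exp(beta ± 1.96·SE). The Bayesian and AFT estimates described in the abstract are not covered.

```python
import math

import numpy as np


def cox_grad_hess(beta, X, time, event):
    """Gradient and Hessian of the Cox partial log-likelihood (Breslow ties)."""
    w = np.exp(X @ beta)
    p = X.shape[1]
    grad, hess = np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]           # risk set of the i-th event
        wr, Xr = w[at_risk], X[at_risk]
        s0 = wr.sum()
        mean = wr @ Xr / s0                 # risk-weighted covariate mean
        grad += X[i] - mean
        hess -= (Xr * wr[:, None]).T @ Xr / s0 - np.outer(mean, mean)
    return grad, hess


def cox_loglik(beta, X, time, event):
    """Cox partial log-likelihood (Breslow ties)."""
    eta = X @ beta
    return sum(eta[i] - np.logaddexp.reduce(eta[time >= time[i]])
               for i in np.flatnonzero(event))


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cox(X, time, event, lam, n_iter=150):
    """Step 1 (assumed solver): L1-penalized Cox PH via proximal gradient.

    Minimizes -loglik(beta)/n + lam * ||beta||_1 with a backtracking step size.
    """
    n, p = X.shape
    beta, step = np.zeros(p), 1.0
    f = lambda b: -cox_loglik(b, X, time, event) / n
    for _ in range(n_iter):
        grad, _ = cox_grad_hess(beta, X, time, event)
        g, f0 = -grad / n, f(beta)
        while True:  # shrink the step until the quadratic upper bound holds
            cand = soft_threshold(beta - step * g, step * lam)
            diff = cand - beta
            if f(cand) <= f0 + g @ diff + diff @ diff / (2 * step):
                break
            step *= 0.5
        beta = cand
    return beta


def univariate_wald(x, time, event, n_iter=25, tol=1e-10):
    """Step 2: univariate Cox PH by Newton-Raphson; returns (beta, SE, Wald p)."""
    X, beta = x[:, None], np.zeros(1)
    for _ in range(n_iter):
        g, h = cox_grad_hess(beta, X, time, event)
        delta = g / h[0, 0]
        beta = beta - delta
        if abs(delta[0]) < tol:
            break
    _, h = cox_grad_hess(beta, X, time, event)
    b, se = float(beta[0]), math.sqrt(-1.0 / h[0, 0])
    p_value = math.erfc(abs(b / se) / math.sqrt(2.0))  # two-sided Wald test
    return b, se, p_value


def two_step_selection(X, time, event, lam, alpha=0.05):
    """Lasso-Cox screen, then keep markers with univariate Wald p < alpha."""
    beta = lasso_cox(X, time, event, lam)
    final = {}
    for j in np.flatnonzero(beta != 0):
        b, se, p = univariate_wald(X[:, j], time, event)
        if p < alpha:
            # MLE hazard ratio with a Wald 95% confidence interval
            final[int(j)] = (math.exp(b), math.exp(b - 1.96 * se),
                             math.exp(b + 1.96 * se), p)
    return final


# Synthetic, standardized data: only features 0 and 1 affect survival.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[0, 1]] = [1.0, -0.8]
T = rng.exponential(1.0 / np.exp(X @ true_beta))  # event times, hazard exp(x'beta)
C = rng.exponential(2.0, size=n)                   # independent censoring times
time, event = np.minimum(T, C), T <= C

final = two_step_selection(X, time, event, lam=0.1)  # lam chosen arbitrarily
for j, (hr, lo, hi, pv) in sorted(final.items()):
    print(f"feature {j}: HR={hr:.2f} (95% CI {lo:.2f} to {hi:.2f}), p={pv:.2g}")
```

Because step 2 fits each marker on its own, the reported hazard ratios are marginal effects. With real expression data one would typically standardize each gene and tune `lam` (for example by cross-validation) rather than fix it.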