4 research outputs found

    Identification of Prognostic Genes and Gene Sets for Early-Stage Non-Small Cell Lung Cancer Using Bi-Level Selection Methods

    In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selection, a bi-level selection method falls into one of three categories: forward selection, which first selects relevant gene sets and then selects relevant individual genes; backward selection, which takes the reverse order; and simultaneous selection, which performs the two tasks at once, usually with the aid of a penalized regression model. To test for the existence of subtype-specific prognostic genes in non-small cell lung cancer (NSCLC), we previously proposed the Cox-filter method, which examines the association between patients' survival time after diagnosis and one specific gene, the disease subtypes, and their interaction terms. In this study, we extend it to carry out forward and backward bi-level selection. Using simulations and an NSCLC application, we demonstrate that forward selection outperforms backward selection and other relevant algorithms in our setting. Both proposed methods are readily understandable and interpretable. They therefore represent useful tools for researchers interested in exploring the prognostic value of gene expression data for specific subtypes or stages of a disease.
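The abstract describes the Cox-filter model only in words. One plausible way to write a Cox model with a gene term, a subtype term, and their interaction is sketched below. The symbols are assumptions introduced here for illustration, not notation from the paper: $x_g$ is the expression of a single gene, $s$ is a binary subtype indicator, and $h_0(t)$ is the baseline hazard.

```latex
h(t \mid x_g, s) = h_0(t)\,\exp\!\left(\beta_g x_g + \beta_s s + \beta_{gs}\, x_g s\right)
```

Under this reading, a nonzero interaction coefficient $\beta_{gs}$ would indicate that the gene's association with survival differs between subtypes, which is what a subtype-specific prognostic gene would look like.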

    Cancer Is Associated with Alterations in the Three-Dimensional Organization of the Genome

    The human genome is organized into topologically associating domains (TADs), which are contiguous regions with a higher frequency of interactions within the domain than with regions outside it. TADs contribute to gene expression regulation by restricting the interactions between their regulatory elements, and TAD disruption has been associated with cancer. Here, we provide a proof of principle that mutations within TADs can be used to predict the survival of cancer patients. Specifically, we constructed a set of 1467 consensus TADs representing the three-dimensional organization of the human genome and used Cox regression analysis to identify a total of 35 prognostic TADs in different cancer types. Interestingly, only 46% of the 35 prognostic TADs comprised genes with known clinical relevance. Moreover, in the vast majority of such cases, the prognostic value of the TAD was not directly related to the presence or absence of mutations in the gene(s), emphasizing the importance of regulatory mutations. In addition, we found that 34% of the prognostic TADs show strong structural perturbations in the cancer genome, consistent with the widespread, global epigenetic dysregulation often observed in cancer patients. In summary, this study elucidates the mechanisms through which non-coding variants may influence cancer progression and opens new avenues for personalized medicine.

    Unbiased prediction and feature selection in high-dimensional survival regression.

    With the widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example in the search for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN.
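The nested cross-validation mentioned in the abstract can be illustrated with a minimal sketch of the index splitting alone. This is not the SurvRank API, and it does not reproduce the paper's survival model or its weighted combination of CV runs. The function names `k_fold_indices` and `nested_cv_splits` are hypothetical, chosen here for illustration. The point it shows is that the inner folds used for tuning and feature selection are drawn only from the outer training set, so the outer test fold stays untouched until the final evaluation.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Striding by k gives folds whose sizes differ by at most one.
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_valid) pairs built only
    from outer_train, so the outer test fold never influences tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer test fold is training data.
        outer_train = [
            idx for j, fold in enumerate(outer_folds) if j != i for idx in fold
        ]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_valid in enumerate(inner_folds):
            inner_train = [
                idx for j, fold in enumerate(inner_folds) if j != m for idx in fold
            ]
            inner_splits.append((inner_train, inner_valid))
        yield outer_train, outer_test, inner_splits
```

In use, a model would be tuned on each `inner_train` and scored on the matching `inner_valid`, then refit on `outer_train` and scored once on `outer_test`. Averaging the outer scores gives the performance estimate for unseen cases.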