34 research outputs found

    Weighted Cox regression for the prediction of heterogeneous patient subgroups

    An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling high-dimensional data are good prediction performance and feature selection to find a subset of predictors that are truly associated with a clinical outcome such as a time-to-event endpoint. In clinical practice, this task is challenging since patient cohorts are typically small and can be heterogeneous with regard to the relationship between predictors and outcome. When data from several subgroups of patients with the same or a similar disease are available, as in multicenter studies, it is tempting to combine them to increase the sample size. However, heterogeneity between subgroups can lead to biased results, and subgroup-specific effects may remain undetected. For this situation, we propose a penalized Cox regression model with a weighted version of the Cox partial likelihood that includes patients of all subgroups but assigns them individual weights based on their subgroup affiliation. Patients who are likely to belong to the subgroup of interest obtain higher weights in the subgroup-specific model. Our proposed approach is evaluated through simulations and an application to real lung cancer cohorts. Simulation results demonstrate that our model can achieve improved prediction and variable selection accuracy over standard approaches. Comment: under review, 15 pages, 6 figures
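The weighted Cox partial likelihood described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the common form in which each patient's weight multiplies both that patient's event contribution and that patient's term in every risk set. It also assumes Breslow-style risk sets with no special tie handling. The function name and the example weighting are hypothetical, and the lasso penalty of the full model is omitted.

```python
import math
from typing import Sequence


def weighted_cox_log_partial_likelihood(
    beta: Sequence[float],
    x: Sequence[Sequence[float]],
    time: Sequence[float],
    event: Sequence[int],
    weight: Sequence[float],
) -> float:
    """Weighted Cox log partial likelihood (assumed form, Breslow risk sets).

    Only patients with event[i] == 1 contribute an event term; the risk set
    of patient i is every patient j with time[j] >= time[i]. Weights enter
    both the event term and the risk-set sum.
    """
    n = len(time)
    # Linear predictor eta_i = x_i . beta for every patient
    eta = [sum(b * v for b, v in zip(beta, row)) for row in x]
    loglik = 0.0
    for i in range(n):
        # Censored or zero-weight patients add no event term; they can
        # still appear in other patients' risk sets.
        if event[i] != 1 or weight[i] == 0:
            continue
        risk = sum(weight[j] * math.exp(eta[j]) for j in range(n) if time[j] >= time[i])
        loglik += weight[i] * (eta[i] - math.log(risk))
    return loglik
```

In a subgroup-specific fit, the weights might be built from subgroup labels. For example, `[1.0 if g == "target" else 0.5 for g in subgroup]` is a hypothetical scheme. The penalized estimate would then maximize this quantity minus a lasso term λ·Σ|β_j|.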

    Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression

    Important objectives in cancer research are the prediction of a patient's risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g., genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical subgroup analysis, a separate prediction model is fitted using only the data of one specific cohort. However, this can lead to a loss of power when the sample size is small. Simple pooling of all cohorts, on the other hand, can lead to biased results, especially when the cohorts are heterogeneous. For this situation, we propose a new Bayesian approach suitable for continuous molecular measurements and survival outcome that identifies the important predictors and provides a separate risk prediction model for each cohort. It allows sharing information between cohorts to increase power by assuming a graph linking predictors within and across different cohorts. The graph helps to identify pathways of functionally related genes and genes that are simultaneously prognostic in different cohorts. Results demonstrate that our proposed approach is superior to standard approaches in terms of prediction performance and has increased power in variable selection when the sample size is small. Comment: under review, 19 pages, 10 figures

    Cabbage and fermented vegetables: From death rate heterogeneity in countries to candidates for mitigation strategies of severe COVID-19

    Large differences in COVID-19 death rates exist between countries and between regions of the same country. Some countries with very low death rates, such as those in Eastern Asia, Central Europe, or the Balkans, share a common feature: the consumption of large quantities of fermented foods. Although ecological studies are subject to biases, fermented vegetables or cabbage have been associated with low death rates in European countries. SARS-CoV-2 binds to its receptor, angiotensin-converting enzyme 2 (ACE2). As a result of SARS-CoV-2 binding, ACE2 downregulation enhances the angiotensin II receptor type 1 (AT(1)R) axis, which is associated with oxidative stress. This leads to insulin resistance as well as lung and endothelial damage, two severe outcomes of COVID-19. Nuclear factor (erythroid-derived 2)-like 2 (Nrf2) is the most potent antioxidant in humans and can, in particular, block the AT(1)R axis. Cabbage contains precursors of sulforaphane, the most active natural activator of Nrf2. Fermented vegetables contain many lactobacilli, which are also potent Nrf2 activators. Three examples are discussed: kimchi in Korea, westernized foods, and the slum paradox. It is proposed that fermented cabbage is a proof of concept for dietary manipulations that may enhance Nrf2-associated antioxidant effects and help mitigate COVID-19 severity. Peer reviewed

    Nrf2-interacting nutrients and COVID-19: time for research to develop adaptation strategies

    There are large between- and within-country variations in COVID-19 death rates. Some settings with very low death rates, such as Eastern Asia, Central Europe, the Balkans, and Africa, share a common feature: the consumption of large quantities of fermented foods, whose intake is associated with the activation of the anti-oxidant transcription factor Nrf2 (nuclear factor (erythroid-derived 2)-like 2). There are many Nrf2-interacting nutrients (berberine, curcumin, epigallocatechin gallate, genistein, quercetin, resveratrol, sulforaphane). All of them act similarly to reduce insulin resistance, endothelial damage, lung injury, and cytokine storm. They also act on the same mechanisms (mTOR: mammalian target of rapamycin; PPARγ: peroxisome proliferator-activated receptor gamma; NF-κB: nuclear factor kappa B; ERK: extracellular signal-regulated kinases; eIF2α: eukaryotic initiation factor 2 alpha). As a result, they may be important in mitigating the severity of COVID-19, acting through the endoplasmic reticulum stress pathway or the ACE-angiotensin II-AT(1)R axis. Many Nrf2-interacting nutrients also interact with TRPA1 and/or TRPV1. Interestingly, geographical areas with very low COVID-19 mortality are those with the lowest prevalence of obesity (Sub-Saharan Africa and Asia). It is tempting to propose that Nrf2-interacting foods and nutrients can re-balance insulin resistance and have a significant effect on COVID-19 severity. It is therefore possible that the intake of these foods may restore an optimal natural balance for the Nrf2 pathway and may be of interest in the mitigation of COVID-19 severity.

    Survival models with selection of genomic covariates in heterogeneous cancer studies

    Building a risk prediction model for a specific subgroup of patients based on high-dimensional molecular measurements such as gene expression data is an important current field of biostatistical research. Major objectives in modeling high-dimensional data are good prediction performance and finding a subset of covariates that are truly relevant to the outcome (here: a time-to-event endpoint). The latter requires variable selection to obtain a sparse, interpretable model solution. In this thesis, a further modeling objective is to take into account heterogeneity in the data due to known subgroups of patients that may differ in the relationship between genomic covariates and survival outcome. We consider multiple cancer studies as subgroups; however, our approaches can be applied to any other subgroups, for example, subgroups defined by clinical covariates. We aim to provide a separate prediction model for each subgroup that allows the identification of common as well as subgroup-specific effects and has improved prediction accuracy over standard approaches. Standard subgroup analysis includes only patients of the subgroup of interest and may lead to a loss of power when the sample size is small. Standard combined analysis, by contrast, simply pools patients of all subgroups and may suffer from biased results and the averaging of subgroup-specific effects. To overcome these drawbacks, we propose two different statistical models that allow sharing information between subgroups to increase power when this is supported by the data. One approach is a classical frequentist Cox proportional hazards model with a lasso penalty for variable selection and a weighted version of the Cox partial likelihood that includes patients of all subgroups but assigns them individual weights based on their subgroup affiliation. Patients who fit the subgroup of interest well receive higher weights in the subgroup-specific model. The other approach is a novel Bayesian Cox model that uses a stochastic search variable selection prior with latent indicators of variable inclusion. We assume a sparse graphical model that links genes within subgroups and the same genes across different subgroups. This graph structure is not known a priori and is inferred simultaneously with the important variables of each subgroup. Both approaches are evaluated through extensive simulations and applied to real lung cancer studies. Simulation results demonstrate that our proposed models can achieve improved prediction and variable selection accuracy over standard subgroup models when the sample size is small. As expected, the standard combined model identifies only common effects and fails to detect subgroup-specific effects.
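A graph-structured prior on the latent inclusion indicators can be illustrated with a Markov random field (MRF) prior, a common choice for graph-structured variable selection. The sketch below assumes the form log p(γ | G) = a·Σ_k γ_k + b·Σ_{(k,l)∈E} γ_k γ_l, up to a normalizing constant. Nodes are indexed by (subgroup, gene), and edges link the same gene across subgroups, plus any within-subgroup gene-gene edges. The thesis may parametrize its prior differently, and the hyperparameters `a` and `b` are illustrative. In the model described above, the within-subgroup edges are inferred rather than fixed as they are here.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Node = Tuple[int, int]  # (subgroup index, gene index)
Edge = Tuple[Node, Node]


def cross_subgroup_edges(n_subgroups: int, n_genes: int) -> Set[Edge]:
    """Edges linking the same gene across every pair of subgroups."""
    edges: Set[Edge] = set()
    for gene in range(n_genes):
        for s, t in combinations(range(n_subgroups), 2):
            edges.add(((s, gene), (t, gene)))
    return edges


def mrf_log_prior(gamma: Dict[Node, int], edges: Set[Edge], a: float, b: float) -> float:
    """Unnormalized MRF log prior on binary inclusion indicators gamma.

    a < 0 favors sparsity; b > 0 rewards selecting neighboring nodes together,
    so a gene selected in one subgroup raises the prior odds of selecting it
    (or its within-subgroup neighbors) elsewhere.
    """
    linear = a * sum(gamma.values())
    pairwise = b * sum(gamma[k] * gamma[l] for k, l in edges)
    return linear + pairwise
```

In a stochastic search sampler, differences of this log prior between two candidate γ vectors would enter the Metropolis-Hastings acceptance ratio together with the Cox likelihood.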

    Model-Based Optimization of Subgroup Weights for Survival Analysis

    Obtaining a reliable prediction model for a specific cancer subgroup or cohort is often difficult due to the limited number of samples and, in survival analysis, even more so due to potentially high censoring rates. Sometimes similar datasets are available for other patient subgroups with the same or a similar disease and treatment, e.g., from other clinical centers. Simple pooling of all subgroups can decrease the variance of the estimated parameters of the prediction models, but can also increase the bias due to potentially high heterogeneity between the cohorts. A promising compromise is to identify which subgroups are similar enough to the specific subgroup of interest and then include only these for model building. Similarity here refers to the relationship between input and output in the prediction model, not necessarily to the distributions of the input and output variables themselves. Here, we propose a subgroup-based weighted likelihood approach and evaluate it on a set of lung cancer cohorts. When a prediction model for a specific subgroup is of interest, every other subgroup receives an individual weight. That weight determines the strength with which the subgroup's observations enter the likelihood-based optimization of the model parameters. A weight close to 0 indicates that a subgroup should be discarded, and a weight close to 1 indicates that the subgroup fully enters the model building process. Model-based optimization (MBO) can be used to quickly find a good prediction model in the presence of a large number of hyperparameters to be tuned. Here, we use MBO to identify the best model for survival prediction in lung cancer subgroups, where, besides the parameters of a Cox model, the individual values of the subgroup weights are also optimized. Interestingly, the resulting models with the highest prediction quality are often obtained for a mixed weight structure. In other words, a combination of weights close to 0, weights close to 1, and medium weights is optimal, reflecting the similarity of the corresponding cancer subgroups.
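How subgroup weights enter the likelihood, and how they might be tuned, can be sketched as below. MBO proper fits a surrogate model to previously evaluated configurations in order to propose new ones. For brevity, this sketch substitutes plain random search, so it is not MBO. It assumes that the subgroup of interest keeps weight 1 and that every other subgroup gets a weight in [0, 1]. `score` is a hypothetical user-supplied function returning a validation metric where higher is better, for example a cross-validated C-index of a Cox model fitted with the resulting observation weights.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def observation_weights(subgroup: Sequence[str], subgroup_weight: Dict[str, float]) -> List[float]:
    """Expand per-subgroup weights into one likelihood weight per observation."""
    return [subgroup_weight[g] for g in subgroup]


def random_search_weights(
    subgroups: Sequence[str],
    target: str,
    score: Callable[[Dict[str, float]], float],
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Random-search stand-in for MBO over subgroup weights.

    The target subgroup is fixed at weight 1; every other subgroup gets a
    weight drawn uniformly from [0, 1). The best-scoring vector is returned.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    rng = random.Random(seed)
    best_w: Optional[Dict[str, float]] = None
    best_s = float("-inf")
    for _ in range(n_iter):
        w = {g: (1.0 if g == target else rng.random()) for g in subgroups}
        s = score(w)
        if s > best_s:
            best_w, best_s = w, s
    assert best_w is not None
    return best_w, best_s
```

Swapping the random sampler for a surrogate-guided proposal step would turn this loop into a basic MBO scheme. The weighted-likelihood part stays the same.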

    Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression

    Background: Important objectives in cancer research are the prediction of a patient’s risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g., genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical subgroup analysis, a separate prediction model is fitted using only the data of one specific cohort. However, this can lead to a loss of power when the sample size is small. Simple pooling of all cohorts, on the other hand, can lead to biased results, especially when the cohorts are heterogeneous. Results: We propose a new Bayesian approach suitable for continuous molecular measurements and survival outcome that identifies the important predictors and provides a separate risk prediction model for each cohort. It allows sharing information between cohorts to increase power by assuming a graph linking predictors within and across different cohorts. The graph helps to identify pathways of functionally related genes and genes that are simultaneously prognostic in different cohorts. Conclusions: Results demonstrate that our proposed approach is superior to standard approaches in terms of prediction performance and has increased power in variable selection when the sample size is small.

    Epsin Family Member 3 and Ribosome-Related Genes Are Associated with Late Metastasis in Estrogen Receptor-Positive Breast Cancer and Long-Term Survival in Non-Small Cell Lung Cancer Using a Genome-Wide Identification and Validation Strategy.

    In breast cancer, gene signatures that predict the risk of metastasis after surgical tumor resection are mainly indicative of early events. The purpose of this study was to identify genes linked to metastatic recurrence more than three years after surgery. Affymetrix HG U133A and Plus 2.0 array datasets with information on metastasis-free, disease-free, or overall survival were accessed via public repositories. Time-restricted Cox regression models were used to identify genes associated with metastasis during or after the first three years post-surgery (early- and late-type genes). A sequential validation study design was applied, with two non-adjuvantly treated discovery cohorts (n = 409) and one validation cohort (n = 169). The identified genes were further evaluated in tamoxifen-treated breast cancer patients (n = 923), as well as in patients with non-small cell lung (n = 1779), colon (n = 893), and ovarian (n = 922) cancer. Ten late- and 243 early-type genes were identified in adjuvantly untreated breast cancer. Adjustment for clinicopathological factors and an established proliferation-related signature markedly reduced the number of early-type genes to 16, whereas nine late-type genes remained significant. These nine genes were also associated with metastasis-free survival (MFS) in a non-time-restricted model, but not in the early period alone, stressing that their prognostic impact was primarily based on MFS more than three years after surgery. Four of the ten late-type genes were also significantly associated with MFS in the late period in a meta-analysis of tamoxifen-treated breast cancer cohorts: the ribosome-related factors EIF4B, RPL5, and RPL3, and the tumor angiogenesis modifier EPN3. In contrast, only one late-type gene (EPN3) showed consistent survival associations in more than one cohort in the other cancer types; it was associated with worse outcome in two non-small cell lung cancer cohorts. No late-type gene was validated in ovarian or colon cancer. Ribosome-related genes were associated with a decreased risk of late metastasis in both adjuvantly untreated and tamoxifen-treated breast cancer patients. In contrast, high expression of epsin (EPN3) was associated with an increased risk of late metastasis. This is of clinical relevance considering the well-understood role of epsins in tumor angiogenesis and the ongoing development of epsin-antagonizing therapies.
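The time-restricted Cox models above can be set up by deriving an early-period and a late-period dataset from each cohort before fitting. This sketch assumes a simple construction that is not stated in the abstract. The early model administratively censors all follow-up at the three-year cutoff. The late model keeps only patients still event-free and under follow-up beyond the cutoff, with the clock restarted there. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, int]  # (follow-up time in years, event indicator 1/0)


def early_period(data: Sequence[Record], cutoff: float = 3.0) -> List[Record]:
    """Censor everyone at the cutoff: events after it are treated as censored."""
    return [(min(t, cutoff), int(e == 1 and t <= cutoff)) for t, e in data]


def late_period(data: Sequence[Record], cutoff: float = 3.0) -> List[Record]:
    """Keep patients still at risk after the cutoff and measure time from it."""
    return [(t - cutoff, e) for t, e in data if t > cutoff]


cohort = [(1.5, 1), (4.0, 1), (2.0, 0), (6.0, 0)]
print(early_period(cohort))
print(late_period(cohort))
```

Each resulting list of (time, event) pairs could then be combined with the gene expression values and passed to any Cox regression routine. Restarting the clock at the cutoff leaves the Cox risk sets unchanged, because every retained patient enters at the same time.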

    Integrative analysis of genome-wide gene copy number changes and gene expression in non-small cell lung cancer

    Non-small cell lung cancer (NSCLC) represents a genomically unstable cancer type with extensive copy number aberrations. The relationship between gene copy number alterations and the resulting mRNA levels has been described only fragmentarily. The aim of this study was to conduct a genome-wide analysis of gene copy number gains and corresponding gene expression levels in a clinically well-annotated NSCLC patient cohort (n = 190), and to assess their association with survival. More than half of all analyzed gene copy number-gene expression pairs showed statistically significant correlations (10,296 of 18,756 genes). However, high correlations (correlation coefficient > 0.7) were obtained only in a subset of 301 genes (1.6%), including KRAS, EGFR, and MDM2. Higher correlation coefficients were associated with higher copy number and expression levels. Strong correlations were frequently based on a few tumors with high copy number gains and correspondingly increased mRNA expression. Among the highly correlating genes, Gene Ontology (GO) groups associated with posttranslational protein modifications were particularly frequent, including ubiquitination and neddylation. In a meta-analysis including 1,779 patients, we found that survival-associated genes were overrepresented among the highly correlating genes (61 of the 301 highly correlating genes, FDR-adjusted p < 0.05). Among them are the chaperone CCT2, the core complex protein NUP107, and the ubiquitination- and neddylation-associated protein CAND1. In conclusion, this comprehensive analysis describes a distinct set of highly correlating genes. These genes were found to be overrepresented among survival-associated genes based on gene expression in a large collection of publicly available datasets.

    Dynamic Metabolic and Transcriptional Responses of Proteasome-Inhibited Neurons

    Proteasome inhibition is associated with parkinsonian pathology in vivo and degeneration of dopaminergic neurons in vitro. We explored here the metabolome (386 metabolites) and transcriptome (3257 transcripts) regulations of human LUHMES neurons following exposure to MG-132 (100 nM). This proteasome inhibitor killed cells within 24 h but did not reduce viability for 12 h. Overall, 206 metabolites were changed in live neurons. The early (3 h) metabolome changes suggested a compromised energy metabolism. For instance, AMP, NADH, and lactate were up-regulated, while glycolytic and citric acid cycle intermediates were down-regulated. At later time points, glutathione-related metabolites were up-regulated, most likely due to an early oxidative stress response and activation of NRF2/ATF4 target genes. The transcriptome pattern confirmed proteostatic stress (fast up-regulation of proteasome subunits) and also suggested the progressive activation of additional stress response pathways. The early responses (e.g., HIF-1, NF-kB, HSF-1) can be considered a cytoprotective cellular counter-regulation that maintained cell viability. For instance, a very strong up-regulation of AIFM2 (=FSP1) may have prevented fast ferroptotic death. For most of the initial period, no definite life–death decision was taken, as neurons could be rescued for at least 10 h after the start of proteasome inhibition. Late responses involved p53 activation and catabolic processes such as a loss of pyrimidine synthesis intermediates. We interpret this as a phase of co-occurrence of protective and maladaptive cellular changes. Altogether, this combined metabolomics–transcriptomics analysis provides information on responses triggered in neurons by proteasome dysfunction that may be targeted by novel therapeutic interventions in Parkinson’s disease.