
    Memory CD4+ T cells are generated in the human fetal intestine

    The fetus is thought to be protected from exposure to foreign antigens, yet CD45RO+ T cells reside in the fetal intestine. Here we combined functional assays with mass cytometry, single-cell RNA sequencing and high-throughput T cell antigen receptor (TCR) sequencing to characterize the CD4+ T cell compartment in the human fetal intestine. We identified 22 CD4+ T cell clusters, including naive-like, regulatory-like and memory-like subpopulations, which were confirmed and further characterized at the transcriptional level. Memory-like CD4+ T cells had high expression of Ki-67, indicative of cell division, and CD5, a surrogate marker of TCR avidity, and produced the cytokines IFN-γ and IL-2. Pathway analysis revealed a differentiation trajectory associated with cellular activation and proinflammatory effector functions, and TCR repertoire analysis indicated clonal expansions, distinct repertoire characteristics and interconnections between subpopulations of memory-like CD4+ T cells. Imaging mass cytometry indicated that memory-like CD4+ T cells colocalized with antigen-presenting cells. Collectively, these results provide evidence for the generation of memory-like CD4+ T cells in the human fetal intestine that is consistent with exposure to foreign antigens.

    An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics

    Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, and also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but neural architectures that embed paired -omics into the same non-linear manifold have recently grown in popularity. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage over linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights. First, concatenating the principal components of each modality is a competitive baseline that is hard to beat if all modalities are available at test time. However, if only one modality is available at test time, training a predictive model on the joint space of that modality can improve performance compared with using the unimodal principal components alone. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints as to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities did quite poorly.
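
    In a product-of-experts joint embedding, as commonly used in multimodal VAEs, each modality's encoder outputs a Gaussian over the shared latent space, and the joint posterior is taken to be the normalized product of these Gaussians, often together with a standard-normal prior expert. The product of Gaussians is again Gaussian: its precision is the sum of the experts' precisions and its mean is their precision-weighted mean, so a missing modality is handled by leaving its expert out. The minimal sketch below assumes diagonal Gaussians; the function name `product_of_experts` and the toy encoder outputs are hypothetical, and this is not presented as the exact implementation benchmarked in the study.

```python
from typing import List, Sequence, Tuple

# A diagonal Gaussian expert: (means, variances), one entry per latent dimension.
Gaussian = Tuple[List[float], List[float]]


def product_of_experts(experts: Sequence[Gaussian], include_prior: bool = True) -> Gaussian:
    """Combine diagonal Gaussian experts into a single Gaussian.

    Precisions (inverse variances) add, and the joint mean is the
    precision-weighted average of the expert means. A missing modality is
    handled by simply not passing its expert.
    """
    if not experts:
        raise ValueError("need at least one modality expert")
    dim = len(experts[0][0])
    # The optional N(0, 1) prior expert contributes precision 1 and mean 0.
    precision = [1.0 if include_prior else 0.0] * dim
    weighted_sum = [0.0] * dim
    for means, variances in experts:
        if len(means) != dim or len(variances) != dim:
            raise ValueError("all experts must share the latent dimension")
        for d in range(dim):
            p = 1.0 / variances[d]
            precision[d] += p
            weighted_sum[d] += p * means[d]
    joint_var = [1.0 / p for p in precision]
    joint_mean = [s * v for s, v in zip(weighted_sum, joint_var)]
    return joint_mean, joint_var


# Usage with two hypothetical modality encoders (e.g. RNA and ATAC) in a 2-D latent space.
rna = ([1.0, -2.0], [1.0, 0.5])
atac = ([3.0, 0.0], [1.0, 2.0])
both_mean, both_var = product_of_experts([rna, atac])
rna_only_mean, rna_only_var = product_of_experts([rna])  # ATAC missing at test time
```

    Dropping an expert from the list yields the posterior for the remaining modalities, which is why this combination rule suits the missing-modality setting discussed in the abstract.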

    Benchmarking variational AutoEncoders on cancer transcriptomics data

    Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train because of the large number of hyperparameters that need to be tuned. To better understand the importance of the different hyperparameters, we examined six different VAE models trained on TCGA transcriptomics data and evaluated them on the downstream tasks of cluster agreement with cancer subtypes and survival analysis. We studied the effect of the latent space dimensionality, learning rate, optimizer, initialization and activation function on the quality of these downstream tasks on the TCGA samples. We found that β-TCVAE and DIP-VAE perform well on average, despite being more sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting hyperparameter settings. To assess generalization, we tested all hyperparameter configurations on the GTEx dataset. We found a significant correlation (ρ = 0.7) between the hyperparameter effects on clustering performance in the TCGA and GTEx datasets, which highlights the robustness and generalizability of our recommendations. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we measured the correlation and mutual information of the different representations with various data characteristics such as gender, age, days to metastasis, immune infiltration and mutation signatures. We found that, for all models, the latent factors in general neither correlate uniquely with a single data characteristic nor capture separable information, even for models specifically designed for disentanglement.
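
    All six benchmarked models are VAE variants whose objectives build on the evidence lower bound (ELBO): a reconstruction term plus the Kullback-Leibler (KL) divergence between the encoder's Gaussian posterior and a standard-normal prior, with β-TCVAE and DIP-VAE adding or reweighting penalty terms on top. The minimal sketch below shows only the shared vanilla pieces, assuming a diagonal Gaussian encoder parameterized by `mu` and `logvar` and a unit-variance Gaussian likelihood (so the reconstruction term is half the squared error, up to a constant). The function names are hypothetical, and the `beta` weight shown is the simple β-VAE scaling, not the β-TCVAE decomposition.

```python
import math
import random
from typing import List, Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def vae_loss(x: Sequence[float], x_recon: Sequence[float],
             mu: Sequence[float], logvar: Sequence[float], beta: float = 1.0) -> float:
    """Negative ELBO up to a constant, assuming a unit-variance Gaussian likelihood.

    beta = 1 gives the vanilla VAE objective; beta > 1 is the simple beta-VAE weighting.
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, logvar)
```

    Hyperparameters such as the latent dimensionality set the length of `mu` and `logvar`, while the learning rate, optimizer, initialization and activation function act on the encoder and decoder networks that produce them.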

    Performance of VAE models in downstream tasks.

    A) Clustering performance (ARI, y-axis; higher is better) in the latent space of each model (x-axis), compared with the true cancer type on the TCGA dataset. Each box represents the distribution of scores obtained for different hyperparameter settings within a specific VAE model. The middle line corresponds to the mean, while the edges of the box represent the first and third quartiles. B) As in A), but for the supervised task of predicting overall survival. Performance is measured by the AIC (y-axis; lower is better), and the dashed red line indicates the performance of the baseline model using the covariates only (i.e., age, gender and cancer type).
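
    The ARI in panel A compares the cluster labels obtained in a latent space with the true cancer types via their contingency table, correcting the Rand index for chance so that random labelings score 0 in expectation and identical partitions score 1. The caption does not say which clustering algorithm produced the labels, so the minimal sketch below takes both labelings as given; the function name and the example labels are hypothetical, and `sklearn.metrics.adjusted_rand_score` computes the same quantity in practice.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of equal length with at least two samples")
    # Number of sample pairs grouped together in each contingency cell, row and column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Happens only when both labelings are all one cluster or both all
        # singletons, i.e. identical partitions; return 1.0 by convention.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Usage (hypothetical labels): true cancer types vs clusters found in a latent space.
score = adjusted_rand_index(["BRCA", "BRCA", "LUAD", "LUAD"], [0, 0, 1, 2])
```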

    DIP-VAE validation loss vs downstream tasks performance.

    Scatter plot of the validation loss (up to the 85th percentile) of different hyperparameter configurations of DIP-VAE versus A) ARI and B) AIC. The 85th percentile was chosen because this model tends to produce more outliers than the others. Each dot is a different configuration, colored by latent space dimensionality.
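
    Reading "the 85th percentile of the validation loss" the same way as the vanilla VAE figure (where the 90th percentile means excluding the highest 10%), the plotted configurations are those whose validation loss does not exceed the 85th percentile. A minimal sketch of that filter follows, assuming linear interpolation between order statistics (NumPy's default percentile method); `percentile`, `drop_loss_outliers` and the configuration names are hypothetical, and the loss values are toy numbers.

```python
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)  # floor, since pos >= 0
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def drop_loss_outliers(val_loss: Dict[str, float], q: float = 85.0) -> Dict[str, float]:
    """Keep only configurations whose validation loss is <= the q-th percentile."""
    cutoff = percentile(list(val_loss.values()), q)
    return {cfg: loss for cfg, loss in val_loss.items() if loss <= cutoff}


# Usage with toy values: cfg_d lies above the 85th-percentile cutoff and is dropped.
losses = {"cfg_a": 1.0, "cfg_b": 2.0, "cfg_c": 3.0, "cfg_d": 100.0}
kept = drop_loss_outliers(losses, q=85.0)
```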

    Validation loss does not reflect downstream performance.

    Validation loss (y-axis), up to the 90th percentile (i.e., excluding the highest 10%), for the different vanilla VAE hyperparameter configurations versus A) the ARI (x-axis; higher is better) and B) the AIC (x-axis; lower is better). The figure shows a correlation between the validation loss and both ARI and AIC; however, configurations with the same validation loss can have different scores. The blue line shows the regression line, and its thickness indicates the 95% confidence interval. The dots are colored by latent space dimensionality. C/D) UMAP visualization of the TCGA data embedded into the learned latent space of a vanilla VAE configuration. C) A configuration with a validation loss ≈ 1 and an ARI ≈ 0 (good model fit, poor clustering ability). D) A configuration with a validation loss ≈ 1 and an ARI ≈ 0.72 (good model fit and good clustering performance).
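
    The regression line and its 95% band in panels A and B can be illustrated with an ordinary least-squares fit and the classical confidence interval for the fitted mean at a point x0, namely fit ± crit · s · sqrt(1/n + (x0 - x_mean)^2 / Sxx), where s is the residual standard error. The caption does not say how the band was computed, so this is a sketch of one standard choice, not the authors' procedure; the function names are hypothetical, and `crit = 1.96` is a large-sample normal approximation to the t quantile with n - 2 degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_ci(x: Sequence[float], y: Sequence[float], x0: float, crit: float = 1.96) -> Tuple[float, float]:
    """Confidence interval for the fitted mean at x0 (crit=1.96: normal approximation)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points to estimate the residual variance")
    a, b = ols_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(s2 * (1.0 / n + (x0 - mx) ** 2 / sxx))
    fit = a + b * x0
    return fit - crit * se, fit + crit * se
```

    Evaluating `mean_ci` over a grid of x0 values traces the band; it is narrowest at the mean of x and widens toward the edges, which is why such bands flare out where configurations are sparse.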

    Viability of different hyperparameter combinations for the different VAE models.

    Columns represent the different hyperparameters. Each bar within a column represents a specific setting of a hyperparameter. The blue color indicates the number of successful configurations, while the red color represents the number of failed configurations. The vertical axis displays the distribution of failed configurations for each specific setting among the 6,480 tested configurations.
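
    The viability figure amounts to a tally: for every hyperparameter and every setting of it, count how many configurations trained successfully and how many failed. A minimal sketch follows, assuming each run is recorded as a mapping from hyperparameter name to setting plus a boolean `failed` field; that field, the function name and the example runs are hypothetical, and the caption does not define what counts as a failure (a non-finite loss would be one plausible criterion).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def viability_table(runs: Iterable[Mapping[str, object]],
                    hyperparameters: Iterable[str]) -> Dict[str, Dict[object, Tuple[int, int]]]:
    """For each hyperparameter setting, count (successful, failed) runs."""
    hps: List[str] = list(hyperparameters)
    counts = {hp: defaultdict(lambda: [0, 0]) for hp in hps}
    for run in runs:
        idx = 1 if run["failed"] else 0  # column 0: successful, column 1: failed
        for hp in hps:
            counts[hp][run[hp]][idx] += 1
    return {hp: {setting: (c[0], c[1]) for setting, c in per.items()} for hp, per in counts.items()}


# Usage with three hypothetical runs.
runs = [
    {"optimizer": "adam", "latent_dim": 16, "failed": False},
    {"optimizer": "adam", "latent_dim": 64, "failed": True},
    {"optimizer": "sgd", "latent_dim": 16, "failed": True},
]
table = viability_table(runs, ["optimizer", "latent_dim"])
```

    Because every run has exactly one setting per hyperparameter, the counts within any one column sum to the total number of runs, 6,480 in the figure.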

    β-TCVAE validation loss vs downstream tasks performance.

    Scatter plot of the validation loss (up to the 90th percentile) of different hyperparameter configurations of β-TCVAE versus A) ARI and B) AIC. Each dot is a different configuration, colored by latent space dimensionality.

    Vanilla VAE downstream tasks performance agreement.

    Scatter plot of clustering performance measured by ARI (y-axis; higher is better) versus survival analysis performance measured by AIC (x-axis; lower is better). The figure shows the concordance between these two measures: models with higher ARI tend to have lower AIC. The top-left quadrant of the plot contains the models that perform best on both the clustering and the survival analysis tasks. The blue line is a LOWESS curve fitted to the data.
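
    LOWESS fits a weighted linear regression around each point, using the nearest `frac * n` neighbours weighted by the tricube kernel, and reports the local fit at that point. The sketch below is a simplified version that omits the robustness iterations of Cleveland's full algorithm; the function name and the default `frac = 2/3` are assumptions, the caption does not state which implementation or span was used, and statsmodels' `lowess` implements the full version.

```python
from typing import List, Sequence


def lowess(x: Sequence[float], y: Sequence[float], frac: float = 2 / 3) -> List[float]:
    """Simplified LOWESS: local weighted linear fits, no robustness iterations.

    Returns the smoothed value at each x[i].
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    k = max(2, min(n, int(round(frac * n))))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1]  # distance to the k-th nearest point
        if h == 0:
            weights = [1.0 if xi == x0 else 0.0 for xi in x]
        else:
            # Tricube kernel: (1 - u^3)^3 for u < 1, zero outside the window.
            weights = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

    For this figure, `x` would hold the AIC values and `y` the ARI values of the vanilla VAE configurations; a larger `frac` gives a smoother curve.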

    Spearman correlation between the different models' validation loss and ARI and AIC.

    Absolute Spearman correlation (rounded) between the validation loss and the ARI and AIC values achieved in the downstream tasks, computed across all configurations tested for each model.
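
    Spearman's correlation is the Pearson correlation computed on ranks, with tied values given the average of the ranks they occupy; the table then reports its absolute value, rounded. A minimal sketch follows. The function names are hypothetical, the rounding to two decimals in the usage comment is an assumption (the caption does not give the precision), and `scipy.stats.spearmanr` computes the same coefficient in practice.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


# Usage (hypothetical inputs, one entry per configuration of a model):
# round(abs(spearman(val_losses, ari_scores)), 2)
```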