Hyperparameter search experiment.
Three hyperparameters significantly influence the clustering performance of AttentionAE-sc: the number of attention heads in the information fusion blocks (a), the resolution parameter in the Leiden algorithm (b), and the number of highly variable genes (c). (TIF)
Experiments on the BRCA dataset.
a. UMAP visualization of the clustering results on the BRCA dataset. b. Comparison of the cell distribution between the predicted clusters and the ground-truth cell lines. In the heat map, each row represents a class of cells from the same cell line, and each column represents a class of cells from the same predicted cluster; for an ideal clustering result there is generally a one-to-one correspondence between cell lines and predicted clusters. c. Expression dot plot of the identified DEGs in different cell lines. Based on the predicted cell labels, the DEGs of all clusters were calculated by the Wilcoxon test (cluster 15 contains only 30 cells, so it was excluded). The top DEG of each predicted cluster was selected as a potential marker gene. d. A dot plot visualizing the expression of the obtained DEGs in each cell line; most cell lines can be distinguished by them. BCAS3 was specifically expressed in cluster 18, i.e., MCF7 and KPL1. e. UMAP visualization of the sub-clustering results on cluster 18 by AttentionAE-sc, which consists of these two cell lines.
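The one-vs-rest Wilcoxon ranking used for the DEGs above can be sketched in a few lines. This is a minimal illustration on synthetic counts, not the paper's pipeline: the matrix, labels, and the helper `rank_genes_wilcoxon` are all hypothetical, and scipy's `mannwhitneyu` (the Wilcoxon rank-sum test) stands in for the full scanpy workflow.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Synthetic expression matrix: 60 cells x 5 genes, two clusters of 30 cells
expr = rng.poisson(1.0, size=(60, 5)).astype(float)
labels = np.array([0] * 30 + [1] * 30)
expr[labels == 1, 2] += 5.0  # gene 2 is up-regulated in cluster 1

def rank_genes_wilcoxon(expr, labels, cluster):
    """One-vs-rest Wilcoxon rank-sum test; returns genes ordered by p-value."""
    in_c, rest = expr[labels == cluster], expr[labels != cluster]
    pvals = np.array([
        mannwhitneyu(in_c[:, g], rest[:, g], alternative="greater").pvalue
        for g in range(expr.shape[1])
    ])
    return np.argsort(pvals), pvals

order, pvals = rank_genes_wilcoxon(expr, labels, cluster=1)
```

With this construction, gene 2 ranks first for cluster 1, mirroring how the top DEG per cluster was taken as a candidate marker.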
AttentionAE-sc constructs better relationships among cells (Muraro).
a. The trend of the number of predicted cell groups and the ARI score over training epochs during the clustering stage. Better clustering performance was obtained when the cluster centers were chosen adaptively at each iteration and redundant cluster centers were discarded. Other datasets are shown in S4 Fig. b. Visualization of the cell embeddings. c. Heatmap of the relative cosine distances between cells, calculated from the embeddings of the multi-head attention layer. d. Heatmap of the relative cosine distances between cells, calculated from the input expression matrix. In the right subfigures, the cells were sorted by cell group to give an intuitive visualization of the inter-cell distances (c, d, e). e. Heatmap of the relative cosine distances between cells calculated by an ordinary denoising autoencoder (DAE), and the corresponding visualization.
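The cosine-distance heatmaps above can be reproduced in miniature: compute pairwise cosine distances from any embedding, then sort the rows and columns by cell group so that a good embedding shows dark within-group blocks on the diagonal. The embeddings and group sizes below are toy assumptions, not the Muraro data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy embeddings: 3 cell groups of 10 cells each, around distinct centers
centers = rng.normal(size=(3, 8)) * 5.0
emb = np.vstack([c + rng.normal(size=(10, 8)) for c in centers])
groups = np.repeat([0, 1, 2], 10)

# Pairwise cosine distance: 1 - (x . y) / (|x| |y|)
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
dist = 1.0 - unit @ unit.T

# Sort cells by group so within-group blocks line up on the diagonal
order = np.argsort(groups, kind="stable")
dist_sorted = dist[np.ix_(order, order)]
```

Passing `dist_sorted` to any heatmap plotter gives the block structure seen in panels c-e.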
Model architecture.
In the training stage, a graph autoencoder (GAE) and a denoising autoencoder (DAE) are constructed simultaneously. Multi-head attention blocks are used to combine the denoising embedding and the topological embedding. In the clustering stage, the predicted label of each cell is obtained by self-optimizing clustering.
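A minimal sketch of attention-based fusion of the two embeddings, under stated assumptions: this is a single-head, numpy-only illustration in which the denoising embedding queries over the pair of embeddings treated as a two-token sequence. The projection matrices and the function `attention_fuse` are hypothetical; the paper's information fusion blocks are multi-head and learned.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(z_dae, z_gae, rng):
    """Illustrative single-head attention: queries from the denoising
    embedding attend over both embeddings stacked as a 2-token sequence."""
    n, d = z_dae.shape
    Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    tokens = np.stack([z_dae, z_gae], axis=1)           # (n, 2, d)
    q = z_dae @ Wq                                      # (n, d)
    k, v = tokens @ Wk, tokens @ Wv                     # (n, 2, d)
    scores = np.einsum("nd,ntd->nt", q, k) / np.sqrt(d)  # (n, 2)
    return np.einsum("nt,ntd->nd", softmax(scores), v)  # fused (n, d)

rng = np.random.default_rng(0)
fused = attention_fuse(rng.normal(size=(5, 16)), rng.normal(size=(5, 16)), rng)
```

The attention weights decide, per cell, how much of the fused embedding comes from the denoising versus the topological branch.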
Visualization of the experiments in the BRCA dataset.
a. Venn diagram comparing the highly variable genes selected during preprocessing in the original clustering stage and in the sub-clustering stage. b. Results of the KEGG enrichment analysis for cluster 6 and cluster 31, based on the DEGs listed in S2 Table. In the histogram, the dotted line marks the threshold at which the corrected p-value is 0.05, and higher scores indicate more significant enrichment. (TIF)
Hyperparameter search and the ablation experiment.
a. Comparison of different methods for calculating cell connectivities; compared with the UMAP method, the 'gauss' method demonstrated superior performance. b. Comparison of different settings for the number of attention heads. c. Comparison of different settings for the resolution parameter in the Leiden algorithm. d. Comparison of different settings for the number of highly variable genes. e. Model ablation experiment of AttentionAE-sc on 8 scRNA-seq datasets. Four conditions were tested: the absence of the information fusion block (wo attn), the absence of the ZINB loss function (wo zinb), the absence of residual connections (wo res), and the absence of the GAE (wo gnn).
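The 'gauss' connectivity option contrasted with UMAP in panel a can be illustrated with a simplified Gaussian kernel over Euclidean distances. This is an assumption-laden analogue, not scanpy's implementation: the fixed bandwidth `sigma` and the helper `gauss_connectivities` are invented for the sketch (the real method uses adaptive, neighbor-restricted kernels).

```python
import numpy as np

def gauss_connectivities(X, sigma=1.0):
    """Simplified Gaussian-kernel cell-cell connectivities:
    w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with no self-edges."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

# Two nearby cells and one distant cell in a 2-D embedding
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gauss_connectivities(X)
```

Near neighbors get connectivities close to 1 and distant cells close to 0, giving the graph the GAE consumes.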
The clustering performance of AttentionAE-sc and baseline methods in 16 scRNA-seq datasets.
a. NMI scores of AttentionAE-sc and 9 baseline methods. b. Silhouette scores of 6 community detection-based methods (including AttentionAE-sc). c. Davies-Bouldin scores of 6 community detection-based methods (including AttentionAE-sc). For each dataset, the arithmetic mean over five runs of each method with different random seeds is reported. Methods that require the number of clusters to be specified are marked with an asterisk (*). (TIF)
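The NMI score in panel a measures agreement between predicted and true labelings, invariant to cluster renaming. Below is a from-scratch numpy sketch using arithmetic-mean normalization (one common convention; library implementations such as sklearn's may differ in normalization choices).

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two labelings,
    with arithmetic-mean normalization of the entropies."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    # Contingency table of joint label counts
    c = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(c, (ia, ib), 1)
    p = c / n
    pa, pb = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return mi / ((ha + hb) / 2)
```

A perfect clustering scores 1 even if the cluster IDs are permuted, e.g. `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while independent labelings score near 0.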
The visualization of the weight of attention blocks. Each head contains a <i>n</i> Ă— <i>n</i> matrix (<i>n</i> is the number of cells), which represents the attention scores between each pair of cells.
Model stability and robustness test in scRNA-seq datasets.
a. ARI scores of the clustering results with the random manual dropout rate of gene expression values set to 10%, 20%, 30%, and 50% in the scRNA-seq datasets. All experiments were run five times with different random seeds. Base (the pink box) is the performance of AttentionAE-sc without manual dropout. Comparing AttentionAE-sc (drop 10 to drop 50, the blue boxes) with an ordinary DAE (DAE 10 to DAE 50, the grey boxes), AttentionAE-sc is less affected. b. ARI scores of the clustering results with the random, stratified down-sampling rate of input cells set to 20%, 40%, 60%, and 80% in the scRNA-seq datasets. All experiments were run five times with different random seeds. Base (the pink box) is the performance of AttentionAE-sc with the whole scRNA-seq dataset as input. The average score is displayed below the x label of each box. (TIF)
The variation trend of the number of predicted cell types and the result scores during the clustering stage for 15 scRNA-seq datasets (Muraro is shown in Fig 4A).