2 research outputs found

    Investigating How Reproducibility and Geometrical Representation in UMAP Dimensionality Reduction Impact the Stratification of Breast Cancer Tumors

    No full text
    Advances in next-generation sequencing have provided high-dimensional RNA-seq datasets, allowing the stratification of some tumor patients based on their transcriptomic profiles. Machine learning methods have been used to reduce and cluster high-dimensional data. Recently, uniform manifold approximation and projection (UMAP) was applied to project genomic datasets in low-dimensional Euclidean latent space. Here, we evaluated how different representations of the UMAP embedding can impact the analysis of breast cancer (BC) stratification. We projected BC RNA-seq data on Euclidean, spherical, and hyperbolic spaces, and stratified BC patients via clustering algorithms. We also proposed a pipeline to yield more reproducible clustering outputs. The results show how the selection of the latent space can affect downstream stratification results and suggest that the exploration of different geometrical representations is recommended to explore data structure and samples’ relationships

    Investigating How Reproducibility and Geometrical Representation in UMAP Dimensionality Reduction Impact the Stratification of Breast Cancer Tumors

    No full text
    Advances in next-generation sequencing have provided high-dimensional RNA-seq datasets, allowing the stratification of some tumor patients based on their transcriptomic profiles. Machine learning methods have been used to reduce and cluster high-dimensional data. Recently, uniform manifold approximation and projection (UMAP) was applied to project genomic datasets in low-dimensional Euclidean latent space. Here, we evaluated how different representations of the UMAP embedding can impact the analysis of breast cancer (BC) stratification. We projected BC RNA-seq data on Euclidean, spherical, and hyperbolic spaces, and stratified BC patients via clustering algorithms. We also proposed a pipeline to yield more reproducible clustering outputs. The results show how the selection of the latent space can affect downstream stratification results and suggest that the exploration of different geometrical representations is recommended to explore data structure and samples’ relationships
    corecore