7 research outputs found

    Computational approaches for single-cell omics and multi-omics data

    Single-cell omics and multi-omics technologies have enabled the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Identifying heterogeneous cell types, both existing and novel, relies on efficient computational approaches, especially cluster analysis. Additionally, gene regulatory network analysis and various integrative approaches are needed to combine data across studies and across different multi-omics layers. This thesis comprehensively compared Bayesian clustering models for single-cell RNA-sequencing (scRNA-seq) data and used selected integrative approaches to study cell-type-specific gene regulation in the uterus. Additionally, single-cell multi-omics data integration approaches for cell heterogeneity analysis were investigated. Article I investigated analytical approaches for cluster analysis of scRNA-seq data, in particular latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) models. The comparison of LDA and HDP with existing state-of-the-art methods revealed that topic-modeling-based models can be useful in scRNA-seq cluster analysis. Evaluation of cluster quality for LDA and HDP with intrinsic and extrinsic cluster quality metrics indicated that the clustering performance of these methods is dataset dependent. Article II and Article III focused on cell-type-specific integrative analysis of uterine or decidual stromal (dS) and natural killer (dNK) cells, which are important for successful pregnancy. Article II integrated existing preeclampsia RNA-seq studies of the decidua with recent scRNA-seq datasets in order to investigate cell-type-specific contributions of early onset preeclampsia (EOP) and late onset preeclampsia (LOP). It was discovered that the dS marker genes were enriched for genes downregulated in LOP and the dNK marker genes were enriched for genes upregulated in EOP. Article III presented a gene regulatory network analysis for the subpopulations of dS and dNK cells. This study identified novel subpopulation-specific transcription factors that promote decidualization of stromal cells and dNK-mediated maternal immunotolerance. In Article IV, different strategies and methodological frameworks for data integration in single-cell multi-omics data analysis were reviewed in detail. Data integration methods were grouped into early, late and intermediate data integration strategies. The specific stage and order of data integration can have a substantial effect on the results of the integrative analysis. The central details of the approaches were presented, and potential future directions were discussed.

    Computational methods for the analysis of single-cell sequencing and multi-omics results (translated from the Finnish summary). Single-cell sequencing technologies enable the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Grouping, that is, cluster analysis, plays a central role in identifying cell types. Combining gene regulatory networks and different molecular data layers is also central to the analysis. The thesis compares Bayesian clustering methods and combines data collected with different methods in a cell-type-specific gene regulation analysis of the uterus. In addition, integration methods for single-cell data are surveyed comprehensively. Publication I focuses on studying analytical methods, in particular models based on latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), in the cluster analysis of single-cell data. A comprehensive comparison of these two models with existing methods revealed that topic-modeling-based methods can be useful in the cluster analysis of single-cell data. The performance of the methods also depended on the properties of each analysed dataset. Publications II and III focus on cell-type-specific analysis of uterine stromal cells and NK immune cells, which are important for female reproductive health. Article II combined existing results on preeclampsia with the most recent single-cell sequencing results and found cell-type-specific effects of early onset preeclampsia (EOP) and late onset preeclampsia (LOP). The expression of marker genes of differentiated stroma was found to decrease in LOP, and the expression of NK marker genes to increase in EOP. Publication III analyses the subpopulation-specific gene regulatory networks of stromal and NK cells and their transcription factors. The study identified novel subpopulation-specific regulators that promote stromal differentiation and NK-cell-mediated immunotolerance. Publication IV examines in detail strategies and methods for integrating different single-cell data layers (multi-omics). Integration methods were grouped into early, late and intermediate strategies, and the methods of each approach were presented in more detail. Possible future directions were also discussed.

    Computational functional prediction of novel long noncoding RNA in TCGA Glioblastoma multiforme sample

    According to the International Human Genome Sequencing Consortium (2004) [43], less than 2% of the human genome codes for proteins. This finding surprised the scientific community, and many researchers have since turned their attention to the noncoding part of the genome. There has been an explosion of research addressing the role of the roughly 98% of the human genome that is not translated. This work shows that transcription is not limited to the protein-coding regions of the genome; rather, more than 90% of the genome is likely to be transcribed [43]. This results in the transcription of tens of thousands of long noncoding RNAs (lncRNAs) with little or no coding potential. However, the molecular mechanisms and functions of lncRNAs are still an open research topic. Although the functions of a limited number of lncRNAs have been identified, there is still a gap in identifying the functions of novel lncRNAs. This project implements different computational methods to predict the function of novel lncRNAs identified from TCGA glioblastoma multiforme samples. The methods used in this functional prediction include both expression-based and sequence-based analysis approaches. In the expression-based analysis, genes co-expressed with lncRNAs are used to predict possible functional relations. In the sequence-based analysis, gene-protein and lncRNA-protein interactions, together with miRNA-lncRNA interactions, are considered in the functional predictions. The results of the integrated functional prediction show that the novel lncRNA TCGA_gbm-3-153501, which is co-expressed with the THBS1 gene with a correlation coefficient of more than 0.5, is predicted to function in cell-cell and cell-to-matrix interactions, platelet aggregation, angiogenesis, and tumorigenesis [202]. MSI1, RBM3 and RBM8A are RNA-binding proteins (RBPs) that have binding sites both on lncRNAs among the top five differentially expressed (TCGA_gbm-2-104096501, TCGA_gbm-3-153501, TCGA_gbm-5-63687001 and TCGA_gbm-17-10671251) and on IGF2, which is among the top 10 differentially expressed genes. Therefore, these lncRNAs are predicted to have a functional role in cell proliferation and maintenance of stem cells in the central nervous system.
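
    The expression-based step in this abstract (keeping genes whose expression correlates with a lncRNA above 0.5) can be sketched roughly as follows. This is only an illustration, assuming a Pearson correlation across samples; the abstract does not state which correlation measure was used, and the gene names, expression values and the `coexpressed_genes` helper are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpressed_genes(lncrna_expr, gene_expr, threshold=0.5):
    """Return {gene: r} for genes whose correlation with the lncRNA exceeds the threshold."""
    hits = {}
    for gene, values in gene_expr.items():
        r = pearson(lncrna_expr, values)
        if r > threshold:
            hits[gene] = r
    return hits

# Hypothetical expression values across six samples (not data from the study).
lncrna = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]
genes = {
    "GENE_A": [0.9, 2.0, 3.2, 3.9, 5.0, 6.1],  # rises with the lncRNA
    "GENE_B": [5.0, 4.1, 3.3, 2.2, 1.4, 0.7],  # falls as the lncRNA rises
    "GENE_C": [2.0, 2.0, 2.1, 1.9, 2.0, 2.1],  # nearly flat
}
hits = coexpressed_genes(lncrna, genes)
```

    In this toy input only `GENE_A` passes the filter; the functions of the retained genes would then serve as candidate functions for the lncRNA.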

    Dirichlet process mixture models for single-cell RNA-seq clustering

    Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is the unknown number of clusters, for which there is still no comprehensive solution. To enhance the process of defining a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the number of clusters as an input parameter from the user. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP using four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both an intrinsic (Davies-Bouldin index, DB-index) and an extrinsic (adjusted Rand index, ARI) cluster quality measure, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.
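
    As a rough illustration of the extrinsic measure named in this abstract, the sketch below computes the adjusted Rand index (ARI) between reference cell-type labels and clusterings with different numbers of clusters. It is a plain-Python version of the standard ARI formula, not the authors' pipeline, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_true)
    # Count cells in each (reference label, cluster) pair and in each margin.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put all cells in one group.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)

# Hypothetical reference: three cell types with four cells each.
reference = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
too_few = [0] * 8 + [1] * 4                    # types A and B merged
matching = [0] * 4 + [1] * 4 + [2] * 4         # one cluster per type
too_many = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # every type split in two

ari_too_few = adjusted_rand_index(reference, too_few)
ari_matching = adjusted_rand_index(reference, matching)
ari_too_many = adjusted_rand_index(reference, too_many)
```

    Both merging and splitting cell types lower the ARI relative to the matching clustering, which is one way an inflated or deflated cluster number shows up in an extrinsic evaluation.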

    Computational strategies for single-cell multi-omics integration

    Single-cell omics technologies are currently solving biological and medical problems that previously remained elusive, such as the discovery of new cell types, cellular differentiation trajectories and communication networks across cells and tissues. Current advances, especially in single-cell multi-omics, hold high potential for breakthroughs through the integration of multiple omics layers. To keep pace with recent biotechnological developments, many computational approaches to process and analyze single-cell multi-omics data have been proposed. In this review, we first introduce recent developments in single-cell multi-omics in general and then focus on the available data integration strategies. The integration approaches are divided into three categories: early, intermediate, and late data integration. For each category, we describe the underlying conceptual principles and main characteristics, and provide examples of currently available tools and how they have been applied to analyze single-cell multi-omics data. Finally, we explore the challenges and prospective future directions of single-cell multi-omics data integration, including examples of adapting multi-view analysis approaches used in other disciplines to single-cell multi-omics.
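
    The difference between two of the categories named in this abstract can be sketched in a few lines. The sketch assumes the common reading that early integration concatenates the features of all modalities before any analysis, while late integration analyses each modality separately and merges the results. Intermediate integration (joint modelling of the modalities) is not shown. The function names, modality names and toy values are hypothetical and are not taken from the review.

```python
def zscore(column):
    """Scale one feature to zero mean and unit variance (0.0 if constant)."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd if sd else 0.0 for v in column]

def scale_features(matrix):
    """Z-score each feature (column) of a cells x features matrix."""
    columns = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

def early_integration(rna, atac):
    """Early integration: concatenate the scaled features of both modalities per cell."""
    return [r + a for r, a in zip(scale_features(rna), scale_features(atac))]

def late_integration(labels_rna, labels_atac):
    """Late integration: merge per-modality cluster labels into one joint label per cell."""
    return [f"{r}|{a}" for r, a in zip(labels_rna, labels_atac)]

# Hypothetical measurements for four cells (rows): 3 RNA and 2 ATAC features.
rna = [[5.0, 0.0, 1.0], [4.0, 1.0, 0.0], [0.0, 6.0, 2.0], [1.0, 5.0, 3.0]]
atac = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]

joint_features = early_integration(rna, atac)  # 4 cells x 5 features
joint_labels = late_integration(["r0", "r0", "r1", "r1"],
                                ["a0", "a1", "a1", "a1"])
```

    In the early case a single downstream method sees all five features at once; in the late case each modality was clustered on its own, and the joint labels only record where the two clusterings agree or disagree.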

    Ribosome-Targeting Antibiotics Impair T Cell Effector Function and Ameliorate Autoimmunity by Blocking Mitochondrial Protein Synthesis

    While antibiotics are intended to specifically target bacteria, most are known to affect host cell physiology. In addition, some antibiotic classes are reported as immunosuppressive for reasons that remain unclear. Here, we show that Linezolid, a ribosomal-targeting antibiotic (RAbo), effectively blocked the course of a T cell-mediated autoimmune disease. Linezolid and other RAbos were strong inhibitors of T helper-17 cell effector function in vitro, showing that this effect was independent of their antibiotic activity. Perturbing mitochondrial translation in differentiating T cells, either with RAbos or through the inhibition of mitochondrial elongation factor G1 (mEF-G1), progressively compromised the integrity of the electron transport chain. Ultimately, this led to deficient oxidative phosphorylation, diminishing nicotinamide adenine dinucleotide concentrations and impairing cytokine production in differentiating T cells. In accordance, mice lacking mEF-G1 in T cells were protected from experimental autoimmune encephalomyelitis, demonstrating that this pathway is crucial in maintaining T cell function and pathogenicity.

    Ribosome-Targeting Antibiotics Impair T Cell Effector Function and Ameliorate Autoimmunity by Blocking Mitochondrial Protein Synthesis.

    While antibiotics are intended to specifically target bacteria, most are known to affect host cell physiology. In addition, some antibiotic classes are reported as immunosuppressive for reasons that remain unclear. Here, we show that Linezolid, a ribosomal-targeting antibiotic (RAbo), effectively blocked the course of a T cell-mediated autoimmune disease. Linezolid and other RAbos were strong inhibitors of T helper-17 cell effector function in vitro, showing that this effect was independent of their antibiotic activity. Perturbing mitochondrial translation in differentiating T cells, either with RAbos or through the inhibition of mitochondrial elongation factor G1 (mEF-G1) progressively compromised the integrity of the electron transport chain. Ultimately, this led to deficient oxidative phosphorylation, diminishing nicotinamide adenine dinucleotide concentrations and impairing cytokine production in differentiating T cells. In accordance, mice lacking mEF-G1 in T cells were protected from experimental autoimmune encephalomyelitis, demonstrating that this pathway is crucial in maintaining T cell function and pathogenicity