19 research outputs found

    An integrative approach for building personalized gene regulatory networks for precision medicine

    Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects from ineffective drugs. This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient's genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that combines these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. We expect this approach will allow the prioritization of key driver genes for specific diseases and will provide knowledge that opens new avenues towards improved personalized healthcare.
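    As a brief illustration of "the time derivative of the gene expression state", RNA velocity is commonly written with the splicing-kinetics model below. The symbols are assumptions brought in for illustration and do not appear in the abstract: u and s are the unspliced and spliced mRNA abundances of one gene, and α, β, γ are its transcription, splicing, and degradation rates.

```latex
\begin{align}
  \frac{du}{dt} &= \alpha - \beta\,u, \\
  v \;=\; \frac{ds}{dt} &= \beta\,u - \gamma\,s .
\end{align}
```

    Under this sketch, v > 0 suggests the gene is being up-regulated in a cell and v < 0 that it is being down-regulated.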

    SCTIGER: A DEEP-LEARNING METHOD FOR INFERRING GENE REGULATORY NETWORKS FROM SINGLE-CELL GENE EXPRESSION DATA

    Inferring gene regulatory networks (GRNs) from single-cell RNA-sequencing (scRNA-seq) data is an important computational problem for revealing fundamental regulatory mechanisms. Although many computational methods have been designed to predict GRNs, none infer condition-specific GRNs by directly using paired datasets from case-versus-control experiments, which are common in diverse biological research projects. We present a novel deep-learning-based method, scTIGER, for GRN detection using the co-dynamics of gene expression. scTIGER employs cell-type-based pseudotiming, an attention-based convolutional neural network, and permutation-based significance testing to infer GRNs from gene modules. We first applied scTIGER to scRNA-seq datasets of prostate cancer cells and detected potential AR-mediated GRNs. We then applied it to mouse neurons with and without fear memory and detected CREB-mediated GRNs. The results show that scTIGER can be applied to general case-versus-control scRNA-seq datasets with high performance.
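    The abstract does not describe how scTIGER's permutation-based significance testing is implemented. As a generic sketch of the idea only, the example below tests whether the correlation between two expression profiles is stronger than expected by chance. The function names `pearson` and `permutation_pvalue`, the choice of Pearson correlation as the statistic, and the toy inputs are all assumptions for illustration, not scTIGER's method.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    The null distribution is built by shuffling y, which breaks any
    pairing between the two profiles while keeping their marginals.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, so p is never exactly 0.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy profiles: gene_b is a linear function of gene_a.
gene_a = list(range(20))
gene_b = [2 * v + 1 for v in gene_a]
p_value = permutation_pvalue(gene_a, gene_b)
```

    With `n_perm` permutations the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations bounds how much significance such a test can resolve.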

    Development of Computational Techniques for Identification of Regulatory DNA Motif

    Identifying precise transcription factor binding sites (TFBS), or regulatory DNA motifs, plays a fundamental role in studying transcriptional regulatory mechanisms in cells and helps construct regulatory networks for biological investigation. Chromatin immunoprecipitation combined with sequencing (ChIP-seq), and its extension with lambda exonuclease digestion followed by high-throughput sequencing (ChIP-exo), enable researchers to identify TFBS on a genome scale with improved resolution. Several algorithms have been developed to perform motif identification, employing widely different methods and often giving divergent results. In addition, these existing methods still suffer from limited prediction accuracy. This thesis focuses on the development of improved regulatory DNA motif identification techniques. We designed an integrated framework, WTSA, that can reliably combine the experimental signals from ChIP-exo data at base-pair (bp) resolution to predict statistically significant DNA motifs. The algorithm improves prediction accuracy and extends the scope of applicability of existing methods. We applied the framework to the Escherichia coli K-12 genome and evaluated WTSA's prediction performance through comparison with seven existing programs. The performance evaluation indicated that WTSA provides reliable predictive power for regulatory motifs using ChIP-exo data. An important application of DNA motif identification is to identify transcriptional regulatory mechanisms. The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies provides an unprecedented opportunity to discover gene transcriptional regulation at the single-cell level. In scRNA-seq analyses, a critical step is to identify the cell-type-specific regulons (CTS-Rs), each of which is a group of genes co-regulated by the same transcription regulator in a specific cell type.
We developed a web server, IRIS3 (Integrated Cell-type-specific Regulon Inference Server from Single-cell RNA-Seq), to solve this problem by integrating data preprocessing, cell type prediction, gene module identification, and cis-regulatory motif analyses. Compared with other packages, IRIS3 predicts more efficiently and provides more accurate regulons from scRNA-seq data. These CTS-Rs can substantially improve the elucidation of heterogeneous regulatory mechanisms among various cell types and allow reliable construction of the global transcriptional regulation networks encoded in a specific cell type. Also presented in this thesis is DESSO (DEep Sequence and Shape mOtif), which uses deep neural networks and a binomial distribution model to identify DNA motifs. DESSO outperformed existing tools, including DeepBind, on 690 human ENCODE ChIP-seq datasets. DESSO also further expanded motif identification power by integrating the detection of DNA shape features.

    TIME AND CAUSALITY IN GENOMICS DATA

    The ability to sequence the genomic information that describes individual cell states has provided enormous insight into biological systems. However, to sequence the genomic information within a cell, the cell must be killed, preventing measurements from the future states that cell would have occupied had it been allowed to survive. Thus, sequencing measurements only provide a single snapshot in time of cellular genomic states. Often the ultimate goal of an analysis is to derive mechanistic insight into the biology of a system or process from the data. However, such mechanistic, causal inference is almost impossible without temporal information because causality in standard formulations is based on the concept of connected causes and effects through time. This thesis has interacted with time in genomics data in several ways. The first contribution of this thesis is a neural network-based model that attempts to predict future single-cell transcriptomic states from single-cell transcriptomics data sets. This work demonstrates that using metabolic labeling data sets, future RNA states are estimable within the same cell in the short term, providing a proof of principle that can be expanded as genomics data sets with a temporal dimension become more common. The second contribution of this thesis is a simulation of molecular cell states over time, which is able to demonstrate how single time points from cells do not allow for robust mechanistic inference. Further, the simulation conforms to observations that mRNA expression and expression of the corresponding protein are often poorly correlated and provides mechanistic explanations for how this occurs. The final contribution relates to time in a different sense, analyzing the impact of human age on biomarkers used for cancer immunotherapy. 
We found that older individuals possessed a number of favorable biomarkers at higher levels than their younger counterparts, possibly explaining clinical observations that older individuals do no worse than younger individuals on immune checkpoint therapies despite the usual anticorrelation between patient age and effective immune responses.

    Computational approaches for single-cell omics and multi-omics data

    Single-cell omics and multi-omics technologies have enabled the study of cellular heterogeneity with unprecedented resolution and the discovery of new cell types. Identifying heterogeneous cell types, both existing and novel, relies on efficient computational approaches, especially cluster analysis. Additionally, gene regulatory network analysis and various integrative approaches are needed to combine data across studies and different multi-omics layers. This thesis comprehensively compared Bayesian clustering models for single-cell RNA-sequencing (scRNA-seq) data, and selected integrative approaches were used to study the cell-type-specific gene regulation of the uterus. Additionally, single-cell multi-omics data integration approaches for cell heterogeneity analysis were investigated. Article I investigated analytical approaches for cluster analysis in scRNA-seq data, particularly latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) models. The comparison of LDA and HDP with existing state-of-the-art methods revealed that topic-modeling-based models can be useful in scRNA-seq cluster analysis. Evaluation of cluster quality for LDA and HDP with intrinsic and extrinsic cluster quality metrics indicated that the clustering performance of these methods is dataset dependent. Article II and Article III focused on cell-type-specific integrative analysis of uterine, or decidual, stromal (dS) and natural killer (dNK) cells, which are important for successful pregnancy. Article II integrated existing preeclampsia RNA-seq studies of the decidua with recent scRNA-seq datasets in order to investigate cell-type-specific contributions to early-onset preeclampsia (EOP) and late-onset preeclampsia (LOP). It was discovered that the dS marker genes were enriched for LOP downregulated genes and the dNK marker genes were enriched for EOP upregulated genes.
Article III presented a gene regulatory network analysis for the subpopulations of dS and dNK cells. This study identified novel subpopulation-specific transcription factors that promote decidualization of stromal cells and dNK-mediated maternal immunotolerance. In Article IV, different strategies and methodological frameworks for data integration in single-cell multi-omics data analysis were reviewed in detail. Data integration methods were grouped into early, late, and intermediate data integration strategies. The specific stage and order of data integration can have a substantial effect on the results of the integrative analysis. The central details of the approaches were presented, and potential future directions were discussed. Computational methods for the analysis of single-cell sequencing and multi-omics results. Single-cell sequencing techniques enable the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Grouping, that is, cluster analysis, plays a central role in identifying cell types. Combining gene regulatory networks and different molecular data layers is also central to the analysis. The thesis compares Bayesian clustering methods and combines data collected with different methods in a cell-type-specific gene regulation analysis of the uterus. In addition, integration methods for single-cell data are surveyed comprehensively. Publication I focuses on analytical methods, in particular models based on latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), in cluster analysis of single-cell data. A comprehensive comparison of these two models with existing methods revealed that topic-modeling-based methods can be useful in cluster analysis of single-cell data. The performance of the methods also depended on the properties of each dataset analyzed.
Publications II and III focus on cell-type-specific analysis of uterine stromal cells and NK immune cells, which are important for female reproductive health. Article II combined existing results on preeclampsia with the latest single-cell sequencing results and found cell-type-specific effects of early-onset preeclampsia (EOP) and late-onset preeclampsia (LOP). Expression of marker genes of differentiated stroma was found to decrease in LOP, and expression of NK marker genes to increase in EOP. Publication III analyzes subpopulation-specific gene regulatory networks of stromal and NK cells and their transcription factors. The study identified new subpopulation-specific regulators that promote stromal differentiation and NK-cell-mediated immunotolerance. Publication IV examines in detail strategies and methods for integrating different single-cell data layers (multi-omics). Integration methods were grouped into early, late, and intermediate strategies, and the methods of each approach were presented in more detail. Possible future directions were also discussed.

    Additional file 1: of Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data

    Figure S1. Simulation parameters using GNW to generate simulated datasets, Sim1 (top) and Sim2 (bottom). For Sim1, we sampled 101 time points (the first time point at t = 0 was not used) from a series of time series data, with the other parameters kept the same as the ones used in the DREAM4 challenge, and eventually obtained S = 100. For Sim2, we sampled 11 time points (the first time point at t = 0 was not used) from 100 series of time series data, with the other parameters kept the same as the ones used in the DREAM4 challenge, and eventually obtained S = 1000. (PDF 347 kb)
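    The sample counts in this caption follow from simple arithmetic, sketched below. The helper name `n_samples` is hypothetical, and the sketch assumes that "a series" for Sim1 means a single time series and that S counts the retained time points across all series.

```python
def n_samples(time_points_per_series, n_series, drop_first=True):
    """Number of retained samples when the t = 0 point of each series is discarded."""
    kept = time_points_per_series - 1 if drop_first else time_points_per_series
    return kept * n_series


# Sim1: 101 time points from one series, t = 0 dropped.
sim1 = n_samples(101, 1)
# Sim2: 11 time points from each of 100 series, t = 0 dropped.
sim2 = n_samples(11, 100)
```

    Under these assumptions the two calls reproduce the caption's S = 100 and S = 1000.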

    Characterization of Gamma-Secretase-Mediated Cleavage of Receptor Tyrosine Kinases

    ABSTRACT Receptor tyrosine kinases (RTKs) are a family of cell surface receptors consisting of 55 members. RTKs regulate intracellular signaling pathways that control fundamental cellular processes including differentiation, proliferation, and survival. The functionality of RTKs is necessary for the development and homeostasis of many tissues. In human pathologies, such as cancer, aberrant RTK signaling is a common feature. Gamma-secretase-mediated regulated intramembrane proteolysis cleaves RTKs in two sequential proteolytic events: sheddase-mediated ectodomain shedding, followed by the release of a soluble intracellular domain by gamma-secretase cleavage. The aims of my thesis were to characterize the gamma-secretase-mediated cleavage of RTKs, with a focus on identifying the prevalence of cleavage among RTKs and developing novel methods to identify signaling pathways associated with the process. The results of this thesis indicate that at least half of the RTKs are subject to gamma-secretase cleavage. In total, 12 new gamma-secretase targets were identified. Many of the newly identified gamma-secretase target RTKs, for example AXL and TYRO3, showed cleavage-dependent effects on cell growth. My research also demonstrated that the signaling of the full-length TYRO3 receptor differs from that of the soluble intracellular domain of TYRO3, as observed with our novel systems biology methods. Together, these findings represent, for the first time, an approach to determine the prevalence of gamma-secretase cleavage among RTKs. Moreover, this study presents novel methods and tools for identifying the still largely unknown signaling pathways associated with RTK cleavage. RTK processing via proteolytic cleavage has implications for the functionality of RTKs in both normal tissues and cancer. The results of this thesis can provide new insights into the regulation of RTK functions and can be used to develop new strategies to treat cancers.
KEYWORDS: receptor tyrosine kinase, RTK, gamma-secretase, regulated intramembrane proteolysis, intracellular kinase domain, shedding, proteomics
SUMMARY The human genome contains 55 receptor tyrosine kinases (RTKs). RTKs are signaling proteins located at the cell membrane. They signal through intracellular signaling pathways and regulate vital cellular processes such as proliferation, differentiation, and survival. RTKs are important in the development of many tissues, and their abnormal function has been observed in many diseases, such as cancers. Gamma-secretase-mediated regulated intramembrane proteolysis is a mechanism by which RTKs are proteolytically cleaved. It is a two-step event: the extracellular domain of the RTK is first cleaved by proteins called ADAMs, after which gamma-secretase releases the intramembrane part from the cell membrane. The aim of this thesis was to characterize the gamma-secretase cleavage of RTKs. The specific subjects of study were determining the prevalence of RTK cleavage and developing methods that better identify the signaling associated with RTK cleavage. We determined that half of human RTKs are targets of gamma-secretase-mediated cleavage and identified a total of 12 new targets. For the RTKs TYRO3 and AXL, increased cell growth was associated with cleavage of these RTKs. In addition, in my thesis research we were able to show that the signaling induced by the soluble fragment formed by TYRO3 cleavage differs significantly from the signaling induced by full-length TYRO3. The findings show that RTK cleavage is common and that the new analysis methods help identify new signaling pathways for cleaved RTKs better than before.
The results on RTK cleavage and its signaling broaden our understanding of RTK signaling, and the knowledge they provide can be used in developing new cancer treatments. KEYWORDS: receptor tyrosine kinase, RTK, gamma-secretase, regulated intramembrane proteolysis, intracellular kinase domain, proteomics