24 research outputs found

    Deciphering transcriptional patterns of gene regulation : a computational approach

    With rapid advancements in sequencing technology, we now have the ability to sequence the entire human genome and to quantify expression of tens of thousands of genes from hundreds of individuals. This provides an extraordinary opportunity to learn phenotype-relevant genomic patterns that can improve our understanding of the molecular and cellular processes underlying a trait. The high-dimensional nature of genomic data presents a range of computational and statistical challenges. This dissertation presents a compilation of projects motivated by the goal of efficiently capturing gene regulatory patterns in the human transcriptome while addressing the statistical and computational challenges that accompany these data. We attempt to address two major difficulties in this domain: a) artifacts and noise in transcriptomic data, and b) limited statistical power. First, we present our work investigating artifactual variation in gene expression data and its impact on trans-eQTL discovery. Here we performed an in-depth analysis of diverse pre-recorded covariates and latent confounders to understand their contribution to heterogeneity in gene expression measurements. Next, we discovered 673 trans-eQTLs across 16 human tissues using v6 data from the Genotype-Tissue Expression (GTEx) project. Finally, we characterized two trait-associated trans-eQTLs, one in Skeletal Muscle and another in Thyroid. Second, we present a principal component-based residualization method to correct gene expression measurements prior to reconstruction of co-expression networks. In this work, we demonstrated theoretically, in simulation, and empirically that principal component correction of gene expression measurements prior to network inference can reduce false positive edges. Using GTEx data from multiple tissues, we showed that this approach reduced false discoveries beyond correcting only for known confounders.
    Third, we present a multi-study integration approach to identify universal transcriptional patterns underlying epithelial to mesenchymal transition (EMT) across different cancer types. With informed statistical analysis and functional validation, we identified consensus-ranked universal EMT genes. This gene list consisted of a) known EMT genes, b) genes studied in a subset of carcinomas but unknown in prostate cancer, and c) previously unknown EMT and cancer genes such as C1orf116. Finally, we present methods to integrate co-expression signals across multiple human RNA-seq datasets to reconstruct networks with increased power. First, we considered multiple aggregation strategies to build context-agnostic networks using data from recount2. These networks captured ubiquitous patterns of gene co-expression shared across tissues and cell types. Next, we briefly describe a hierarchical mixture model, groupNet, that leverages signal from multiple datasets to learn the structure of a Gaussian Markov random field (GMRF) to build context-specific co-expression networks.
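    The principal component residualization mentioned in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the simulated data, the single latent confounder, and the choice of one removed component are all assumptions made for illustration. The sketch simulates genes that are independent apart from a shared confounder, removes the top principal component, and compares the average absolute gene-gene correlation before and after.

```python
import numpy as np


def remove_top_pcs(expr, n_pcs):
    """Residualize a samples x genes matrix on its top principal components.

    Hypothetical helper for illustration only.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Part of the data explained by the leading n_pcs components
    low_rank = (u[:, :n_pcs] * s[:n_pcs]) @ vt[:n_pcs, :]
    return centered - low_rank


def mean_abs_offdiag_corr(mat):
    """Average absolute correlation over all distinct gene pairs."""
    corr = np.corrcoef(mat, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.abs(off_diag).mean())


# Simulated data (assumption): genes are independent except for one
# shared latent confounder, so every nonzero correlation is spurious.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
confounder = rng.normal(size=(n_samples, 1))
loadings = rng.normal(size=(1, n_genes))
expr = 2.0 * confounder @ loadings + rng.normal(size=(n_samples, n_genes))

corrected = remove_top_pcs(expr, n_pcs=1)
before = mean_abs_offdiag_corr(expr)
after = mean_abs_offdiag_corr(corrected)
print(f"mean |correlation| before: {before:.3f}, after: {after:.3f}")
```

    In this toy setting the spurious correlations shrink after correction. With real expression data, the number of components to remove is itself a modelling choice, and removing too many can also discard genuine co-expression signal.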

    Continuity of transcriptomes among colorectal cancer subtypes based on meta-analysis

    Background: Previous approaches to defining subtypes of colorectal carcinoma (CRC) and other cancers based on transcriptomes have assumed the existence of discrete subtypes. We analyze gene expression patterns of colorectal tumors from a large number of patients to test this assumption and propose an approach to identify a potential continuum of subtypes present across independent studies and cohorts. Results: We examine the assumption of discrete CRC subtypes by integrating 18 published gene expression datasets and >3700 patients and, contrary to previous reports, find no evidence to support the existence of discrete transcriptional subtypes. Using a meta-analysis approach to identify co-expression patterns present in multiple datasets, we identify and define robust, continuously varying subtype scores to represent CRC transcriptomes. The subtype scores are consistent with established subtypes (including microsatellite instability and previously proposed discrete transcriptome subtypes), but better represent overall transcriptional activity than do discrete subtypes. The scores are also better predictors of tumor location, stage, grade, and times of disease-free survival than discrete subtypes. Gene set enrichment analysis reveals that the subtype scores characterize T-cell function, inflammation response, and cyclin-dependent kinase regulation of DNA replication. Conclusions: We find no evidence to support discrete subtypes of the CRC transcriptome and instead propose two validated scores to better characterize a continuity of CRC transcriptomes.

    Alterations of immune response of non-small cell lung cancer with azacytidine

    Innovative therapies are needed for advanced Non-Small Cell Lung Cancer (NSCLC). We have undertaken a genomics-based, hypothesis-driving approach to query an emerging potential that epigenetic therapy may sensitize to immune checkpoint therapy targeting the PD-L1/PD-1 interaction. NSCLC cell lines were treated with the DNA hypomethylating agent azacytidine (AZA, Vidaza), and the genes and pathways altered were mapped by genome-wide expression and DNA methylation analyses. AZA-induced pathways were analyzed in The Cancer Genome Atlas (TCGA) project by mapping the derived gene signatures in hundreds of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) samples. AZA up-regulates genes and pathways related to both innate and adaptive immunity, and genes related to immune evasion, in several NSCLC lines. DNA hypermethylation and low expression of IRF7, an interferon transcription factor, tracks with this signature, particularly in LUSC. In concert with these events, AZA up-regulates transcripts and protein of PD-L1, a key ligand-mediator of immune tolerance. Analysis of TCGA samples demonstrates that a significant proportion of primary NSCLC have low expression of AZA-induced immune genes, including PD-L1. We hypothesize that epigenetic therapy combined with blockade of immune checkpoints, in particular the PD-1/PD-L1 pathway, may augment response of NSCLC by shifting the balance between immune activation and immune inhibition, particularly in a subset of NSCLC with low expression of these pathways. Our studies define a biomarker strategy for response in a recently initiated trial to examine the potential of epigenetic therapy to sensitize patients with NSCLC to PD-1 immune checkpoint blockade.

    The impact of sex on gene expression across human tissues

    Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

    Genetic effects on gene expression across human tissues

    Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

    Identifying global expression patterns and key regulators in epithelial to mesenchymal transition through multi-study integration

    Background: Epithelial to mesenchymal transition (EMT) is the process by which stationary epithelial cells transdifferentiate to mesenchymal cells with increased motility. EMT is integral in early stages of development and wound healing. Studies have shown that EMT could be a critical early event in tumor metastasis that is involved in acquisition of migratory and invasive properties in multiple carcinomas. Methods: In this study, we used 15 published gene expression microarray datasets from Gene Expression Omnibus (GEO) that represent 12 cell lines from 6 cancer types across 95 observations (45 unique samples and 50 replicates) with different modes of induction of EMT or the reverse transition, mesenchymal to epithelial transition (MET). We integrated multiple gene expression datasets while considering study differences, batch effects, and noise in gene expression measurements. A universal differential EMT gene list was obtained by normalizing and correcting the data using four approaches, computing differential expression from each, and identifying a consensus ranking. We confirmed our discovery of novel EMT genes at mRNA and protein levels in an in vitro EMT model of prostate cancer: PC3-epi, EMT, and Taxol-resistant cell lines. We validated our discovery of C1orf116 as a novel EMT regulator by siRNA knockdown of C1orf116 in PC3 epithelial cells. Results: Among differentially expressed genes, we found known epithelial and mesenchymal marker genes such as CDH1 and ZEB1. Additionally, we discovered genes known in a subset of carcinomas that were unknown in prostate cancer. These included the epithelial-specific genes LSR and S100A14 and the mesenchymal-specific gene DPYSL3. Furthermore, we discovered novel EMT genes, including the poorly characterized gene C1orf116. We show that decreased expression of C1orf116 is associated with poor prognosis in lung and prostate cancer patients. We demonstrate that knockdown of C1orf116 induced expression of mesenchymal genes in the epithelial prostate cancer cell line PC3-epi, suggesting it as a candidate driver of the epithelial phenotype. Conclusions: This comprehensive approach of statistical analysis and functional validation identified global expression patterns in EMT and candidate regulatory genes, thereby both extending current knowledge and identifying novel drivers of EMT.
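    The consensus-ranking step described in the Methods (differential expression computed under four normalization approaches, then combined) can be sketched as below. The abstract does not say how the ranking was aggregated, so this sketch assumes simple mean-rank aggregation. The function name, the placeholder gene labels, and the toy scores are hypothetical and are not taken from the study.

```python
import numpy as np


def consensus_rank(scores_by_method, genes):
    """Rank genes within each method by |score| (1 = strongest),
    then order genes by their mean rank across methods.

    Mean-rank aggregation is an assumption, not the study's stated method.
    """
    ranks = []
    for scores in scores_by_method:
        magnitude = np.abs(np.asarray(scores, dtype=float))
        order = np.argsort(-magnitude, kind="stable")
        rank = np.empty(len(genes), dtype=float)
        rank[order] = np.arange(1, len(genes) + 1)
        ranks.append(rank)
    mean_rank = np.mean(ranks, axis=0)
    final_order = np.argsort(mean_rank, kind="stable")
    return [(genes[i], float(mean_rank[i])) for i in final_order]


# Toy differential-expression scores (illustrative only), one row per
# hypothetical normalization approach, one column per placeholder gene.
genes = ["gene_A", "gene_B", "gene_C", "gene_D"]
toy_scores = [
    [5.1, -0.3, 2.2, -4.0],
    [4.8, 0.1, 2.9, -3.5],
    [3.9, -0.2, 1.0, -4.4],
    [5.5, 0.4, 2.5, -3.1],
]

ranking = consensus_rank(toy_scores, genes)
for gene, mean_rank in ranking:
    print(f"{gene}: mean rank {mean_rank:.2f}")
```

    Ranking on absolute score treats up- and down-regulated genes alike. A signed variant would be needed to keep epithelial and mesenchymal markers in separate lists.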