27 research outputs found

    Avoiding the pitfalls of gene set enrichment analysis with SetRank.

    The purpose of gene set enrichment analysis (GSEA) is to find general trends in the huge lists of genes or proteins generated by many functional genomics techniques and bioinformatics analyses. Here we present SetRank, an advanced GSEA algorithm which is able to eliminate many false positive hits. The key principle of the algorithm is that it discards gene sets that have initially been flagged as significant, if their significance is only due to the overlap with another gene set. The algorithm is explained in detail and its performance is compared to that of other methods using objective benchmarking criteria. Furthermore, we explore how sample source bias can affect the results of a GSEA. The benchmarking results show that SetRank is a highly specific tool for GSEA. Furthermore, we show that the reliability of results can be improved by taking sample source bias into account. SetRank is available as an R package and through an online web interface.
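The overlap principle described in this abstract can be illustrated with a minimal sketch. This is not the SetRank implementation: it assumes a one-sided hypergeometric over-representation test and a fixed significance cutoff, and the function names (`hypergeom_pvalue`, `enrichment_p`, `significant_without_overlap`) are hypothetical. A gene set is re-tested after the genes it shares with another set are removed from the set, the hit list, and the universe; if it is no longer significant, its original significance was plausibly due to the overlap alone.

```python
from math import comb


def hypergeom_pvalue(universe_size, set_size, hits_total, hits_in_set):
    """Upper tail P(X >= hits_in_set) of a hypergeometric distribution."""
    upper = min(set_size, hits_total)
    total = comb(universe_size, hits_total)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_total - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


def enrichment_p(gene_set, hits, universe):
    """Over-representation p-value of `hits` within `gene_set`."""
    gene_set = gene_set & universe
    hits = hits & universe
    return hypergeom_pvalue(
        len(universe), len(gene_set), len(hits), len(gene_set & hits)
    )


def significant_without_overlap(set_a, set_b, hits, universe, alpha=0.05):
    """Is set_a still enriched once every gene of set_b is ignored?

    Assumption of this sketch: set_b's genes are dropped from the set,
    the hit list, and the universe before re-testing.
    """
    remainder = set_a - set_b
    if not remainder:
        return False
    return enrichment_p(remainder, hits - set_b, universe - set_b) < alpha


# Toy data: every hit in set_a also lies in set_b.
universe = {f"g{i}" for i in range(100)}
hits = {f"g{i}" for i in range(10)}
set_b = {f"g{i}" for i in range(15)}
set_a = {f"g{i}" for i in range(8)} | {f"g{i}" for i in range(20, 28)}

print(enrichment_p(set_a, hits, universe) < 0.05)
print(significant_without_overlap(set_a, set_b, hits, universe))
print(significant_without_overlap(set_b, set_a, hits, universe))
```

In this toy data, `set_a` is significant on its own but owes all of its hits to `set_b`, so it would be discarded, while `set_b` remains significant after the shared genes are removed.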

    Identification of Specific Circular RNA Expression Patterns and MicroRNA Interaction Networks in Mesial Temporal Lobe Epilepsy

    Circular RNAs (circRNAs) regulate mRNA translation by binding to microRNAs (miRNAs), and their expression is altered in diverse disorders, including cancer, cardiovascular disease, and Parkinson’s disease. Here, we compare circRNA expression patterns in the temporal cortex and hippocampus of patients with pharmacoresistant mesial temporal lobe epilepsy (MTLE) and healthy controls. Nine circRNAs showed significant differential expression, including circRNA-HOMER1, which is expressed in synapses. Further, we identified miRNA binding sites within the sequences of differentially expressed (DE) circRNAs; expression levels of mRNAs correlated with changes in complementary miRNAs. Gene set enrichment analysis of mRNA targets revealed functions in heterocyclic compound binding, regulation of transcription, and signal transduction, which maintain the structure and function of hippocampal neurons. The circRNA–miRNA–mRNA interaction networks illuminate the molecular changes in MTLE, which may be pathogenic or an effect of the disease or treatments, and suggest that DE circRNAs and associated miRNAs may be novel therapeutic targets.

    Sensitivity And Specificity Of Gene Set Analysis

    High-throughput technologies are widely used for understanding biological processes. Gene set analysis is a well-established computational approach for providing a concise biological interpretation of high-throughput gene expression data. Gene set analysis utilizes the available knowledge about the groups of genes involved in cellular processes or functions. Large collections of such groups of genes, referred to as gene set databases, are available through online repositories to facilitate gene set analysis. There are a large number of gene set analysis methods available, and current recommendations and guidelines about the method of choice for a given experiment are often inconsistent and contradictory. It has also been reported that some gene set analysis methods suffer from a lack of specificity. Furthermore, the sheer size of gene set databases makes it difficult to study these databases and their effect on gene set analysis. In this thesis, we propose quantitative approaches for the study of reproducibility, sensitivity, and specificity of gene set analysis methods; characterize gene set databases; and offer guidelines for choosing an appropriate gene set database for a given experiment. We review commonly used gene set analysis methods; classify these methods based on their components; describe the underlying requirements and assumptions for each class; suggest the appropriate method to be used for a given experiment; and explain the challenges and pitfalls in interpreting results for each class of methods. We propose a methodology and use it for evaluating the effect of sample size on the results of thirteen gene set analysis methods utilizing real datasets. Further, to investigate the effect of method choice on the results of gene set analysis, we develop a quantitative approach and use it to evaluate ten commonly used gene set analysis methods. 
We also quantify and visualize gene set overlap and study its effect on the specificity of over-representation analysis. We propose Silver, a quantitative framework for simulating gene expression datasets and evaluating gene set analysis methods without relying on oversimplifying assumptions commonly made when evaluating gene set analysis methods. Finally, we propose a systematic approach to select appropriate gene set databases for conducting gene set analysis for a given experiment. Using this approach, we highlight the drawbacks of meta-databases such as MSigDB, a well-established gene set database made by extracting gene sets from several sources including GO, KEGG, Reactome, and BioCarta. Our findings suggest that the results of most gene set analysis methods are not reproducible for small sample sizes. In addition, the results of gene set analysis significantly vary depending on the method used, with little to no commonality between the 20 most significant results. We show that there is a significant negative correlation between gene set overlap and the specificity of over-representation analysis. This suggests that gene set overlap should be taken into account when developing and evaluating gene set analysis methods. We show that the datasets synthesized using Silver preserve complex gene-gene correlations and the distribution of expression values. Using Silver provides unbiased insight about how gene set analysis methods behave when applied on real datasets and real gene set databases. Our quantitative study of several well-established gene set databases reveals that commonly used gene set databases fall short in representing some phenotypes. The proposed methodologies and achieved results in this research reveal the main challenges facing gene set analysis. We identify key factors that contribute to the lack of specificity and reproducibility of gene set analysis methods, establishing the direction for future research. 
Also, the quantitative methodologies proposed in this thesis facilitate the design and development of gene set analysis methods as well as gene set databases and benefit a wide range of researchers utilizing high-throughput technologies.
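As an illustration of what quantifying gene set overlap can look like, the sketch below computes a pairwise Jaccard index over a small, made-up gene set database. The choice of the Jaccard index and the names `jaccard` and `pairwise_overlap` are assumptions made for this example; the abstract does not state which overlap measure the thesis uses.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Map each unordered pair of set names to its Jaccard index."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Hypothetical toy database; gene and set names are placeholders.
toy_db = {
    "SET_A": {"g1", "g2", "g3", "g4"},
    "SET_B": {"g3", "g4", "g5", "g6"},
    "SET_C": {"g7"},
}

for pair, score in pairwise_overlap(toy_db).items():
    print(pair, round(score, 3))
```

Pairs with a high index share most of their genes, which is the situation in which an over-representation test tends to flag both sets together.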

    Impact of 6 month conjugated equine estrogen versus estradiol-treatment on biomarkers and enriched gene sets in healthy mammary tissue of non-human primates.

    OBJECTIVE: To identify distinctly regulated gene markers and enriched gene sets in breast tissue of cynomolgus monkeys (Macaca fascicularis) treated for six months with either conjugated equine estrogens (CEE) or estradiol (E2) by analysis of corresponding mRNA levels of genes associated with breast development, carcinogenesis, apoptosis and immune regulation. Additionally, translation of three nuclear markers was analyzed. METHODS: RNA from breast biopsies and necropsies was isolated from two independent study trials from Ethun et al. (CEE) and Foth et al. (E2) after 6 months of treatment. RNA was subjected to qRT-PCR and microarray analysis. Immunohistochemical stainings were performed for the estrogen receptor alpha subunit (ERa), the progesterone receptor (PGR) and the proliferation marker Ki67. RESULTS: We identified a total of 36 distinctly enriched gene sets. Thirty-one were found in the CEE treatment group and five were found in the E2 treatment group, with no overlap. Furthermore, two individual genes, IGFBP1 and SGK493, were upregulated in CEE-treated animals. Additional targeted qRT-PCR analysis of ten specific estrogen-related genes showed upregulation of three genes (TFF1, PGR and GREB1) after CEE treatment and of one gene (TFF1) after E2 treatment. Immunohistochemical stains of breast biopsies showed a significant increase in expression of the PGR marker after CEE treatment. CONCLUSIONS: In this study we identified enriched gene sets possibly induced by CEE or E2 treatment in various processes associated with cancer biology and immunology. These preliminary translational data support the concept that different estrogen types have different effects on healthy breast tissue and may help generate hypotheses for future research.

    Novel variants in KAT6B spectrum of disorders expand our knowledge of clinical manifestations and molecular mechanisms

    The phenotypic variability associated with pathogenic variants in Lysine Acetyltransferase 6B (KAT6B, a.k.a. MORF, MYST4) results in several interrelated syndromes including Say-Barber-Biesecker-Young-Simpson Syndrome and Genitopatellar Syndrome. Here we present 20 new cases representing 10 novel KAT6B variants. These patients exhibit a range of clinical phenotypes including intellectual disability, mobility and language difficulties, craniofacial dysmorphology, and skeletal anomalies. Given the range of features previously described for KAT6B-related syndromes, we have identified additional phenotypes including concern for keratoconus, sensitivity to light or noise, recurring infections, and fractures in greater numbers than previously reported. We surveyed clinicians to qualitatively assess the ways families engage with genetic counselors upon diagnosis. We found that 56% (10/18) of individuals receive diagnoses before the age of 2 years (median age = 1.96 years), making it challenging to address future complications with limited accessible information and vast phenotypic severity. We used CRISPR to introduce truncating variants into the KAT6B gene in model cell lines and performed chromatin accessibility and transcriptome sequencing to identify key dysregulated pathways. This study expands the clinical spectrum and addresses the challenges to management and genetic counseling for patients with KAT6B-related disorders.

    Ribonuclease inhibitor 1 regulates erythropoiesis by controlling GATA1 translation.

    Ribosomal proteins (RP) regulate specific gene expression by selectively translating subsets of mRNAs. Indeed, in Diamond-Blackfan anemia and 5q- syndrome, mutations in RP genes lead to a specific defect in erythroid gene translation and cause anemia. Little is known about the molecular mechanisms of selective mRNA translation and involvement of ribosomal-associated factors in this process. Ribonuclease inhibitor 1 (RNH1) is a ubiquitously expressed protein that binds to and inhibits pancreatic-type ribonucleases. Here, we report that RNH1 binds to ribosomes and regulates erythropoiesis by controlling translation of the erythroid transcription factor GATA1. Rnh1-deficient mice die between embryonic days E8.5 and E10 due to impaired production of mature erythroid cells from progenitor cells. In Rnh1-deficient embryos, mRNA levels of Gata1 are normal, but GATA1 protein levels are decreased. At the molecular level, we found that RNH1 binds to the 40S subunit of ribosomes and facilitates polysome formation on Gata1 mRNA to confer transcript-specific translation. Further, RNH1 knockdown in human CD34+ progenitor cells decreased erythroid differentiation without affecting myelopoiesis. Our results reveal an unsuspected role for RNH1 in the control of GATA1 mRNA translation and erythropoiesis.

    dSreg: a Bayesian model to integrate changes in splicing and RNA-binding protein activity

    MOTIVATION: Alternative splicing (AS) is an important mechanism in the generation of transcript diversity across mammals. AS patterns are dynamically regulated during development and in response to environmental changes. Defects or perturbations in its regulation may lead to cancer or neurological disorders, among other pathological conditions. The regulatory mechanisms controlling AS in a given biological context are typically inferred using a two-step framework: differential AS analysis followed by enrichment methods. These strategies require setting rather arbitrary thresholds and are prone to error propagation along the analysis. RESULTS: To overcome these limitations, we propose dSreg, a Bayesian model that integrates RNA-seq with data from regulatory features, e.g. binding sites of RNA-binding proteins. dSreg identifies the key underlying regulators controlling AS changes and quantifies their activity while simultaneously estimating the changes in exon inclusion rates. dSreg increased both the sensitivity and the specificity of the identified AS changes in simulated data, even at low read coverage. dSreg also showed improved performance when analyzing a collection of knock-down RNA-binding proteins' experiments from ENCODE, as opposed to traditional enrichment methods, such as over-representation analysis and gene set enrichment analysis. dSreg opens the possibility to integrate a large amount of readily available RNA-seq datasets at low coverage for AS analysis and allows more cost-effective RNA-seq experiments. AVAILABILITY AND IMPLEMENTATION: dSreg was implemented in python using stan and is freely available to the community at https://bitbucket.org/cmartiga/dsreg. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. This work was supported by grants from the European Union [CardioNeTITN-289600, CardioNext-608027]; the Spanish Ministry of Economy and Competitiveness [SAF2015-65722-R, SAF2012-31451]; the Instituto de salud Carlos III (ISCIII) [CPII14/00027, RD012/0042/0066]; the Madrid Regional Government [2010-BMD-2321 “Fibroteam”]. The study also received support from the Plan Estatal de I+D+I 2013-2016 – European Regional Development Fund (ERDF) “A way of making Europe”, Spain. The CNIC is supported by the Spanish Ministry of Economy, Industry and Competitiveness and the Pro-CNIC Foundation and is a Severo Ochoa Center of Excellence (MEIC award SEV-2015-0505).

    Pathway crosstalk perturbation network modeling for identification of connectivity changes induced by diabetic neuropathy and pioglitazone

    BACKGROUND: Aggregation of high-throughput biological data using pathway-based approaches is useful to associate molecular results to functional features related to the studied phenomenon. Biological pathways communicate with one another through the crosstalk phenomenon, forming large networks of interacting processes. RESULTS: In this work, we present the pathway crosstalk perturbation network (PXPN) model, a novel model used to analyze and integrate pathway perturbation data based on graph theory. With this model, the changes in activity and communication between pathways observed in transitions between physiological states are represented as networks. The model presented here is agnostic to the type of biological data and pathway definition used and can be implemented to analyze any type of high-throughput perturbation experiments. We present a case study in which we use our proposed model to analyze a gene expression dataset derived from experiments in a BKS-db/db mouse model of type 2 diabetes mellitus–associated neuropathy (DN) and the effects of the drug pioglitazone in this condition. The networks generated describe the profile of pathway perturbation involved in the transitions between the healthy and the pathological state and the pharmacologically treated pathology. We identify changes in the connectivity of perturbed pathways associated to each biological transition, such as rewiring between extracellular matrix, neuronal system, and G-protein coupled receptor signaling pathways. CONCLUSION: The PXPN model is a novel, flexible method used to integrate high-throughput data derived from perturbation experiments; it is agnostic to the type of data and enrichment function used, and it is applicable to a wide range of biological phenomena of interest. https://deepblue.lib.umich.edu/bitstream/2027.42/146780/1/12918_2018_Article_674.pd