45 research outputs found

    Progression of RNA-sequencing to single-cell applications

    New methods enable new discoveries. My time as a PhD student has run in parallel with the maturation of the RNA-seq method, and I have used it to discover basic properties of gene expression and transcriptomes. My part has been bioinformatics: the computer analysis of biological data. RNA-seq quantifies gene expression for all genes in one experiment, allowing discoveries without prior knowledge, as opposed to single-gene hypothesis testing. When I started my PhD, this was done by microarray followed by qRT-PCR validation, which can be arduous. In contrast to microarrays, RNA-seq quantifies expression with little ambiguity of which gene each expression value corresponds to, and in absolute terms. But at the time, data analysis of RNA-seq was full of unknowns and there was little software available. Nowadays, partly as a result of my work, the data analysis is much less complicated, and RNA-seq can be performed on diminutive samples, down to single cells, which was not viable using microarrays. My first study (Paper I) used one of the very first RNA-seq datasets to study general features of transcriptomes, such as mean mRNA length (~1,500 nt) and the number of genes expressed per tissue (~13,000). I also found special features of some tissues: the liver transcriptome is dominated by a few highly expressed genes, brain expresses especially long mRNAs and testis expresses many more genes than other tissues. Following this tissue RNA-seq study, I evaluated a new library preparation method for single-cell RNA-seq (Paper III), developed before the prevalence of single-cell RNA-seq. I used technical replicates to show that the method was accurate and reliable for the more highly expressed genes at single-cell RNA levels, and with input RNA amounts corresponding to >50 cells it produced data of as good quality as bulk RNA-seq. 
Then the method was applied to melanoma cells isolated from human blood, and I listed surface antigen genes that distinguished these circulating tumour cells from other cells in the blood. This single-cell RNA-seq method was then applied to pre-implantation embryo cells (Paper IV). Using first-generation crosses between two mouse strains, I could separate the expression from the maternal and the paternal copies of the genes. I found that 12-24% of the genes express only one of their two copies in any given cell, in a random manner that affects almost all the expressed genes. I also found that the two copies are expressed independently from each other. Finally, I studied Sox transcription factors during neural development (Paper II), combining RNA-seq and microarray data for different cell types with ChIP-seq data for transcription factor binding and histone modifications. I found that Sox proteins bind to the enhancers active in the stem cells where the Sox proteins are active, but also to enhancers specific to subsequent cells in development. I also found that different Sox factors bind to much the same enhancers, and that they can induce histone modifications. In conclusion, my work has advanced the RNA-seq method and increased the understanding of transcriptional regulation and output.

    Mouse Model of Alagille Syndrome and Mechanisms of Jagged1 Missense Mutations.

    BACKGROUND & AIMS: Alagille syndrome is a genetic disorder characterized by cholestasis, ocular abnormalities, characteristic facial features, heart defects, and vertebral malformations. Most cases are associated with mutations in JAGGED1 (JAG1), which encodes a Notch ligand, although it is not clear how these contribute to disease development. We aimed to develop a mouse model of Alagille syndrome to elucidate these mechanisms. METHODS: Mice with a missense mutation (H268Q) in Jag1 (Jag1+/Ndr mice) were outbred to a C3H/C57bl6 background to generate a mouse model for Alagille syndrome (Jag1Ndr/Ndr mice). Liver tissues were collected at different timepoints during development, analyzed by histology, and liver organoids were cultured and analyzed. We performed transcriptome analysis of Jag1Ndr/Ndr livers and livers from patients with Alagille syndrome, cross-referenced to the Human Protein Atlas, to identify commonly dysregulated pathways and biliary markers. We used species-specific transcriptome separation and ligand-receptor interaction assays to measure Notch signaling and the ability of JAG1Ndr to bind or activate Notch receptors. We studied signaling of JAG1 and JAG1Ndr via NOTCH1, NOTCH2, and NOTCH3 and resulting gene expression patterns in parental and NOTCH1-expressing C2C12 cell lines. RESULTS: Jag1Ndr/Ndr mice had many features of Alagille syndrome, including eye, heart, and liver defects. Bile duct differentiation, morphogenesis, and function were dysregulated in newborn Jag1Ndr/Ndr mice, with aberrations in cholangiocyte polarity, but these defects improved in adult mice. Jag1Ndr/Ndr liver organoids collapsed in culture, indicating structural instability. Whole-transcriptome sequence analyses of liver tissues from mice and patients with Alagille syndrome identified dysregulated genes encoding proteins enriched at the apical side of cholangiocytes, including CFTR and SLC5A1, as well as reduced expression of IGF1. 
Exposure of Notch-expressing cells to JAG1Ndr, compared with JAG1, led to hypomorphic Notch signaling, based on transcriptome analysis. JAG1-expressing cells, but not JAG1Ndr-expressing cells, bound soluble Notch1 extracellular domain, quantified by flow cytometry. However, JAG1 and JAG1Ndr cells each bound NOTCH2, and signaling from NOTCH2 was reduced, but not completely inhibited, in response to JAG1Ndr compared with JAG1. CONCLUSIONS: In mice, expression of a missense mutant of Jag1 (Jag1Ndr) disrupts bile duct development and recapitulates Alagille syndrome phenotypes in heart, eye, and craniofacial dysmorphology. JAG1Ndr does not bind NOTCH1, but binds NOTCH2, and elicits hypomorphic signaling. This mouse model can be used to study other features of Alagille syndrome and organ development.

    Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells

    Naive embryonic stem cells hold great promise for research and therapeutics as they have broad and robust developmental potential. While such cells are readily derived from mouse blastocysts it has not been possible to isolate human equivalents easily, although human naive-like cells have been artificially generated (rather than extracted) by coercion of human primed embryonic stem cells by modifying culture conditions or through transgenic modification. Here we show that a sub-population within cultures of human embryonic stem cells (hESCs) and induced pluripotent stem cells (hiPSCs) manifests key properties of naive state cells. These naive-like cells can be genetically tagged, and are associated with elevated transcription of HERVH, a primate-specific endogenous retrovirus. HERVH elements provide functional binding sites for a combination of naive pluripotency transcription factors, including LBP9, recently recognized as relevant to naivety in mice. LBP9-HERVH drives hESC-specific alternative and chimaeric transcripts, including pluripotency-modulating long non-coding RNAs. Disruption of LBP9, HERVH and HERVH-derived transcripts compromises self-renewal. These observations define HERVH expression as a hallmark of naive-like hESCs, and establish novel primate-specific transcriptional circuitry regulating pluripotency

    Efficient and Comprehensive Representation of Uniqueness for Next-Generation Sequencing by Minimum Unique Length Analyses

    As next generation sequencing technologies are getting more efficient and less expensive, RNA-Seq is becoming a widely used technique for transcriptome studies. Computational analysis of RNA-Seq data often starts with the mapping of millions of short reads back to the genome or transcriptome, a process in which some reads are found to map equally well to multiple genomic locations (multimapping reads). We have developed the Minimum Unique Length Tool (MULTo), a framework for efficient and comprehensive representation of mappability information, through identification of the shortest possible length required for each genomic coordinate to become unique in the genome and transcriptome. Using the minimum unique length information, we have compared different uniqueness compensation approaches for transcript expression level quantification and demonstrate that the best compensation is achieved by discarding multimapping reads and correctly adjusting gene model lengths. We have also explored uniqueness within specific regions of the mouse genome and enhancer mapping experiments. Finally, by making MULTo available to the community we hope to facilitate the use of uniqueness compensation in RNA-Seq analysis and to eliminate the need to make additional mappability files.
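
The compensation described in this abstract (discard multimapping reads, then shrink the gene model length to the positions that are uniquely mappable) can be sketched as follows. This is a minimal illustration and not MULTo's own code: the function names, the convention that a minimum unique length of 0 means "never unique", and the toy numbers are all assumptions made for the example. It also ignores the small edge effect from reads that cannot start in the last few positions of a gene.

```python
def uniquely_mappable_length(mul_values, read_length):
    """Count positions where a read of `read_length` starting there is unique.

    Assumption: mul_values[i] is the minimum unique length at position i,
    and 0 means the position never becomes unique.
    """
    return sum(1 for m in mul_values if 0 < m <= read_length)


def rpkm(read_count, length_nt, total_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    if length_nt <= 0 or total_reads <= 0:
        return float("nan")
    return read_count * 1e9 / (length_nt * total_reads)


# Hypothetical gene model: 1,000 positions, half of them unique at 25 nt.
mul_values = [20] * 500 + [40] * 500
unique_reads = 100            # multimapping reads already discarded
total_unique_reads = 1_000_000

uncompensated = rpkm(unique_reads, len(mul_values), total_unique_reads)
compensated = rpkm(
    unique_reads,
    uniquely_mappable_length(mul_values, 25),
    total_unique_reads,
)
print(uncompensated, compensated)
```

With half of the gene model unmappable at 25 nt, the compensated value is twice the uncompensated one, which is the kind of shift a gene with repeated sequence would be expected to show.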

    Effects of uniqueness normalization on expression level.

    (A) Histogram showing how uniqueness compensation using MULTo affects the RPKM values at different read lengths. The x-axis shows the difference in gene expression between uniqueness-compensated and uncompensated expression levels. (B) RPKM values for FTH1 before and after uniqueness normalization. (C) Read coverage and uniqueness profile across FTH1 for 25 nt reads. Uniqueness density was calculated as the proportion of unique reads aligning to each genomic coordinate.

    Schematic illustration of MULTo file generation.

    (A) We defined the minimum unique length (MUL) of a genomic coordinate as the length of the shortest oligonucleotide starting at that coordinate that is unique in the genome. To find the MUL value, FASTA files with artificial "reads" of different lengths were iteratively created from whole-chromosome FASTA files and mapped to the genome using Bowtie. When the minimum length needed for uniqueness was found, this value was stored in a binary file. In this example, position 3000091 was unique at 33 base pairs but not at 32, i.e. we have a MUL value of 33. (B) An example showing that MUL values can be retrieved from arbitrary regions in just a few lines of code.

    Uniqueness in the transcriptome.

    (A, B) We calculated the proportion of unique positions for each transcript, both for single reads and paired-end fragments (mean 500 nt), and then plotted how many transcripts have a certain proportion of unique positions. The y-axis represents the proportion of all transcripts that satisfy the given condition. (A) Gene-level uniqueness of all RefSeq transcripts. (B) Transcript-level uniqueness for all transcripts from multi-isoform genes. (C) Positional plot of the uniqueness proportion across all coding transcripts. We calculated the number of reads of a specific length that pass through each position, and determined what proportion of these were unique. Since transcripts differ in length, we binned positions together so that each region (upstream, downstream, coding sequence, 5′ and 3′UTR) had the same number of bins for each transcript. The x-axis represents coordinate bins across transcripts.
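
The binning step in panel (C), where regions of different lengths are reduced to the same number of bins, might look like the sketch below. The function name, the integer bin boundaries and the rule that a region shorter than the bin count reuses positions are assumptions for illustration; the caption does not say how the original analysis handled these details.

```python
def bin_means(values, n_bins):
    """Average per-position values into `n_bins` equal-width bins.

    Bin b covers positions [b*n // n_bins, (b+1)*n // n_bins). If the
    region is shorter than n_bins, an otherwise empty bin takes the
    single position at its start, so every bin has a value.
    """
    n = len(values)
    if n == 0 or n_bins <= 0:
        raise ValueError("need a non-empty region and a positive bin count")
    means = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = max((b + 1) * n // n_bins, start + 1)
        chunk = values[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


# Per-position uniqueness proportions for two regions of different length.
short_utr = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
long_cds = [1.0] * 6 + [0.5] * 6
print(bin_means(short_utr, 3))
print(bin_means(long_cds, 3))
```

Once every region of every transcript has the same number of bins, the bins can be averaged across transcripts to give one positional profile.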

    Uniqueness profiles within genomic regions.

    The proportions of unique positions within different regions were calculated for read lengths in the range 20–255 nt. (A) Proportion of unique positions in the whole genome, within RefSeq genes, intergenic regions, known p300 binding sites, proximal promoters and CpG islands. (B) Proportion of unique positions within different parts of genes: exons, introns and UTRs. (C, D) Difference in proportion of unique positions between the regular and bisulfite-converted genome. The y-axis in (C) and (D) represents the uniqueness proportion in the bisulfite genome subtracted from that in the regular genome. The vertical dashed line marks 35 nucleotide reads.
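
The comparison in panels (C, D) can be reproduced in miniature. The sketch below assumes complete in-silico conversion of every C to T on the forward strand only, and exact k-mer counting in place of read mapping; real bisulfite analyses also handle the opposite strand (G to A) and unconverted methylated cytosines. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def proportion_unique(genome, k):
    """Fraction of start positions whose k-mer occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for s in kmers if counts[s] == 1) / len(kmers)


def bisulfite_convert(genome):
    """Assumed full conversion: every C becomes T (forward strand only)."""
    return genome.replace("C", "T")


genome = "ACGTATGT"
regular = proportion_unique(genome, 3)
converted = proportion_unique(bisulfite_convert(genome), 3)
# Same sign convention as the caption: regular minus bisulfite.
print(regular - converted)
```

Conversion collapses the alphabet from four letters towards three, so more k-mers coincide and the difference is non-negative in this toy case.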