133 research outputs found

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections added

    Estimation of alternative splicing isoform frequencies from RNA-Seq data

    Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
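
    The general idea of expectation-maximization for isoform frequencies can be illustrated with a minimal sketch. This is not the IsoEM implementation: it ignores insert sizes, base quality scores, strand, read pairing and isoform lengths, and assumes only that each read comes with the set of isoforms it is compatible with. The function name `em_isoform_frequencies` and its inputs are hypothetical and chosen for illustration.

```python
def em_isoform_frequencies(compat, n_isoforms, n_iter=200):
    """Estimate isoform frequencies with a plain EM loop.

    compat[r] is the list of isoform indices that read r is compatible with
    (an assumed input format; real tools derive this from alignments).
    """
    # Start from a uniform guess over isoforms.
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read fractionally among its compatible isoforms,
        # in proportion to the current frequency estimates.
        for isoforms in compat:
            total = sum(theta[i] for i in isoforms)
            if total == 0.0:
                continue  # read compatible only with isoforms at zero frequency
            for i in isoforms:
                counts[i] += theta[i] / total
        # M-step: renormalize the fractional read counts into frequencies.
        n = sum(counts)
        if n == 0.0:
            break  # no assignable reads; keep the current estimate
        theta = [c / n for c in counts]
    return theta


# Toy data: 6 reads unique to isoform 0, 2 unique to isoform 1, 4 ambiguous.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
print(em_isoform_frequencies(reads, 2))
```

    In this toy case the ambiguous reads are gradually reassigned according to the evidence from the uniquely mapped reads, which is the disambiguation step that the abstract describes enriching with insert-size and quality information.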

    RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

    High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (Ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.

    De Novo Transcriptome Sequencing in Anopheles funestus Using Illumina RNA-Seq Technology

    BACKGROUND: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform. METHODOLOGY/PRINCIPAL FINDINGS: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and "target-based" contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that among these 1:1 orthologues, the protein sequences of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression. CONCLUSIONS/SIGNIFICANCE: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.

    The first transcriptome of Italian wall lizard, a new tool to infer about the Island Syndrome

    Some insular lizards show a high degree of differentiation from their conspecific mainland populations, like Licosa island lizards, which are described as affected by Reversed Island Syndrome (RIS). In previous works, we demonstrated that some traits of RIS, such as melanization, depend on differential expression of genes encoding melanocortin receptors. To better understand the basis of the syndrome, and to provide raw data for future investigations, we generated the first de novo transcriptome of the Italian wall lizard. Comparing mainland and island transcriptomes, we link differences in life-traits to differential gene expression. Taking testis and brain sequences together, we obtained 275,310 and 269,885 transcripts, and 18,434 and 21,606 proteins with Gene Ontology annotation, for mainland and island respectively. Variant calling analysis identified about the same number of SNPs in the island and mainland populations. In contrast, differential gene expression analysis identified some putative genes involved in the syndrome that are more highly expressed in insular samples, such as Major Histocompatibility Complex class I, Immunoglobulins, Melanocortin 4 receptor, Neuropeptide Y and Proliferating Cell Nuclear Antigen.

    Uterine Gene Expression in the Live-Bearing Lizard, Chalcides ocellatus, Reveals Convergence of Squamate Reptile and Mammalian Pregnancy Mechanisms

    Although the morphological and physiological changes involved in pregnancy in live-bearing reptiles are well studied, the genetic mechanisms that underlie these changes are not known. We used the viviparous African Ocellated Skink, Chalcides ocellatus, as a model to identify a near complete gene expression profile associated with pregnancy using RNA-Seq analyses of uterine transcriptomes. Pregnancy in C. ocellatus is associated with upregulation of uterine genes involved with metabolism, cell proliferation and death, and cellular transport. Moreover, there are clear parallels between the genetic processes associated with pregnancy in mammals and Chalcides in expression of genes related to tissue remodeling, angiogenesis, immune system regulation, and nutrient provisioning to the embryo. In particular, the pregnant uterine transcriptome is dominated by expression of proteolytic enzymes that we speculate are involved both with remodeling the chorioallantoic placenta and histotrophy in the omphaloplacenta. Elements of the maternal innate immune system are downregulated in the pregnant uterus, indicating a potential mechanism to avoid rejection of the embryo. We found a downregulation of major histocompatibility complex loci and estrogen and progesterone receptors in the pregnant uterus. This pattern is similar to mammals but cannot be explained by the mammalian model. The latter finding provides evidence that pregnancy is controlled by different endocrinological mechanisms in mammals and reptiles. Finally, 88% of the identified genes are expressed in both the pregnant and the nonpregnant uterus, and thus, morphological and physiological changes associated with C. ocellatus pregnancy are likely a result of regulation of genes continually expressed in the uterus rather than the initiation of expression of unique genes.

    Patterns of Positive Selection and Neutral Evolution in the Protein-Coding Genes of Tetraodon and Takifugu

    Recent genome-wide analyses have revealed patterns of positive selection acting on protein-coding genes in humans and mammals. To assess whether the conclusions drawn from these analyses are valid for other vertebrates and to identify mammalian specificities, I have investigated the selective pressure acting on protein-coding genes of the puffer fishes Tetraodon and Takifugu. My results indicate that the strength of purifying selection in puffer fishes is similar to previous reports for murids but stronger than in hominids, which have a smaller population size. Gene ontology analyses show that more than half of the biological processes targeted by positive selection in mammals are also targeted in puffer fishes, highlighting general patterns for vertebrates. Biological processes enriched with positively selected genes that are shared between mammals and fishes include immune and defense responses, signal transduction, regulation of transcription and several of their descendant terms. Mammalian-specific processes displaying an excess of positively selected genes are related to sensory perception and neurological processes. The comparative analyses also revealed that, for both mammals and fishes, genes encoding extracellular proteins are preferentially targeted by positive selection, indicating that adaptive evolution occurs more often in the extracellular environment than inside the cell. Moreover, I present here the first genome-wide characterization of neutrally-evolving regions of protein-coding genes. This analysis revealed an unexpectedly high proportion of genes containing both positively selected motifs and neutrally-evolving regions, uncovering a strong link between neutral evolution and positive selection. I speculate that neutrally-evolving regions are a major source of novelties screened by natural selection.