105 research outputs found

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

    The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus

    Background: Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery. Results: We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses. Conclusions: Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms. [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven Additional files are available.]

    The first transcriptome of Italian wall lizard, a new tool to infer about the Island Syndrome

    Some insular lizards show a high degree of differentiation from their conspecific mainland populations, like the Licosa island lizards, which are described as affected by Reversed Island Syndrome (RIS). In previous work, we demonstrated that some traits of RIS, such as melanization, depend on differential expression of genes encoding melanocortin receptors. To better understand the basis of the syndrome and to provide raw data for future investigations, we generated the first de novo transcriptome of the Italian wall lizard. Comparing mainland and island transcriptomes, we link differences in life traits to differential gene expression. Taking testis and brain sequences together, we obtained 275,310 and 269,885 transcripts, and 18,434 and 21,606 proteins with Gene Ontology annotation, for mainland and island respectively. Variant calling analysis identified about the same number of SNPs in the island and mainland populations. By contrast, differential gene expression analysis revealed putative syndrome-related genes that are more highly expressed in insular samples, such as Major Histocompatibility Complex class I, Immunoglobulins, Melanocortin 4 receptor, Neuropeptide Y and Proliferating Cell Nuclear Antigen.

    De novo characterization of the gametophyte transcriptome in bracken fern, Pteridium aquilinum

    Background: Because of their phylogenetic position and unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns combined with the absence of economically important species have been a barrier to the development of genomic resources. However, high throughput sequencing technologies are now being widely applied to non-model species. We leveraged the Roche 454 GS-FLX Titanium pyrosequencing platform in sequencing the gametophyte transcriptome of bracken fern (Pteridium aquilinum) to develop genomic resources for evolutionary studies. Results: 681,722 quality and adapter trimmed reads totaling 254 Mbp were assembled de novo into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp and a total assembly size of 30.8 Mbp with an average read-depth coverage of 7.0×. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. 61.8% of the unigenes had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned 46.2% of the unigenes with a GO functional annotation and 16.0% with an enzyme code annotation. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed a substantial proportion of genes expressed in bracken gametophytes to be shared across the genomes of Arabidopsis, Selaginella and Physcomitrella, and identified a substantial number of potentially novel fern genes. By comparing the list of Arabidopsis genes identified by blast with a list of gametophyte-specific Arabidopsis genes taken from the literature, we identified a set of potentially conserved gametophyte specific genes. We screened unigenes for repetitive sequences to identify 548 potentially-amplifiable simple sequence repeat loci and 689 expressed transposable elements. Conclusions: This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for de novo transcriptome characterization and gene discovery in a non-model plant.

    De Novo Transcriptome Sequencing in Anopheles funestus Using Illumina RNA-Seq Technology

    BACKGROUND: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform. METHODOLOGY/PRINCIPAL FINDINGS: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and "target-based" contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that among these 1:1 orthologues, the protein sequence of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression. CONCLUSIONS/SIGNIFICANCE: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). 
We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect

    RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

    High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.
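As a rough illustration of the data structure named in this abstract, the sketch below builds a suffix array over a short reference string and uses binary search to find every exact occurrence of a read. It is a generic textbook construction, not the SASR aligner or any part of LifeScope: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the naive construction is only suitable for tiny inputs, and spliced or chimeric alignment is not attempted.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction for illustration only; production aligners use
    far more memory- and time-efficient algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of every exact match of pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    reference = "GATTACAGATTACA"  # toy sequence, not real data
    sa = build_suffix_array(reference)
    print(find_occurrences(reference, sa, "ATTA"))
```

Because all suffixes sharing a prefix sit next to each other in the sorted order, two binary searches are enough to bracket every match, which is why suffix arrays are a common index for read mapping.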

    Patterns of Positive Selection and Neutral Evolution in the Protein-Coding Genes of Tetraodon and Takifugu

    Recent genome-wide analyses have revealed patterns of positive selection acting on protein-coding genes in humans and mammals. To assess whether the conclusions drawn from these analyses are valid for other vertebrates and to identify mammalian specificities, I have investigated the selective pressure acting on protein-coding genes of the puffer fishes Tetraodon and Takifugu. My results indicate that the strength of purifying selection in puffer fishes is similar to previous reports for murids but stronger than in hominids, which have a smaller population size. Gene ontology analyses show that more than half of the biological processes targeted by positive selection in mammals are also targeted in puffer fishes, highlighting general patterns for vertebrates. Biological processes enriched with positively selected genes that are shared between mammals and fishes include immune and defense responses, signal transduction, regulation of transcription and several of their descendant terms. Mammalian-specific processes displaying an excess of positively selected genes are related to sensory perception and neurological processes. The comparative analyses also revealed that, for both mammals and fishes, genes encoding extracellular proteins are preferentially targeted by positive selection, indicating that adaptive evolution occurs more often in the extracellular environment than inside the cell. Moreover, I present here the first genome-wide characterization of neutrally-evolving regions of protein-coding genes. This analysis revealed an unexpectedly high proportion of genes containing both positively selected motifs and neutrally-evolving regions, uncovering a strong link between neutral evolution and positive selection. I speculate that neutrally-evolving regions are a major source of novelties screened by natural selection.

    De Novo Analysis of Transcriptome Dynamics in the Migratory Locust during the Development of Phase Traits

    Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotypic polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb of Illumina reads, generated 72,977 transcripts with an N50 of 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between the two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, with optimization of sequencing and assembly strategy, can further facilitate the application of de novo transcriptomes. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis. Most importantly, we identified genes and pathways that might be involved in locust development and phase change, and may thus benefit pest management.
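The N50 statistic quoted in this abstract can be computed as sketched below, assuming the common definition: the length L such that transcripts of length L or longer together contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and accumulate them until
    the running total reaches at least half of the total assembled bases;
    the length at which that happens is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy transcript lengths in bp (hypothetical, not from the locust assembly).
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 is weighted toward long sequences, so it can be much larger than the mean or median transcript length in the same assembly.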