85 research outputs found

    Estimation of alternative splicing isoform frequencies from RNA-Seq data

    Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
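The general idea of expectation-maximization for isoform frequencies can be illustrated with a minimal sketch. This is not the IsoEM implementation: it ignores insert sizes, base quality scores, strand and pairing information, and assumes every read is equally likely to come from any position of a compatible isoform. The function name `em_isoform_frequencies` and its parameters are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List


def em_isoform_frequencies(
    read_compat: List[List[str]],
    lengths: Dict[str, float],
    n_iter: int = 200,
) -> Dict[str, float]:
    """Plain EM for relative isoform frequencies (illustrative only).

    read_compat[i] lists the isoforms that read i is compatible with.
    lengths maps each isoform name to its (effective) length.
    """
    isoforms = list(lengths)
    # alpha[j]: fraction of reads originating from isoform j (start uniform).
    alpha = {iso: 1.0 / len(isoforms) for iso in isoforms}

    for _ in range(n_iter):
        expected = {iso: 0.0 for iso in isoforms}
        # E-step: split each read among its compatible isoforms,
        # with weight proportional to alpha[j] / length[j].
        for compat in read_compat:
            weights = {iso: alpha[iso] / lengths[iso] for iso in compat}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read cannot be explained by current estimates
            for iso, w in weights.items():
                expected[iso] += w / total
        # M-step: re-estimate read fractions from expected read counts.
        assigned = sum(expected.values())
        alpha = {iso: expected[iso] / assigned for iso in isoforms}

    # Convert read fractions to transcript frequencies (length-normalized).
    per_base = {iso: alpha[iso] / lengths[iso] for iso in isoforms}
    norm = sum(per_base.values())
    return {iso: per_base[iso] / norm for iso in isoforms}
```

With equal isoform lengths, reads that match only one isoform pull the shared reads toward that isoform in proportion to the current estimates, which is the disambiguation step that richer models refine with additional evidence.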

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections added

    The first transcriptome of Italian wall lizard, a new tool to infer about the Island Syndrome

    Some insular lizards show a high degree of differentiation from their conspecific mainland populations, like Licosa island lizards, which are described as affected by Reversed Island Syndrome (RIS). In previous works, we demonstrated that some traits of RIS, such as melanization, depend on differential expression of genes encoding melanocortin receptors. To better understand the basis of the syndrome, and to provide raw data for future investigations, we generated the first de novo transcriptome of the Italian wall lizard. Comparing mainland and island transcriptomes, we link differences in life-history traits to differential gene expression. Taking testis and brain sequences together, we generated 275,310 and 269,885 transcripts, and 18,434 and 21,606 proteins with Gene Ontology annotation, for mainland and island respectively. Variant calling analysis identified about the same number of SNPs in the island and mainland populations. In contrast, differential gene expression analysis identified putative genes involved in the syndrome that are more highly expressed in insular samples, such as Major Histocompatibility Complex class I, Immunoglobulins, Melanocortin 4 receptor, Neuropeptide Y and Proliferating Cell Nuclear Antigen.

    The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus

    Background: Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery. Results: We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses. Conclusions: Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms. [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven Additional files are available.]

    RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

    High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.

    Epidemiology of intra-abdominal infection and sepsis in critically ill patients: “AbSeS”, a multinational observational cohort study and ESICM Trials Group Project

    Purpose: To describe the epidemiology of intra-abdominal infection in an international cohort of ICU patients according to a new system that classifies cases according to setting of infection acquisition (community-acquired, early onset hospital-acquired, and late-onset hospital-acquired), anatomical disruption (absent or present with localized or diffuse peritonitis), and severity of disease expression (infection, sepsis, and septic shock). Methods: We performed a multicenter (n = 309), observational, epidemiological study including adult ICU patients diagnosed with intra-abdominal infection. Risk factors for mortality were assessed by logistic regression analysis. Results: The cohort included 2621 patients. Setting of infection acquisition was community-acquired in 31.6%, early onset hospital-acquired in 25%, and late-onset hospital-acquired in 43.4% of patients. Overall prevalence of antimicrobial resistance was 26.3% and difficult-to-treat resistant Gram-negative bacteria 4.3%, with great variation according to geographic region. No difference in prevalence of antimicrobial resistance was observed according to setting of infection acquisition. Overall mortality was 29.1%. Independent risk factors for mortality included late-onset hospital-acquired infection, diffuse peritonitis, sepsis, septic shock, older age, malnutrition, liver failure, congestive heart failure, antimicrobial resistance (either methicillin-resistant Staphylococcus aureus, vancomycin-resistant enterococci, extended-spectrum beta-lactamase-producing Gram-negative bacteria, or carbapenem-resistant Gram-negative bacteria) and source control failure evidenced by either the need for surgical revision or persistent inflammation. 
Conclusion: This multinational, heterogeneous cohort of ICU patients with intra-abdominal infection revealed that setting of infection acquisition, anatomical disruption, and severity of disease expression are disease-specific phenotypic characteristics associated with outcome, irrespective of the type of infection. Antimicrobial resistance is equally common in community-acquired as in hospital-acquired infection

    De Novo Analysis of Transcriptome Dynamics in the Migratory Locust during the Development of Phase Traits

    Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, with optimization of sequencing and assembly strategy, can further facilitate the application of de novo transcriptome. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis. 
Most importantly, we identified genes and pathways that might be involved in locust development and phase change, and may thus benefit pest management
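The N50 statistic quoted in the abstract above can be computed with a few lines of code. The following is a minimal sketch using the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases); the function name `n50` is hypothetical, and the assembler used in the study may apply a slightly different convention.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half")
```

For example, lengths 2 through 10 sum to 54 bases; the three longest sequences (10, 9, 8) cover 27 bases, so the N50 is 8.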

    Uterine Gene Expression in the Live-Bearing Lizard, Chalcides ocellatus, Reveals Convergence of Squamate Reptile and Mammalian Pregnancy Mechanisms

    Although the morphological and physiological changes involved in pregnancy in live-bearing reptiles are well studied, the genetic mechanisms that underlie these changes are not known. We used the viviparous African Ocellated Skink, Chalcides ocellatus, as a model to identify a near complete gene expression profile associated with pregnancy using RNA-Seq analyses of uterine transcriptomes. Pregnancy in C. ocellatus is associated with upregulation of uterine genes involved with metabolism, cell proliferation and death, and cellular transport. Moreover, there are clear parallels between the genetic processes associated with pregnancy in mammals and Chalcides in expression of genes related to tissue remodeling, angiogenesis, immune system regulation, and nutrient provisioning to the embryo. In particular, the pregnant uterine transcriptome is dominated by expression of proteolytic enzymes that we speculate are involved both with remodeling the chorioallantoic placenta and histotrophy in the omphaloplacenta. Elements of the maternal innate immune system are downregulated in the pregnant uterus, indicating a potential mechanism to avoid rejection of the embryo. We found a downregulation of major histocompatibility complex loci and estrogen and progesterone receptors in the pregnant uterus. This pattern is similar to mammals but cannot be explained by the mammalian model. The latter finding provides evidence that pregnancy is controlled by different endocrinological mechanisms in mammals and reptiles. Finally, 88% of the identified genes are expressed in both the pregnant and the nonpregnant uterus, and thus, morphological and physiological changes associated with C. ocellatus pregnancy are likely a result of regulation of genes continually expressed in the uterus rather than the initiation of expression of unique genes.