19 research outputs found

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification

Author: Balderrama-Gutierrez Gabriela
Chu Sophie
England Whitney
Jiang Shan
Mortazavi Ali
Rahmanian Sorena
Reese Fairlie
Spitale Robert C.
Tenner Andrea
Trout Diane
Williams Brian
Wold Barbara
Wyman Dana
Zeng Weihua
Publication date: 18/06/2019

Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as expression levels, across datasets. It serves both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes and to predict more accurate expression levels of genes and transcripts than is possible with short reads alone.

Caltech Authors

Facilitation through altered resource availability in a mixed-species rodent malaria infection

Author: Bell
Bruce
Burns
Carter
Cavinato
Cowman
Daubersies
De Roode
Douglas
Fairlie-Clarke
Ferrari
Fraser
Färnert
Genton
Ginouves
Graham
Griffiths
Griffiths
Hamilton
Hamilton
Haydon
Juliano
Killick-Kendrick
Knowles
Looareesuwan
Mayxay
Mcqueen
Mercereau-Puijalon
Metcalf
Mideo
Mideo
Mueller
O'Donnell
Pedersen
Pollitt
Pollitt
Pollitt
Pollitt
R Core Team
Read
Reece
Reece
Reece
Reese
Råberg
Savill
Seixas
Smith
Spence
Timms
Tjitra
Valkiunas
Viney
Wolday
Publication venue: Wiley
Publication date: 01/09/2016

A major challenge in disease ecology is to understand how co‐infecting parasite species interact. We manipulate in vivo resources and immunity to explain interactions between two rodent malaria parasites, Plasmodium chabaudi and P. yoelii. These species have analogous resource‐use strategies to the human parasites Plasmodium falciparum and P. vivax: P. chabaudi and P. falciparum infect red blood cells (RBCs) of all ages (RBC generalists); P. yoelii and P. vivax preferentially infect young RBCs (RBC specialists). We find that: (1) recent infection with the RBC generalist facilitates the RBC specialist (P. yoelii density is enhanced ~10‐fold), because the RBC generalist increases availability of the RBC specialist's preferred resource; (2) co‐infections with the RBC generalist and RBC specialist are highly virulent; and (3) the presence of an RBC generalist in a host population can increase the prevalence of an RBC specialist. Thus, we show that resources shape how parasite species interact and have epidemiological consequences.

Crossref

PubMed Central

Edinburgh Research Explorer

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Author: Adams Matthew S
Balderrama-Gutierrez Gabriela
Barnes If
Behera Amit K
Berry Andrew
Birol Inanc
Bostan Hamed
Brooks Angela N
Brooks Ashley M
Capella Salvador
Carbonell-Sala Sílvia
Carninci Piero
Chen Ying
Conesa Ana
De María Maite
Denslow Nancy D
Dhillon Namrita
Diekhans Mark
Du Mei RM
Fai Au Kin
Felton Colette
Fernandez-Gonzalez Jose M
Ferrández-Peral Luis
Frankish Adam
Garcia-Reyero Natàlia
Goetz Stefan
Gonzalez Jose M
Guigó Roderic
Göke Jonathan
Hafezqorani Saber
Hasan Çelik Muhammed
Hernández-Ferrer Carles
Herwig Ralf
Hunt Toby
Hunter Margaret E
Jerryd Meade Marcus
Kawaji Hideya
Kei Wan Yuk
Kondratova Liudmyla
Lagarde Julien
Laird Smith Melissa
Lee Joseph
Li Haoran
Liang Li Jian
Liang Cindy E
Lienhard Matthias
Liu Tianyuan
Loveland Jane E
Martinez-Martin Alessandra
Menor Carlos
Mestre-Tomás Jorge
Mikheenko Alla
Ming Nip Ka
Moraga Amador David A
Mortazavi Ali
Mudge Jonathan M
Mulligan Dennis
Panayotova Nedka G
Paniagua Alejandro
Pardo-Palacios Francisco J
Pertea Mihaela
Prjibelski Andrey D
Reese Fairlie
Repchevsky Dmitry
Ritchie Matthew E
Rouchka Eric
Saint-John Brandon
Sapena Enrique
Sheynkman Gloria M
Sheynkman Leon
Sim Andre D
Suner Marie-Marthe
Takahashi Hazuki
Tang Alison D
Tilgner Hagen U
Vollmers Christopher
Wang Changqing
Wang Dingjie
Williams Brian
Wold Barbara J
Wong Brandon Y
Yang Chen
Youngworth Ingrid Ashley
Publication venue: bioRxiv
Publication date: 27/07/2023

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

UCL Discovery

Characterizing transcript diversity using long-read RNA sequencing

Author: Reese Fairlie
Publication venue: eScholarship, University of California
Publication date: 01/01/2023

Alternative transcripts arise from the same gene via alternative TSS usage, splicing, and polyA site choice. Such transcripts can give rise to functional disparities in protein structure, post-transcriptional regulation, and translational efficiency. Moreover, their expression in appropriate spatiotemporal contexts is a key feature of eukaryotic genomes. However, detecting and quantifying these transcript isoforms across tissues, cell types, and species has been challenging due to their longer lengths compared to the short reads typical of standard RNA-seq. In contrast, long-read RNA-seq (LR-RNA-seq) provides complete transcript structures, enabling investigation of transcript features and usage with greater fidelity. Here, I describe my work on the application of LR-RNA-seq to characterizing and comparing full-length transcriptomes. First, I describe Swan, a software library I developed to facilitate visualization of full-length transcripts and to compare transcript usage between biological conditions. Next, I describe the ENCODE4 human and mouse LR-RNA-seq datasets, where I applied a novel triplet-based framework to harmonize and classify transcripts that share transcript start sites, exon junction chains, and transcript end sites. Lastly, I discuss the application of our single-nucleus LR-RNA-seq technique (LR-Split-seq) to two genetically distinct mouse strains to uncover cell type- and genotype-specific transcript usage patterns. Collectively, these projects form a solid foundation for future analyses of long-read transcriptomes to quantify changes in transcript diversity and transcript usage between samples, cell types, and genotypes within and between species.

eScholarship - University of California

Swan: a library for the analysis and visualization of long-read transcriptomes.

Author: Reese Fairlie
Publication date: 14/07/2023

Ezid

Characterizing transcript diversity using long-read RNA sequencing

Author: Reese Fairlie
Publication date: 01/01/2023

Ezid

Exon size and sequence conservation improves identification of splice-altering nucleotides.

Author: Forouzmand Elmira
Hertel Klemens
Movassat Maliheh
Reese Fairlie
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

Pre-mRNA splicing is regulated through multiple trans-acting splicing factors. These regulators interact with the pre-mRNA at intronic and exonic positions. Given that most exons are protein coding, the evolution of exons must be modulated by a combination of selective coding and splicing pressures. It has previously been demonstrated that selective splicing pressures are more easily deconvoluted when phylogenetic comparisons are made for exons of identical size, suggesting that exon size-filtered sequence alignments may improve identification of nucleotides evolved to mediate efficient exon ligation. To test this hypothesis, an exon size database was created, filtering 76 vertebrate sequence alignments based on exon size conservation. In addition to other genomic parameters, such as splice-site strength, gene position, or flanking intron length, this database permits the identification of exons that are size- and/or sequence-conserved. Highly size-conserved exons are always sequence-conserved. However, sequence conservation does not necessitate exon size conservation. Our analysis identified evolutionarily young exons and demonstrated that length conservation is a strong predictor of alternative splicing. A published data set of approximately 5000 exonic SNPs associated with disease was analyzed to test the hypothesis that exon size-filtered sequence comparisons increase detection of splice-altering nucleotides. Improved splice predictions could be achieved when mutations occur at the third codon position, especially when a mutation decreases exon inclusion efficiency. The results demonstrate that coding pressures dominate nucleotide composition at invariable codon positions and that exon size-filtered sequence alignments permit identification of splice-altering nucleotides at wobble positions.

eScholarship - University of California

Exon size and sequence conservation improves identification of splice-altering nucleotides.

Author: Forouzmand Elmira
Hertel Klemens J
Movassat Maliheh
Reese Fairlie
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

eScholarship - University of California

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Author: Carbonell-Sala Silvia
Conesa Ana
Pardo-Palacios Francisco J.
Reese Fairlie
Publication venue: Research Square
Publication date: 01/01/2022

With increased usage of long-read sequencing technologies to perform transcriptome analyses, there is a growing need to evaluate different methodologies, including library preparation, sequencing platform, and computational analysis tools. Here, we report the study design of a community effort called the Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium, whose goals are to characterize the strengths and remaining challenges in using long-read approaches to identify and quantify the transcriptomes of both model and non-model organisms. The LRGASP organizers have generated cDNA and direct RNA datasets in human, mouse, and manatee samples using different protocols followed by sequencing on Illumina, Pacific Biosciences, and Oxford Nanopore Technologies platforms. Participants will use the provided data to submit predictions for three challenges: transcript isoform detection with a high-quality genome, transcript isoform quantification, and de novo transcript isoform identification. Evaluators from different institutions will determine which pipelines have the highest accuracy for a variety of metrics using benchmarks that include spike-in synthetic transcripts, simulated data, and a set of undisclosed transcripts manually curated by GENCODE. We also describe plans for experimental validation of predictions that are platform-specific and computational tool-specific. We believe that a community effort to evaluate long-read RNA-seq methods will help move the field toward a better consensus on the best approaches to use for transcriptome analyses.

Digital.CSIC