24 research outputs found

A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification

Author: Balderrama-Gutierrez Gabriela
Chu Sophie
England Whitney
Jiang Shan
Mortazavi Ali
Rahmanian Sorena
Reese Fairlie
Spitale Robert C.
Tenner Andrea
Trout Diane
Williams Brian
Wold Barbara
Wyman Dana
Zeng Weihua
Publication date: 18/06/2019

Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets. It serves both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes and to predict gene and transcript expression levels more accurately than is possible with short reads alone.

Caltech Authors

Generation of a humanized Aβ expressing mouse demonstrating aspects of Alzheimer's disease-like pathology.

The majority of Alzheimer’s disease (AD) cases are late-onset and occur sporadically; however, most mouse models of the disease harbor pathogenic mutations, rendering them better representations of familial autosomal-dominant forms of the disease. Here, we generated knock-in mice that express wild-type human Aβ under control of the mouse App locus. Remarkably, changing 3 amino acids in the mouse Aβ sequence to their wild-type human counterparts leads to age-dependent impairments in cognition and synaptic plasticity, brain volumetric changes, inflammatory alterations, the appearance of Periodic Acid-Schiff (PAS) granules and changes in gene expression. In addition, when exon 14, which encodes the Aβ sequence, is flanked by loxP sites, we show that Cre-mediated excision of exon 14 ablates hAβ expression, rescues cognition and reduces the formation of PAS granules.

Repositorio Institucional Universidad de Málaga

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Author: Adams Matthew S
Balderrama-Gutierrez Gabriela
Barnes If
Behera Amit K
Berry Andrew
Birol Inanc
Bostan Hamed
Brooks Angela N
Brooks Ashley M
Capella Salvador
Carbonell-Sala Sílvia
Carninci Piero
Chen Ying
Conesa Ana
De María Maite
Denslow Nancy D
Dhillon Namrita
Diekhans Mark
Du Mei RM
Au Kin Fai
Felton Colette
Fernandez-Gonzalez Jose M
Ferrández-Peral Luis
Frankish Adam
Garcia-Reyero Natàlia
Goetz Stefan
Gonzalez Jose M
Guigó Roderic
Göke Jonathan
Hafezqorani Saber
Çelik Muhammed Hasan
Hernández-Ferrer Carles
Herwig Ralf
Hunt Toby
Hunter Margaret E
Meade Marcus Jerryd
Kawaji Hideya
Wan Yuk Kei
Kondratova Liudmyla
Lagarde Julien
Smith Melissa Laird
Lee Joseph
Li Haoran
Li Jian-Liang
Liang Cindy E
Lienhard Matthias
Liu Tianyuan
Loveland Jane E
Martinez-Martin Alessandra
Menor Carlos
Mestre-Tomás Jorge
Mikheenko Alla
Nip Ka Ming
Moraga Amador David A
Mortazavi Ali
Mudge Jonathan M
Mulligan Dennis
Panayotova Nedka G
Paniagua Alejandro
Pardo-Palacios Francisco J
Pertea Mihaela
Prjibelski Andrey D
Reese Fairlie
Repchevsky Dmitry
Ritchie Matthew E
Rouchka Eric
Saint-John Brandon
Sapena Enrique
Sheynkman Gloria M
Sheynkman Leon
Sim Andre D
Suner Marie-Marthe
Takahashi Hazuki
Tang Alison D
Tilgner Hagen U
Vollmers Christopher
Wang Changqing
Wang Dingjie
Williams Brian
Wold Barbara J
Wong Brandon Y
Yang Chen
Youngworth Ingrid Ashley
Publication venue: bioRxiv
Publication date: 27/07/2023

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. Developers used these data to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

UCL Discovery

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Author: Adams Matthew S.
Au Kin Fai
Balderrama-Gutierrez Gabriela
Barnes If
Behera Amit K.
Berry Andrew E.
Birol Inanc
Bostan Hamed
Brooks Angela N.
Brooks Ashley M.
Capella-Gutierrez Salvador
Carbonell-Sala Sílvia
Carninci Piero
Chen Ying
Conesa Ana
Cousineau Alyssa
De María Maite
Denslow Nancy D.
Dhillon Namrita
Diekhans Mark
Du Mei R. M.
Felton Colette
Fernandez-Gonzalez Jose M.
Ferrández-Peral Luis
Frankish Adam
Garcia-Reyero Natàlia
Gonzalez Martinez Jose M.
Guigó Roderic
Göke Jonathan
Götz Stefan
Hafezqorani Saber
Hernández-Ferrer Carles
Herwig Ralf
Hunt Toby
Hunter Margaret E.
Kawaji Hideya
Kondratova Liudmyla
Lagarde Julien
Lee Joseph
Li Haoran
Li Jian-Liang
Liang Cindy E.
Lienhard Matthias
Liu Tianyuan
Loveland Jane E.
Maehr Rene
Martinez-Martin Alessandra
Meade Marcus Jerryd
Menor Carlos
Mestre-Tomás Jorge
Mikheenko Alla
Moraga Amador David A.
Mortazavi Ali
Mudge Jonathan M.
Mulligan Dennis
Nip Ka Ming
Panayotova Nedka G.
Paniagua Alejandro
Pardo-Palacios Francisco J.
Pertea Mihaela
Prjibelski Andrey D.
Reese Fairlie
Ren Xingjie
Repchevsky Dmitry
Ritchie Matthew E.
Rouchka Eric
Saint-John Brandon
Sapena Enrique
Shen Yin
Sheynkman Gloria M.
Sheynkman Leon
Sim Andre D.
Smith Melissa Laird
Suner Marie-Marthe
Takahashi Hazuki
Tang Alison D.
Tilgner Hagen U.
Vollmers Christopher
Wan Yuk Kei
Wang Changqing
Wang Dingjie
Williams Brian
Wold Barbara J.
Wong Brandon Y.
Yang Chen
Youngworth Ingrid A.
Çelik Muhammed Hasan
Publication venue: Nature Research
Publication date: 07/06/2024

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

Online Research @ Cardiff

Multi-tissue integrative analysis of personal epigenomes

Author: Adrian Jessika
Aganezov Sergey
Balderrama-Gutierrez Gabriela
Banskota Samridhi
Bernstein Bradley
Berthel Ana
Borsari Beatrice
Cameron Christopher
Chang Justin
Chee Sora
Chen Zhanlin
Cherry Michael
Chhetri Surya
Choudhary Jyoti
Corona Guillermo
Danyko Cassidy
Davis Carrie
Dobin Alexander
Drenkow Jorg
Epstein Charles
Farid Daniel
Farrell Nina
Gabdank Idan
Galeev Timur
Gao Jiahao
Gaskell Elizabeth
Gerstein Mark
Gillis Jesse
Gingeras Thomas
Gofin Yoel
Gorkin David
Gu Mengting
Guigo Roderic
Gursoy Gamze
Hecht Vivian
Hitz Benjamin
Issner Robbyn
Kirsche Melanie
Kong Xiangmeng
Lam Bonita
Levine Morgan
Li Bian
Li Shantao
Li Tianxiao
Li Xiqi
Lin Khine
Liu Jason
Luo Ruibang
Mackiewicz Mark
Martins Gabriel
Mendenhall Eric
Milosavljevic Aleksandar
Moore Jill
Mortazavi Ali
Mudge Jonathan
Myers Richard
Navarro Fabio
Nelson Nicholas
Noble William
Nusbaum Chad
Popov Ioann
Pratt Henry
Qiu Yunjiang
Ramakrishnan Srividya
Raymond Joe
Ren Bing
Rozowsky Joel
Salichos Leonidas
Scavelli Alexandra
Schatz Michael
Schreiber Jacob
Sedlazeck Fritz
See Lei
Sherman Rachel
Shi Minyi
Shi Xu
Shoresh Noam
Sloan Cricket
Snyder Michael
Strattan Seth
Sun Maxwell
Tan Zhen
Tanaka Forrest
Vlasova Anna
Wang Jun
Weng Zhiping
Werner Jonathan
Williams Brian
Wold Barbara
Wright James
Xiong Kun
Xu Jinrui
Xu Min
Yan Chengfei
Yang Yucheng
Yu Keyang
Yu Lu
Zaleski Christopher
Zhang Jing
Publication venue: Cold Spring Harbor Laboratory
Publication date: 26/04/2021

Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements and transcriptional activity across more than 25 tissues from these donors. Integrative analysis revealed over a million variants with allele-specific activity; coordinated, locus-scale allelic imbalances; and structural variants impacting proximal chromatin structure. We relate the personal genome analysis to the ENCODE encyclopedia, annotating allele- and tissue-specific elements that are strongly enriched for variants impacting expression and disease phenotypes. These experimental and statistical approaches, and the corresponding EN-TEx resource, provide a framework for personalized functional genomics.

Cold Spring Harbor Laboratory Institutional Repository

Caltech Authors

Transcriptome dynamics of neurodegeneration using single-cell and long-read approaches

Author: Balderrama Gutierrez Gabriela
Publication venue: eScholarship, University of California
Publication date: 01/01/2021

Alzheimer’s disease (AD) is characterized by plaques and tangles that lead to neurodegeneration and dementia. Clinical trials for AD drugs have a high failure rate and could benefit from better mouse models of late-onset AD. Changes in gene expression, alternative splicing and chromatin profiles have been described as indicators of the pathology. The focus of this thesis is to characterize already available models of AD using single-cell and long-read transcriptomics. Chapter 2 is a time course of neurodegeneration in the 3xTg-AD mouse, which is the only mouse AD model that has plaques and tangles, similar to human AD. We use bulk RNA-seq in the hippocampus of 3xTg mice to identify distinct gene modules associated with microglia and oligodendrocytes that increase with aging and pathology. We further investigate the changes in cell populations using single-nucleus RNA-seq of the hippocampus and cortex of 3xTg and 5xFAD mice and detect major changes in astrocyte and oligodendrocyte groups. We recover a common path of astrocyte activation shared with the 5xFAD mouse and find that 3xTg-derived astrocytes seem to be at an earlier stage of activation. To investigate the activation of microglia in 3xTg mice, we also generated a single-cell RNA-seq dataset of microglial cells and found multiple subtypes, including a set of microglia with a distinct transcription factor expression profile that is associated with an early increase in Csf1 expression before the full onset of disease-associated microglia (DAM) gene expression. Finally, scATAC-seq reveals a set of accessible chromatin regions, shared across multiple activation states found in the scRNA-seq data, that matches glial activation processes. Overall, differences between the main glial groups point to a slower activation process in the 3xTg model than in the 5xFAD model. Our study contributes to the identification of progressive transcriptional changes of glial cells in a model that has plaques and tangles.
Single-cell microfluidic systems are optimized for cell types smaller than most cells in the brain, which are also difficult to dissociate. The Split-seq barcoding strategy, which requires no microfluidics and fixes cells before labeling, allows multiplexed cells and nuclei to be sequenced at the same time. In Chapter 3, we use Split-seq to sequence the transcriptomes of 24,270 nuclei as well as single-cell microglia from the cortex and hippocampus of one 24-month-old female 3xTg-AD mouse. Comparing Split-seq cell clusters against clusters from our existing time course study of 3xTg-AD (Chapter 2), we recover all of the main cell types and detect genes that were problematic to detect, such as Gfap in astrocytes. However, microglia derived from nuclei lack the major DAM identifiers, which were detectable at low levels in single cells. Sub-clustering of astrocytes recovers 11 distinct clusters, including an activation cluster that overlaps not only with previously identified markers such as Gfap but also with novel markers such as Thy1. The Split-seq protocol shows promise for scaling up future single-cell transcriptomics studies of AD.
AD has been extensively characterized using short-read sequencing. However, most studies focus on gene expression changes and rarely analyze isoform changes. Full-length, high-throughput mRNA sequencing using long-read technologies is the best way to explore transcript isoform diversity, as regular short reads do not provide enough information about the connectivity between distant exons. In Chapter 4, we explore the transcriptome of the cortex and hippocampus of C57BL/6J and 5xFAD mice at 8 months of age. We recover >90% of genes previously associated with the 5xFAD genotype. We further detect 244 and 471 isoform switches in the cortex and hippocampus, respectively. We also found 194 genes with transcription start site (TSS) switches and 714 with transcription end site (TES) switches relevant to the 5xFAD genotype. Genes presenting isoform changes include Csf2ra, Csf1 and Lamp2. Long-read transcriptome analysis of mouse models of disease can provide additional insights into how isoform switches can alter gene activity during disease progression.

eScholarship - University of California

An Infection-Tolerant Mammalian Reservoir for Several Zoonotic Agents Broadly Counters the Inflammatory Effects of Endotoxin.

Author: Balderrama-Gutierrez Gabriela
Publication date: 25/04/2021

Ezid

Peromyscus WGCNA supplement

Author: Balderrama-Gutierrez Gabriela
Publication venue: Dryad
Publication date: 01/01/2020

Ezid