Search CORE

17 research outputs found

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Author: Adams Matthew S
Balderrama-Gutierrez Gabriela
Barnes If
Behera Amit K
Berry Andrew
Birol Inanc
Bostan Hamed
Brooks Angela N
Brooks Ashley M
Capella Salvador
Carbonell-Sala Sílvia
Carninci Piero
Chen Ying
Conesa Ana
De María Maite
Denslow Nancy D
Dhillon Namrita
Diekhans Mark
Du Mei RM
Fai Au Kin
Felton Colette
Fernandez-Gonzalez Jose M
Ferrández-Peral Luis
Frankish Adam
Garcia-Reyero Natàlia
Goetz Stefan
Gonzalez Jose M
Guigó Roderic
Göke Jonathan
Hafezqorani Saber
Hasan Çelik Muhammed
Hernández-Ferrer Carles
Herwig Ralf
Hunt Toby
Hunter Margaret E
Jerryd Meade Marcus
Kawaji Hideya
Kei Wan Yuk
Kondratova Liudmyla
Lagarde Julien
Laird Smith Melissa
Lee Joseph
Li Haoran
Liang Li Jian
Liang Cindy E
Lienhard Matthias
Liu Tianyuan
Loveland Jane E
Martinez-Martin Alessandra
Menor Carlos
Mestre-Tomás Jorge
Mikheenko Alla
Ming Nip Ka
Moraga Amador David A
Mortazavi Ali
Mudge Jonathan M
Mulligan Dennis
Panayotova Nedka G
Paniagua Alejandro
Pardo-Palacios Francisco J
Pertea Mihaela
Prjibelski Andrey D
Reese Fairlie
Repchevsky Dmitry
Ritchie Matthew E
Rouchka Eric
Saint-John Brandon
Sapena Enrique
Sheynkman Gloria M
Sheynkman Leon
Sim Andre D
Suner Marie-Marthe
Takahashi Hazuki
Tang Alison D
Tilgner Hagen U
Vollmers Christopher
Wang Changqing
Wang Dingjie
Williams Brian
Wold Barbara J
Wong Brandon Y
Yang Chen
Youngworth Ingrid Ashley
Publication venue: bioRxiv
Publication date: 27/07/2023

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

UCL Discovery


Comprehensive molecular characterization of gastric adenocarcinoma

Author: Abdel-Misih Raafat
Ajani Jaffer
Akbani Rehan
Albert Monique
Alexopoulou Iakovina
Ally Adrian
Alonso Shelley
Aksoy B. Arman
Ayala Brenda
Balasundaram Miruna
Bartlett John
Bass Adam J.
Baylin Stephen B.
Beer David G.
Belyaev Dmitry
Bennett Joseph
Benz Christopher
Bernard Brady
Beroukhim Rameen
Birol Inanc
Black Aaron D.
Bootwalla Moiz S.
Boussioutas Alex
Bowen Jay
Bowlby Reanne
Bristow Christopher A.
Brooks Denise
Brown Jennifer
Brzezinski Jakub
Burton Robert
Butterfield Yaron S. N.
Camargo M. Constanza
Carlsen Rebecca
Carney Julie Ann
Carter Scott L.
Cheong Jae-Ho
Cherniack Andrew
Cherniack Andrew D.
Chin Lynda
Cho Eunjung
Cho Juok
Chu Andy
Chu Justin
Chuah Eric
Chudamani Sudha
Chun Hye-Jung E.
Cibulskis Kristian
Ciriello Giovanni
Clarke Amanda
Crain Daniel
Curley Erin
Curtis Christina
Davidsen Tanja
Demchok John A.
Dhalla Noreen
Dhir Rajiv
DiCara Daniel
Ding Li
Dolzhansky Oleg
Dresdner Gideon
Eley Greg
Engel Jay
Fedosenko Konstantin
Fisher Sheila
Frazer Scott
Gabriel Stacey B.
Gao Jianjiong
Gardner Johanna
Garman Katherine
Gastier-Foster Julie M.
Gehlenborg Nils
Getz Gad
Gross Benjamin
Guin Ranabir
Gulley Margaret
Hadjipanayis Angela
Haussler David
Heiman David I.
Helsel Carmen
Herman James G.
Hinoue Toshinori
Holt Robert A.
Hutter Carolyn M.
Iacocca Mary
Ibbs Matthew
Iype Lisa
Jacobsen Anders
Janjigian Yelena Y.
Jensen Mark A.
Jones Steven J.M.
Jung Joonil
Kasaian Katayoon
Kelsen David P.
Kemkes Ariane
Kim Hark K.
Kim Jaegil
Kim Jihun
Kim Sang-Bae
Korski Konstanty
Kramer Roger W.
Kreisberg Richard
Kucherlapati Raju
Kwon Sun-Young
Kycler Witold
Ladanyi Marc
Lai Phillip H.
Laird Peter W.
Lander Eric S.
Landreneau Rodney
Lau Kevin
Lawrence Michael S.
Lee Darlene
Lee Jae-Hyuk
Lee Ju-Seog
Lee Semin
Lee William
Leiserson Mark D. M.
Leporowska Ewa
Leraas Kristen M.
Li Haiyan A.
Lichtenberg Tara M.
Lichtenstein Lee
Lim Emilia
Lin Pei
Ling Shiyun
Liu Jia
Liu Wenbin
Liu Yingchun
Lu Yiling
Luketich James
Ma Yussanne
Mackiewicz Andrzej
Mahadeshwar Harshad S.
Mallery David
Manikhas Georgy
Marra Marco A.
Mayo Michael
McAllister Cynthia
McCall Shannon J.
McLellan Michael
Meyerson Matthew
Miller Michael
Mills Shaw Kenna R.
Mills Gordon
Mills Gordon B.
Moore Richard A.
Morris Scott
Mungall Andrew J.
Mungall Karen L.
Murawa Dawid
Murawa Pawel
Murray Bradley A.
Ng Sam
Ng Santa Cruz Sam
Nip Ka Ming
Niu Beifang
Noble Michael S.
Odze Robert
Ojesina Akinyemi I.
Pantazi Angeliki
Parfenov Michael
Park Do-Youn
Park Peter J.
Park Young S.
Paulauskis Joseph
Pedamallu Chandra
Pedamallu Chandra Sekhar
Pennathur Arjun
Penny Robert
Piazuelo M. Blanca
Pihl Todd
Potapova Olga
Protopopov Alexei
Rabeno Brenda
Rabkin Charles S.
Raman Rohini
Ramirez Nilsa C.
Ramirez Ricardo
Rao Arvind
Raphael Benjamin J.
Rathmell W. Kimryn
Ren Xiaojia
Reynolds Sheila M.
Robertson A. Gordon
Rosenberg Mara
Rovira Hector
Sakai Ryo
Saksena Gordon
Sander Chris
Santoso Netty
Schein Jacqueline E.
Schneider Barbara G.
Schultz Nikolaus
Schumacher Steven E.
Seidman Jonathan
Senbabaoglu Yasin
Seth Sahil
Shelton Candace
Shelton Troy
Shen Hui
Shen Ronglai
Sherman Mark
Sheth Margi
Shmulevich Ilya
Sinha Rileen
Sipahimalani Payal
Sofia Heidi J.
Song Xingzhi
Sougnez Carrie
Spychała Arkadiusz
Stojanov Petar
Stuart Josh M.
Suchorska Wiktoria M.
Sumer S. Onur
Sun Yichao
Tabak Barbara
Tabler Teresa R.
Tam Angela
Tang Jiabin
Tang Laura
Tarnuzzer Roy
Tasman Natalie
Tatka Honorata
Taylor Barry S.
Taylor-Weiner Amaro
Teresiak Marek
Thiessen Nina
Thorsson Vesteinn
Thorsson Vésteinn
Triche Timothy
Van Den Berg David J.
Verhaak Roeland G.W.
Voet Doug
Voronina Olga
Walton Jessica
Wan Yunhu
Wang Zhining
Weaver Stephanie
Weinhold Nils
Weinstein John N.
Weisenberger Daniel J.
Willis Joseph E.
Wise Lisa
Wiznerowicz Maciej
Wu Hsin-Ta
Xi Ruibin
Xu Andrew W.
Yang Da
Yang Liming
Yang Lixing
Zack Travis I.
Zenklusen Jean Claude
Zhang Hailei
Zhang Jianhua
Zhang Wei
Zmuda Erik
Zou Lihua
Łaźniak Radoslaw
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2014

Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also known as PD-L1) and PDCD1LG2 (also known as PD-L2); microsatellite unstable tumours, which show elevated mutation rates, including mutations of genes encoding targetable oncogenic signalling proteins; genomically stable tumours, which are enriched for the diffuse histological variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins; and tumours with chromosomal instability, which show marked aneuploidy and focal amplification of receptor tyrosine kinases. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies.

Harvard University - DASH

RNA-Bloom : de novo RNA-seq assembly with Bloom filters

Author: Nip Ka Ming
Publication venue: University of British Columbia Press
Publication date: 01/11/2017

High-throughput RNA sequencing (RNA-seq) is primarily used in measuring gene expression, quantifying transcript abundance, and building reference transcriptomes. Without bias from a reference sequence, de novo RNA-seq assembly is particularly useful for building new reference transcriptomes, detecting fusion genes, and discovering novel spliced transcripts. This is a challenging problem, and to address it at least eight approaches, including Trans-ABySS and Trinity, were developed within the past decade. For instance, using Trinity and 12 CPUs, it takes approximately one and a half days to assemble a human RNA-seq sample of over 100 million read pairs and requires up to 80 GB of memory. While the high memory usage typical of de novo RNA-seq assemblers may be alleviated by distributed computing, access to a high-performance computing environment is a requirement that may be limiting for smaller labs. In my thesis, I present a novel de novo RNA-seq assembler, “RNA-Bloom,” which utilizes compact data structures based on Bloom filters for the storage of k-mer counts and the de Bruijn graph in memory. Compared to Trans-ABySS and Trinity, RNA-Bloom can assemble a human transcriptome with comparable accuracy using nearly half as much memory and half the wall-clock time with 12 threads.
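The abstract above mentions Bloom filters as a compact store for k-mers. As a rough illustration of that general idea only, here is a minimal Python sketch of a plain membership Bloom filter over k-mers. It is not RNA-Bloom's implementation (the thesis describes counting filters and a graph representation, which this sketch omits); the class name `BloomFilter`, the helper `kmers`, and the parameter values are all hypothetical choices made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


# Usage: store the 4-mers of a short read, then query one of them.
bf = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in kmers("ACGTACGTGA", 4):
    bf.add(kmer)
print("ACGT" in bf)  # True: inserted items are always reported as present
```

The memory footprint is fixed by `num_bits` regardless of how many k-mers are inserted; the cost is that a query for a k-mer that was never inserted can occasionally return True.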

University of British Columbia: cIRcle - UBC's Information Repository

Transcriptome assembly and visualization for RNA-sequencing data

Author: Nip Ka Ming
Publication venue: University of British Columbia Press
Publication date: 01/05/2023

Since its introduction, RNA-sequencing has allowed us to interrogate the transcriptome of an organism, thereby advancing our understanding of cell biology and diseases. Typically, raw RNA-sequencing data is processed via computational methods, such as transcriptome assembly and visualization, to extract meaningful information. Transcriptome assembly aims to reconstruct full-length transcript sequences from RNA-sequencing reads, which are usually short fragments of the corresponding transcripts. Transcriptome visualization provides a platform for exploring and recognizing patterns in transcriptomic data. Transcriptome assembly and visualization tools have been instrumental in identification of gene structures, annotation of draft genomes, and discovery of molecular markers in diseases. Single-cell RNA-sequencing has enabled us to investigate transcriptome heterogeneity within a tissue sample containing up to a million cells. However, single-cell transcriptome analyses have been predominantly performed at the gene level instead of at the isoform level. In my thesis, I present computational solutions for transcriptome assembly and visualization of single-cell RNA-sequencing data, thus enabling isoform-level analysis in single-cell transcriptomes. Long-read RNA-sequencing technologies have gained traction in transcriptomic research in recent years as their throughput and data quality improved tremendously. Long-read sequencing is particularly useful in transcriptome assembly because its reads can potentially span multiple exons, which simplifies the transcriptome assembly problem. Reference-free assembly for long-read data is a computationally expensive task due to the long read lengths and high base error rates. In my thesis, I present a fast and memory-efficient reference-free assembly method for long-read RNA-sequencing data.

University of British Columbia: cIRcle - UBC's Information Repository

Differential Hive Plots: Seeing Networks Change

Author: Inanc Birol
Ka Ming Nip
Marco Marra
Martin Krzywinski
Publication venue: MIT Press - Journals

Crossref

RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes

Author: Chen Yang
Hamid Mohamadi
Inanc Birol
Justin Chu
Ka Ming Nip
Readman Chiu
René L. Warren
Publication venue: Cold Spring Harbor Laboratory

Crossref

RResolver: efficient short-read repeat resolution within ABySS

Author: Afshinfard Amirhossein
Birol Inanc
Chu Justin
Coombe Lauren
Nikolić Vladimir
Nip Ka Ming
Warren René L.
Wong Johnathan
Publication venue: BioMed Central
Publication date: 21/06/2022

Background: De novo genome assembly is essential to modern genomics studies. As it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, where nodes are sequences of equal length k, also known as k-mers. Edges in this graph are established between nodes that overlap by k - 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The selection of k is influenced by multiple factors, and optimizing this value results in a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k sizes should be used, so lower values can provide good connectivity in lesser covered regions and higher values can increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes. Results: Here we present RResolver, a scalable algorithm that takes a short-read de Bruijn graph assembly with a starting k as input and uses a k value closer to that of the read length to resolve repeats. RResolver builds a Bloom filter of sequencing reads which is used to evaluate the assembly graph path support at branching points and removes paths with insufficient support. RResolver runs efficiently, taking only 26 min on average for an ABySS human assembly with 48 threads and 60 GiB memory. Across all experiments, compared to a baseline assembly, RResolver improves scaffold contiguity (NGA50) by up to 15% and reduces misassemblies by up to 12%. Conclusions: RResolver adds a missing component to scalable de Bruijn graph genome assembly. By improving the initial and fundamental graph traversal outcome, all downstream ABySS algorithms greatly benefit by working with a more accurate and less complex representation of the genome. The RResolver code is integrated into ABySS and is available at https://github.com/bcgsc/abyss/tree/master/RResolver .
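The de Bruijn graph definition in the abstract above (nodes are k-mers, edges join nodes that overlap by k - 1 bases) can be sketched in a few lines of Python. This is an illustrative toy under simplifying assumptions: it ignores reverse complements, sequencing errors, and coverage, and it is not how ABySS or RResolver build their graphs. The function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Return (nodes, edges) where nodes are the distinct k-mers in reads and
    edges[u] is the set of k-mers v such that u[1:] == v[:-1] (k - 1 overlap)."""
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])

    edges = defaultdict(set)
    for kmer in nodes:
        # A successor shares the last k - 1 bases of kmer as its own prefix.
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in nodes:
                edges[kmer].add(successor)
    return nodes, dict(edges)


# Usage: one short read with k = 3 gives a simple unbranched path.
nodes, edges = de_bruijn_graph(["ACGTA"], k=3)
print(sorted(nodes))  # ['ACG', 'CGT', 'GTA']
print(edges)          # {'ACG': {'CGT'}, 'CGT': {'GTA'}}
```

A node with more than one successor is a branching point. In this toy graph, a smaller k produces more k - 1 overlaps (better connectivity) and also more branching at repeats, which is the trade-off the abstract describes.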

PubMed Central

University of British Columbia: cIRcle - UBC's Information Repository

SPAT: Searching for Poly(A) Tails in RNA-Seq de novo Assemblies

Author: A Gordon Robertson
Aly Karsan
Anthony Raymond
Inanc Birol
Ka Ming Nip
Karen Mungall
Maayan Kreitzman
Readman Chiu
Shaun Jackman

A method for detecting alternative polyadenylation in RNA-Seq libraries using de novo assembly with ABySS. It will be presented at HiTSeq and ISMB 2013 by Anthony Raymond.

FigShare

BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters

Author: A. Gordon Robertson
Anthony Raymond
Hamid Mohamadi
Inanç Birol
Justin Chu
Ka Ming Nip
Richard Mar
Sara Sadeghi
Shaun D. Jackman
Yaron S. Butterfield
Publication venue: Oxford University Press (OUP)

Crossref