216 research outputs found

RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

Author: A Ameur
A McPherson
A Mortazavi
A Sboner
Asim S. Siddiqui
B Li
Benjamin S. Kong
BJ Druker
BP Lewis
BP Rubin
C Adem
C Kumar-Sinha
C Lin
C Tognon
C Trapnell
C Trapnell
CA Maher
CA Maher
CA Westbrook
Catalin Barbacioru
Chieh-Yuan Li
D Zerbino
EL Kwak
ET Wang
F De Bona
F Denoeud
F Ozsolak
F Tang
Fiona C. Hyland
G Robertson
H Edgren
Heinz Breu
I Birol
J Wang
JD Rowley
Jeffrey K. Ichikawa
Jian Gu
Joel P. Brockman
John P. Bodeau
JP Koivunen
K Inaki
K Kannan
Kelli S. Bramlett
KF Au
KJ McKernan
KS Kosik
L Shi
Liviu Popescu
M Guttman
M Kinsella
M Krzywinski
M Nicolae
M Persson
M Yassour
Matthew W. Muller
MC Haffner
MF Berger
Milan Radovich
N Cloonan
N Cloonan
N Palanisamy
Nriti Garg
O Monni
OA Hampton
Onur Sakarya
P Shepherd
Paolo Vatta
Penn P. Whitley
RD Canales
Robert C. Nutter
S Perner
SA Tomlins
SG O'Brien
Sowmi Utiramerur
SR Knezevich
U Manber
U Nagalakshmi
Vidya Kudlingar
Weixiong Zhang
Y Hu
Y Surget-Groba
Yongzhi Chen
Yulei N. Wang
YW Asmann
Z Wang
Zheng Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012

High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.
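The abstract names a suffix-array aligner for detecting chimeric transcripts but gives no implementation detail. As a rough illustration of the general idea only (not the SASR algorithm itself), the sketch below builds a naive suffix array, looks up exact matches by binary search, and then tries to split a read so that its left part matches one gene and its right part matches another. All function names, the `min_anchor` parameter, and the toy sequences are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return sorted start positions of exact matches of pattern in text."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def find_fusion_split(
    read: str, gene_a: str, gene_b: str, min_anchor: int = 4
) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: find a cut point where read[:cut] matches gene_a
    and read[cut:] matches gene_b, each side at least min_anchor bases long.
    Returns (cut, position in gene_a, position in gene_b) or None."""
    sa_a = build_suffix_array(gene_a)
    sa_b = build_suffix_array(gene_b)
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left = find_occurrences(gene_a, sa_a, read[:cut])
        right = find_occurrences(gene_b, sa_b, read[cut:])
        if left and right:
            return cut, left[0], right[0]
    return None


if __name__ == "__main__":
    gene_a = "ACGTACGGTTCA"  # toy sequences, not real genes
    gene_b = "TTGACCATGCAA"
    read = gene_a[4:10] + gene_b[3:9]  # simulated chimeric read
    print(find_fusion_split(read, gene_a, gene_b))
```

A production aligner would use a linear-time suffix array construction, tolerate mismatches, and score candidate junctions; exact matching is used here only to keep the sketch short.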

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

NOVEL COMPUTATIONAL METHODS FOR SEQUENCING DATA ANALYSIS: MAPPING, QUERY, AND CLASSIFICATION

Author: Liu Xinan
Publication venue: UKnowledge
Publication date: 01/01/2018

Over the past decade, the evolution of next-generation sequencing technology has considerably advanced genomics research. As a consequence, fast and accurate computational methods are needed for analyzing the large data in different applications. The research presented in this dissertation focuses on three areas: RNA-seq read mapping, large-scale data query, and metagenomics sequence classification. A critical step of RNA-seq data analysis is to map the RNA-seq reads onto a reference genome. This dissertation presents a novel splice alignment tool, MapSplice3. It achieves high read alignment and base mapping yields and is able to detect splice junctions, gene fusions, and circular RNAs comprehensively at the same time. Based on MapSplice3, we further develop a novel lightweight approach called iMapSplice that enables personalized mRNA transcriptional profiling. The huge amount of RNA-seq data shared through public datasets provides an invaluable resource for researchers to test hypotheses by reusing existing datasets. To meet the need for efficiently querying large-scale sequencing data, a novel method called SeqOthello has been developed. It efficiently queries sequence k-mers against large-scale datasets and determines whether the given sequence is present. Metagenomics studies often generate tens of millions of reads to capture the presence of microbial organisms. Thus efficient and accurate algorithms are in high demand. In this dissertation, we introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequences. It supports efficient query of a taxon using its k-mer signatures.
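The k-mer query idea mentioned in the abstract can be illustrated with a very small sketch. This is not SeqOthello's data structure: a plain Python dictionary stands in for its compact hashing scheme, and the function names, the `min_fraction` threshold, and the toy samples are all assumptions introduced for this example. A query sequence is broken into k-mers, and a sample is reported when a sufficient fraction of those k-mers occurs in it.

```python
from collections import defaultdict
from typing import Dict, Iterator, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of sample names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def query(
    index: Dict[str, Set[str]], seq: str, k: int, min_fraction: float = 0.8
) -> Dict[str, float]:
    """Return samples in which at least min_fraction of the query's k-mers occur,
    together with the observed fraction."""
    query_kmers = list(kmers(seq, k))
    if not query_kmers:
        return {}
    hits: Dict[str, int] = defaultdict(int)
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    total = len(query_kmers)
    return {
        name: count / total
        for name, count in hits.items()
        if count / total >= min_fraction
    }


if __name__ == "__main__":
    samples = {"s1": "ACGTACGTGA", "s2": "TTTTGGGGCC"}  # toy data
    index = build_index(samples, k=4)
    print(query(index, "GTACGT", k=4))
```

An in-memory dictionary of sets does not scale to thousands of sequencing experiments; that scaling problem is what compact k-mer indexing methods are designed to address.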

University of Kentucky

INTEGRATE: Gene fusion discovery using whole genome and transcriptome data

Author: Fulton Robert S
Maher Christopher A
Schmidt Heather K
Tomlinson Chad
Warren Wesley C
White Nicole M
Wilson Richard K
Zhang Jin
Publication venue: Digital Commons@Becker
Publication date: 01/01/2016

While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data types could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing number of patients with both data types available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only six of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.

Digital Commons@Becker

PubMed Central

Technology dictates algorithms: Recent developments in read alignment

Author: Alkan Can
Alser Mohammed
Balliu Brunilda
Deshpande Dhrithi
Icer Baykal Pelin
Knyazev Sergey
Koslicki David
Mangul Serghei
Mutlu Onur
Rotman Jeremy
Shi Huwenbo
Singer Benjamin D.
Skums Pavel
Taraszka Kodi
Xue Victor
Yang Harry T.
Zelikovsky Alex
Publication venue
Publication date: 09/07/2020

Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

iMapSplice: Alleviating Reference Bias Through Personalized RNA-seq Alignment

Author: Liu Jinze
Liu Xinan
MacLeod James N.
Publication venue: Public Library of Science (PLoS)
Publication date: 10/08/2018

Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The algorithm makes use of personal genomic information and performs an unbiased alignment towards genome indices carrying both reference and alternative bases. Importantly, this breaks the dependency on reference genome splice site dinucleotide motifs and enables iMapSplice to discover personal splice junctions created through splice site polymorphisms. We report comparative analyses using a number of simulated and real datasets. Besides general improvements in read alignment and splice junction discovery, iMapSplice greatly alleviates allelic ratio biases and unravels many previously uncharacterized splice junctions created by splice site polymorphisms, with minimal overhead in computation time and storage. Software download URL: https://github.com/LiuBioinfo/iMapSplice

University of Kentucky

Novel graph based algorithms for transcriptome sequence analysis

Author: Durai Dilip
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2020

RNA-sequencing (RNA-seq) is one of the most widely used techniques in molecular biology. A key bioinformatics task in any RNA-seq workflow is assembling the reads. As the size of transcriptomics data sets is constantly increasing, scalable and accurate assembly approaches have to be developed. Here, we propose several approaches to improve the assembly of RNA-seq data generated by second-generation sequencing technologies. We demonstrate that the systematic removal of irrelevant reads from a high-coverage dataset prior to assembly reduces runtime and improves the quality of the assembly. Further, we propose a novel RNA-seq assembly workflow comprising read error correction, normalization, assembly with informed parameter selection and transcript-level expression computation. In recent years, the popularity of third-generation sequencing technologies has increased, as long reads allow for accurate isoform quantification and gene-fusion detection, which is essential for biomedical research. We present a sequence-to-graph alignment method to detect and quantify transcripts in third-generation sequencing data. We also propose the first gene-fusion prediction tool specifically tailored to long-read data, which achieves accurate expression estimation even on complex data sets. Moreover, our method predicted experimentally verified fusion events along with some novel events, which can be validated in the future.

Universaar

MPG.PuRe

Analýza genové exprese na subgenové úrovni (Analysis of gene expression at the subgene level)

Author: Kloda František
Publication venue: Univerzita Karlova, Přírodovědecká fakulta
Publication date: 01/01/2023

RNA sequencing allows investigation of the expression of individual genes in cells. The resulting data can be interpreted on multiple levels, each providing a different type of information. Apart from measuring the expression of whole genes, it is possible to quantify the expression of individual exons or transcripts (gene isoforms), which allows a more detailed study of regulatory mechanisms. The main difference between the approaches lies in determining the origin of short reads. This step is considerably more complex when analysing the expression of individual transcripts, because transcripts derived from the same gene share a high degree of sequence similarity. In this thesis, we describe eleven tools for subgene-level expression analysis and, for comparison, run three of them on real patient data. The results provided by all three tools were very similar, with the greatest difference being the time needed for the analysis.

Department of Cell Biology, Faculty of Science

CU Digital Repository

Identifying the oncogenic potential of gene fusions exploiting miRNAs

Author: Elisa Ficarra
Marilisa Montemurro
Marta Lovino
Venere Sabrina Barrese
Publication venue: Elsevier BV
Publication date: 01/01/2022

It is estimated that oncogenic gene fusions cause about 20% of human cancer morbidity. Identifying potentially oncogenic gene fusions may improve affected patients’ diagnosis and treatment. Previous approaches to this issue included exploiting specific gene-related information, such as gene function and regulation. Here we propose a model that profits from the previous findings and includes the microRNAs in the oncogenic assessment. We present ChimerDriver, a tool to classify gene fusions as oncogenic or not oncogenic. ChimerDriver is based on a specifically designed neural network and trained on genetic and post-transcriptional information to obtain a reliable classification. The designed neural network integrates information related to transcription factors, gene ontologies, microRNAs and other detailed information related to the functions of the genes involved in the fusion and the gene fusion structure. As a result, the performances on the test set reached 0.83 f1-score and 96% recall. The comparison with state-of-the-art tools returned comparable or higher results. Moreover, ChimerDriver performed well in a real-world case where 21 out of 24 validated gene fusion samples were detected by the gene fusion detection tool Starfusion. ChimerDriver integrates transcriptional and post-transcriptional information in an ad-hoc designed neural network to effectively discriminate oncogenic gene fusions from passenger ones. ChimerDriver source code is freely available at https://github.com/martalovino/ChimerDriver

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Identifying the oncogenic potential of gene fusions exploiting miRNAs

Author: Barrese V. S.
Ficarra E.
Lovino M.
Montemurro M.
Publication venue: Elsevier BV
Publication date: 01/01/2022

It is estimated that oncogenic gene fusions cause about 20% of human cancer morbidity. Identifying potentially oncogenic gene fusions may improve affected patients’ diagnosis and treatment. Previous approaches to this issue included exploiting specific gene-related information, such as gene function and regulation. Here we propose a model that profits from the previous findings and includes the microRNAs in the oncogenic assessment. We present ChimerDriver, a tool to classify gene fusions as oncogenic or not oncogenic. ChimerDriver is based on a specifically designed neural network and trained on genetic and post-transcriptional information to obtain a reliable classification. The designed neural network integrates information related to transcription factors, gene ontologies, microRNAs and other detailed information related to the functions of the genes involved in the fusion and the gene fusion structure. As a result, the performances on the test set reached 0.83 f1-score and 96% recall. The comparison with state-of-the-art tools returned comparable or higher results. Moreover, ChimerDriver performed well in a real-world case where 21 out of 24 validated gene fusion samples were detected by the gene fusion detection tool Starfusion. ChimerDriver integrates transcriptional and post-transcriptional information in an ad-hoc designed neural network to effectively discriminate oncogenic gene fusions from passenger ones. ChimerDriver source code is freely available at https://github.com/martalovino/ChimerDriver

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Computational solutions for omics data

Author: A Butte
A Chatr-aryamontri
A Franceschini
A Joshi
A Lan
A Mortazavi
A Subramanian
A Tanay
AC Jungkamp
AJ Pinho
AK Wong
AR Whitney
B Langmead
B Langmead
B Paten
Bonnie Berger
BP Kelley
C Huttenhower
C Kingsford
C Trapnell
C Trapnell
C Trapnell
C Wang
CH Yeang
CJ Vaske
CS Liao
D Croft
D Earl
D Kim
D Kim
D Park
DB Allison
DB Jaffe
DR Zerbino
E Banks
E Banks
E Cerami
E Nabieva
E Segal
E Yeger-Lotem
EJ Rossin
ER Mardis
ES Lander
ET Wang
F Hach
F Hach
F Markowetz
F Ozsolak
F Vandin
F Vandin
F Vezzi
GE Zinman
H Li
H Li
I Ulitsky
I Ulitsky
IA Adzhubei
J Butler
J Clarke
J Flannick
J Goecks
J Lamb
J Pandey
JC Marioni
JC Venter
Jian Peng
JT Dudley
JT Leek
JT Simpson
JT Simpson
K Rhrissorrakrai
KI Goh
KY Yeung
L Parts
LD Stein
LH Hartwell
LM Heiser
LR Meyer
M Ascano
M Burrows
M Garber
M Gross
M Gstaiger
M Hafner
M Hsi-Yang Fritz
M Kircher
M Koyuturk
M Narayanan
M Reich
M Schatz
M Schmid
M Sirota
M Steffen
M Yandell
MB Gerstein
MB Gerstein
MC Brandon
MC Schatz
MG Grabherr
MH Maathuis
ML Metzker
Mona Singh
N Atias
N de Souza
N Tuncbag
NP Palmer
NT Ingolia
O Hirose
O Litvin
O Ogasawara
O Stegle
O Vanunu
P Ferragina
P Flicek
P Jiang
P Kumar
P Lu
P Shannon
PA Pevzner
PE Compeau
PG Doyle
PO Brown
PR Loh
PR Schmid
R Colak
R Gaujoux
R Li
R Li
R Li
R Singh
RC Gentleman
S Anders
S Batzoglou
S Christley
S Deorowicz
S Erten
S Kohler
S Levy
S Navlakha
S Ng
S Suthram
SA Chowdhury
SD Kahn
SF Altschul
SG Tringe
SL Salzberg
SS Huang
SS Shen-Orr
T Barrett
T Ideker
T Michoel
TS Furey
U Manber
UD Akavia
W Ali
W Li
W Tembe
WJ Kent
X Liu
X Wang
X Zhou
Y Prat
Y Wang
Y Zhang
YA Kim
Z Tu
Z Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2013

High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.

National Institutes of Health (U.S.) (Grant GM081871)

Princeton University Open Access Repository

DSpace@MIT

Crossref

PubMed Central