Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Digital.CSIC

Serveur académique lausannois

Assessment of transcript reconstruction methods for RNA-seq

Author: Abril JF
Bertone P
Engström PG
Guigó R
Harrow J
Hubbard TJ
Kokocinski F
Steijger T
Publication venue: Nature Methods
Publication date: 29/11/2018

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data. This work was supported by European Molecular Biology Laboratory, US National Institutes of Health/NHGRI grants U54HG004555 and U54HG004557, Wellcome Trust grant WT098051, and grants BIO2011-26205 and CSD2007-00050 from the Ministerio de Educación y Ciencia.

Apollo (Cambridge)

The origins, evolution, and functional potential of alternative splicing in vertebrates.

Author: Alioto T.
Derrien T.
Fernandez-Banet J.
Frankish A.
Guigó R.
Harrow J.
Howald C.
Hubbard T.
Mudge J.M.
Reymond A.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2011

Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6-fold excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data support a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

Systematic evaluation of spliced alignment programs for RNA-seq data

Author: Bertone P
Engström PG
Goldman N
Grant GR
Guigó R
Harrow J
Hubbard TJ
Kahles A
Rätsch G
Sipos B
Steijger T
Publication venue: Nature Methods
Publication date: 15/01/2018

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

Apollo (Cambridge)

The effects of death and post-mortem cold ischemia on human tissue transcriptomes

Author: Aguet François
Amador Raziel
Amadoz Alicia
Ardlie Kristin G.
Breschi Alessandra, 1988-
Carbonell-Caballero Jose
Curado Joao
Dopazo Joaquín
Ferreira Pedro G.
Godinho Caio P. Sá
Guigó Serra Roderic
Hidalgo Marta R.
Muñoz-Aguirre Manuel
Nurtdinov Ramil
Oliveira Carla
Oliveira Patrícia
Pervouchine Dmitri D.
Reverter Ferran
Sammeth Michael
Sodaei Reza, 1988-
Sousa Abel
Çubut Cankut
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Post-mortem tissue samples are a key resource for investigating patterns of gene expression. However, the processes triggered by death and the post-mortem interval (PMI) can significantly alter physiologically normal RNA levels. We investigate the impact of PMI on gene expression using data from multiple tissues of post-mortem donors obtained from the GTEx project. We find that many genes change expression over relatively short PMIs in a tissue-specific manner, but this potentially confounding effect in a biological analysis can be minimized by taking into account appropriate covariates. By comparing ante- and post-mortem blood samples, we identify the cascade of transcriptional events triggered by death of the organism. These events do not appear to simply reflect stochastic variation resulting from mRNA degradation, but active and ongoing regulation of transcription. Finally, we develop a model to predict the time since death from the analysis of the transcriptome of a few readily accessible tissues.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

Queen Mary Research Online

Repositório Aberto da Universidade do Porto

Fondo Bibliográfico Digital Institucional

Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.

Author: Abril Ferrando Josep Francesc, 1970-
Agarwal Pankaj
Antonarakis Stylianos E
Brent Michael R.
Dermitzakis E.T.
Guigó Roderic
Keibler Evan
Lyle Robert
Parra Genís
Ponting Chris P
Reymond Alexandre
Ucla Catherine
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 26/01/2023

A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.

Diposit Digital de la Universitat de Barcelona

From identification to validation to gene count

Author: Aken Bronwen
Amid Clara
Carninci Piero
Ezkurdia Iakes
Frankish Adam
Gilbert James
Gingeras Thomas R.
Guigó Serra Roderic
Harrow Jennifer
HAVANA
Hubbard Tim J.
Kokocinski Felix
Searle Stephen
Tress Michael
White Simon
Publication venue: BioMed Central
Publication date: 01/01/2010

The current GENCODE gene count of ~30,000, including 21,727 protein-coding and 8,483 RNA genes, is significantly lower than the 100,000 genes anticipated by early estimates. Accurate annotation of protein-coding and non-coding genes and pseudogenes is essential in calculating the true gene count and gaining insight into human evolution. As part of the GENCODE Consortium, the HAVANA team produces high-quality manual gene annotation, which forms the basis for the reference gene set being used by the ENCODE project and provides a rich annotation of alternative splice variants and assignment of functional potential. However, the protein-coding potential of some splice variants is uncertain, and valid splice variants can remain unannotated if they are absent from current cDNA libraries. Recent technological developments in sequencing and mass spectrometry have created a vast amount of new transcript and protein data that facilitate the identification and validation of new and existing transcripts, while harboring their own limitations and problems.

Cold Spring Harbor Laboratory Institutional Repository

Institutional Repository of the Freie Universität Berlin

King's Research Portal

A novel and well-defined benchmarking method for second generation read mapping

Author: A Döring
A Valouev
Anne-Katrin Emde
B Langmead
C Alkan
C Amid
D Weese
DA Wheeler
David Weese
DR Bentley
ER Mardis
G Myers
G Navarro
H Li
J Deng
J Dohm
J Qin
KJ McKernan
Knut Reinert
M Holtgrewe
Manuel Holtgrewe
P Sanders
R Guigó
R Li
SB Ng
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is the best for what task. Results: We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It consists of a strict definition of the read mapping problem and of tools to evaluate the result of arbitrary read mappers supporting the SAM output format. Conclusions: We show the usefulness of the benchmark program by performing a comparison of popular read mappers. The tools supporting the benchmark are licensed under the GPL and available from http://www.seqan.de/projects/rabema.html

Springer - Publisher Connector

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Cold Spring Harbor Laboratory Institutional Repository

EGASP: the human ENCODE Genome Annotation Assessment Project

Author: Abril Ferrando Josep Francesc, 1970-
Antonarakis Stylianos E.
Ashburner Michael
Bajic Vladimir B.
Birney Ewan
Castelo Robert
Denoeud France
Eyras Eduardo
Flicek Paul
Gingeras Thomas R.
Guigó Serra Roderic
Harrow Jennifer
Hubbard Tim
Lagarde Julien
Lewis Suzanna E.
Reese Martin G.
Reymond Alexandre
Ucla Catherine
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Background: Non-long terminal repeat (non-LTR) retrotransposons have contributed to shaping the structure and function of genomes. In silico and experimental approaches have been used to identify the non-LTR elements of the urochordate Ciona intestinalis. Knowledge of the types and abundance of non-LTR elements in urochordates is a key step in understanding their contribution to the structure and function of vertebrate genomes. Results: Consensus elements phylogenetically related to the I, LINE1, LINE2, LOA and R2 elements of the 14 eukaryotic non-LTR clades are described from C. intestinalis. The ascidian elements showed conservation of both the reverse transcriptase coding sequence and the overall structural organization seen in each clade. The apurinic/apyrimidinic endonuclease and nucleic-acid-binding domains encoded upstream of the reverse transcriptase, and the RNase H and the restriction enzyme-like endonuclease motifs encoded downstream of the reverse transcriptase were identified in the corresponding Ciona families. Conclusions: The genome of C. intestinalis harbors representatives of at least five clades of non-LTR retrotransposons. The copy number per haploid genome of each element is low, less than 100, far below the values reported for vertebrate counterparts but within the range for protostomes. Genomic and sequence analysis shows that the ascidian non-LTR elements are unmethylated and flanked by genomic segments with a gene density lower than average for the genome. The analysis provides valuable data for understanding the evolution of early chordate genomes and enlarges the view on the distribution of the non-LTR retrotransposons in eukaryotes

CiteSeerX

Serveur académique lausannois

Secretaría de Estado de Cultura

King's Research Portal

Diposit Digital de la Universitat de Barcelona

Archive ouverte UNIGE

Using ESTs to improve the accuracy of de novo gene prediction

Author: A Krogh
AA Salamov
AC Siepel
C Wei
Chaochun Wei
DR Maglott
E Birney
I Korf
JE Allen
KD Pruitt
KL Howe
L Stein
LW Hillier
M Stanke
MG Reese
Michael R Brent
MJ van Baren
MR Brent
MS Boguski
P Flicek
R Guigó
R Mott
RA Gibbs
RH Brown
RH Waterston
S Foissac
SS Gross
The MGC Project Team
TW Harris
VV Solovyev
WJ Kent
Publication venue: BioMed Central
Publication date: 01/07/2006

BACKGROUND: ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction. RESULTS: TWINSCAN_EST is a new system that successfully combines EST alignments with TWINSCAN. On the whole C. elegans genome, TWINSCAN_EST shows a 14% improvement in sensitivity and 13% in specificity in predicting exact gene structures compared to TWINSCAN without EST alignments. Not only are the structures revealed by EST alignments predicted correctly, but these also constrain the predictions without alignments, improving their accuracy. For the human genome, we used the same approach with N-SCAN, creating N-SCAN_EST. On the whole genome, N-SCAN_EST produced a 6% improvement in sensitivity and 1% in specificity of exact gene structure predictions compared to N-SCAN. CONCLUSION: TWINSCAN_EST and N-SCAN_EST are more accurate than TWINSCAN and N-SCAN, while retaining their ability to discover novel genes to which no ESTs align. Thus, we recommend using the EST versions of these programs to annotate any genome for which EST information is available. TWINSCAN_EST and N-SCAN_EST are part of the TWINSCAN open source software package.

Springer - Publisher Connector

Directory of Open Access Journals