116 research outputs found

Transcript quantification with RNA-Seq data

Author: A Mortazavi
G Schweikert
Gunnar Rätsch
H Jiang
Jonas Behr
M Sammeth
Regina Bohnert
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Motivation: Novel high-throughput sequencing technologies open exciting new approaches to transcriptome profiling. Sequencing transcript populations of interest, e.g. from different tissues or variable stress conditions, with RNA sequencing (RNA-Seq) [1] generates millions of short reads. Accurately aligned to a reference genome, they provide digital counts and thus facilitate transcript quantification. As the observed read counts only provide the summation of all expressed sequences at one locus, the inference of the underlying transcript abundances is crucial for further quantitative analyses.
Methods: To approach this problem, we have developed a new technique, called rQuant, based on quadratic programming. Given a gene annotation and position-wise exon/intron read coverage from read alignments, we determine the abundances for each annotated transcript by minimising a suitable loss function. It penalises the deviation of the observed from the expected read coverage given the transcript weights. The observed read coverage is typically non-uniformly distributed over the transcript due to several biases in the generation of the sequencing libraries and the sequencing. This leads to distortions of the transcript abundances, if not corrected properly. We therefore extended our approach to jointly optimise transcript profiles, modeling the coverage deviations depending on the position in the transcript. Our method can be applied without knowledge of the underlying transcript abundances and equally benefits from loci with and without alternative transcripts.
Results: To quantitatively evaluate the quality of our abundance predictions, we used a set of simulated reads from transcripts with known expression as a benchmark set. It was generated using the Flux Simulator [2] modeling biases in RNA-Seq as well as preparation experiments. Table 1 shows preliminary results with segment- and position-based loss as well as with and without the transcript profiles. Our results indicate that the position-based modeling together with transcript profiles allows us to accurately infer the underlying expression of single transcripts as well as of multiple isoforms of one gene locus.
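
The loss described in the Methods paragraph can be illustrated with a minimal sketch. It is not the rQuant implementation: it assumes a plain squared loss between observed and expected position-wise coverage, omits the transcript profiles, and swaps a quadratic-programming solver for simple projected gradient descent. The function name `estimate_abundances`, the two-transcript locus, and the coverage values are hypothetical and chosen only for illustration.

```python
def estimate_abundances(coverage, membership, iterations=2000):
    """Estimate non-negative transcript weights from position-wise coverage.

    coverage[p]      -- observed read coverage at position p (toy data)
    membership[p][t] -- 1 if transcript t covers position p, else 0
    Minimises sum_p (sum_t membership[p][t] * w[t] - coverage[p])^2, w >= 0.
    """
    n_pos = len(coverage)
    n_tx = len(membership[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so this step size keeps the gradient iteration stable.
    frob_sq = sum(a * a for row in membership for a in row)
    step = 1.0 / (2.0 * frob_sq)
    weights = [0.0] * n_tx
    for _ in range(iterations):
        # Expected minus observed coverage at every position.
        residuals = [
            sum(membership[p][t] * weights[t] for t in range(n_tx)) - coverage[p]
            for p in range(n_pos)
        ]
        # Gradient of the squared loss with respect to each weight.
        grad = [
            2.0 * sum(membership[p][t] * residuals[p] for p in range(n_pos))
            for t in range(n_tx)
        ]
        # Gradient step followed by projection onto w >= 0.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grad)]
    return weights


# Hypothetical locus: transcript 0 covers all six positions,
# transcript 1 skips the middle exon (positions 2 and 3).
membership = [[1, 1], [1, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
coverage = [8, 8, 3, 3, 8, 8]
weights = estimate_abundances(coverage, membership)
print(weights)
```

In this toy locus the skipped exon is what separates the two isoforms: its coverage comes from transcript 0 alone, and the remaining coverage at the shared positions is attributed to transcript 1.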

Crossref

Springer - Publisher Connector

PubMed Central

MPG.PuRe

Complete Alternative Splicing Events Are Bubbles in Splicing Graphs

Author: Fu X.Y.
Goux-Pelletan M.
Grasso C.
Michael Sammeth
Streuli M.
Sugnet C.W.
Publication venue: Mary Ann Liebert Inc

Crossref

Symposium on the Scottish labour market

In the post-war period, up to the late 1960s, Britain experienced only a modicum of unemployment, and government policies geared to producing full employment were considered a success. It was simple: boost demand and more people would find work. But by the mid-1970s the demand-side policies that had dominated economic thinking fell into disrepute as the OPEC nations raised prices dramatically and brought in a new era of both rising prices and rising unemployment. The laws of economics, which had previously framed policy decisions as a trade-off between lower unemployment and higher inflation, were now redundant. Both unemployment and inflation were moving in the same direction. The era of stagflation had begun.

University of Strathclyde Institutional Repository

UCL Discovery

Oxford University Research Archive

Leiden University Scholary Publications

MPG.PuRe

TRStalker: an efficient heuristic for finding fuzzy tandem repeats

Author: Alessio Vecchio
Ames
Benson
Benson
Boeva
Brodzik
Buchner
Burkhardt
Burkhardt
Bussey
Campuzano
de la Higuera
Dujon
Elemento
Fischetti
Gelfand
Glusman
Grissa
Gupta
Gusfield
Gusfield
Hauth
Jiang
Jurka
Kelkar
Kolpakov
Kolpakov
Kolpakov
Krishnan
Kurtz
Kurtz
Landau
Leclercq
Legendre
M. Elena Renda
Marco Pellegrini
Motwani
Mudunuri
Mulmuley
O'Dushlaine
Parisi
Peterlongo
Rivals
Rivals
Rowen
Saha
Sammeth
Sharma
Sim
Smit
Sokol
Stolovitzky
Vissers
Vogler
Warburton
Wells
Wexler
Wexler
Wooster
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.

CiteSeerX

Crossref

PubMed Central

Archivio della Ricerca - Università di Pisa

Evolution of Exon-Intron Structure and Alternative Splicing

Author: A von Bubnoff
AL Hughes
AM McGuire
AM Morse
BB Wang
BR Graveley
C Adami
C Ben-Dov
CE Shannon
D Brett
DA Benson
DA Wheeler
DL Black
DL Black
E Grotkopp
E Kim
EI Severing
ER Mardis
FA Kondrashov
FA Kondrashov
G Orphanides
H Nagasaki
H Nagasaki
JB Li
Juan Valcarcel
JW Valentine
K Iida
K Nishikura
Konstantin V. Krutovsky
KR Chi
LY Cui
M Deutsch
M Lynch
M McKeown
M Sammeth
MA Campbell
MR Ahuja
MW Gray
MW Gray
MY Long
NM Kopelman
S Mano
Tomasz E. Koralewski
VN Babenko
ZX Su
Publication venue: Public Library of Science
Publication date: 01/01/2011

Despite significant advances in high-throughput DNA sequencing, many important species remain understudied at the genome level. In this study we asked what can be predicted about the genome-wide characteristics of less studied species, based on the genomic data from completely sequenced species. Using NCBI databases we performed a comparative genome-wide analysis of characteristics such as alternative splicing and the number of genes, gene products and exons in 36 completely sequenced model species. We created statistical regression models to fit these data and applied them to loblolly pine (Pinus taeda L.), an example of an important species whose genome has not yet been completely sequenced. Using these models, genome-wide characteristics such as the total number of genes and exons can be roughly predicted from parameters estimated from the limited genomic data available, e.g. exon length and exon/gene ratio.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Texas A&M Repository

Comparative Analysis of Human Protein-Coding and Noncoding RNAs between Brain and 10 Mixed Cell Lines by RNA-Seq

Author: A Huttenhofer
A Siepel
AA Aravin
AC Marques
B Langmead
B Li
Bing He
C Garofalo
C Trapnell
CA Brosnan
CC Babbitt
CZ Han
DL Black
DM Cork
E Birney
ET Wang
FF Costa
Geng Chen
I Martianov
J Feng
JE Wilusz
Jian Luo
Jürgen Brosius
K Fejes-Toth
K Hashimoto
K Laud
Kangping Yin
L Kong
L Shi
Leming Shi
M Griffith
M Guttman
M Guttman
M Ishikawa
M Mallardo
M Sammeth
M Wrage
Mingyao Liu
N Novoradovskaya
P Carninci
Peng Li
PJ French
R Klinck
R Louro
RA Gupta
S Marguerat
SW Blume
Tieliu Shi
TR Mercer
TR Mercer
U Nagalakshmi
UA Orom
V Pedraza
X Cai
Y Lee
Y Okazaki
Ya Qi
Yuanzhang Fang
Z Wang
Publication venue: Public Library of Science
Publication date: 30/11/2011

In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, we investigated the protein-coding capacities and isoform expression levels of known human genes, as well as the conservation and disease association of long noncoding RNAs (ncRNAs), using two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines. Comparative analysis revealed that about two-thirds of the genes expressed in brain and cell lines are the same, but less than one-third of their isoforms are identical. Apart from genes expressed specifically in brain or in cell lines, about 66% of the genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform, and some genes generated protein-coding (or noncoding) RNAs in only one of the two samples. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals, or 10 primates. Further analysis showed that some long ncRNAs are differentially expressed in human breast cancer or lung cancer; several of these were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were significantly correlated with certain breast cancer or lung cancer related genes, indicating the biological relevance of long ncRNAs to human cancers. Our findings reveal that differences in gene expression profiles between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

Author: Barann Matthias
Esteve-Codina Anna
Ezquina Suzana
Ferreira Pedro G.
Friedlander Marc R.
GEUVADIS Consortium
Guigo Roderic
Lappalainen Tuuli
Oti Martin
Palotie A.
Rivas Manuel A.
Rosenstiel Philip
Sammeth Michael
Strom Tim M.
Wieland Thomas
Publication date: 01/01/2016

A. Palotie is a member of the GEUVADIS Consortium working group. Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project, we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts. Peer reviewed.

PubMed Central

UPF Digital Repository

Digital.CSIC

Diposit Digital de Documents de la UAB

Helsingin yliopiston digitaalinen arkisto

Archive ouverte UNIGE

A General Definition and Nomenclature for Alternative Splicing Events

Author: A Bhasi
AA Mironov
AJ Lopez
AR Kornblihtt
B Modrek
BT Lee
CW Smith
CW Sugnet
D Bollina
D Holste
DA Benson
DB Malko
DL Black
E Birney
E Buratti
E Coward
E Eyras
E Kim
ES Lander
F Denoeud
FA Kondrashov
G Parra
GE Crooks
H Ji
H Kaessmann
H Kim
H Nagasaki
H Pearson
HD Huang
I Listerman
IA Swinburne
J Harrow
J Takeda
JM Johnson
KD Pruitt
KL Fox-Walsh
L Collins
L Florea
L Stein
M Akerman
M Ashburner
M Burset
M Hiller
M Yandell
M Zavolan
MB Gerstein
Michael R. Brent
Michael Sammeth
MS Boguski
N Kim
N Kim
P Akiva
P Senapathy
P Sperisen
RE Breithart
Roderic Guigó
S Foissac
S Gupta
S Heber
S Stamm
S Stamm
SB Hedges
Sylvain Foissac
TJ Hubbard
TM Chern
V Le Texier
Y Xing
Y Zhou
YH Huang
Publication venue: Public Library of Science
Publication date: 01/01/2008

Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of reflected splicing variation. In this work, we present a general definition of the AS event along with a notation system that involves the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific “AS code” to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate the transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part (in human more than a quarter) of the observed splicing variations are ignored in common classification pipelines. We have used AStalavista to investigate and to compare the AS landscape of different reference annotation sets in human and in other metazoan species and found that proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework to conduct specific studies investigating the occurrence, impact, and regulation of AS.

Crossref

Directory of Open Access Journals

PubMed Central

UPF Digital Repository

Secretaría de Estado de Cultura

ProdInra

List of Texts

Transcriptome and genome sequencing uncovers functional variation in humans

Author: Almlöf Jonas
Amstislavskiy Vyacheslav
Antonarakis Stylianos E
Barann Matthias
Beltran Sergi
Bertier Gabrielle
Brazma Alvis
Buermans Henk PJ
Carracedo Ángel
Dermitzakis Emmanouil T
Donnelly Peter
Esser Daniela
Estivill Xavier
Ferreira Pedro G
Flicek Paul
Friedländer Marc R
Giger Thomas
Gonzàlez-Porta Mar
Greger Liliana
Griebel Thasso
Guigó Roderic
Gut Ivo G
Gut Marta
Häsler Robert
Kahlem Katja
Karlberg Olof
Kilpinen Helena
Kurbatova Natalja
Lappalainen Tuuli
Lehrach Hans
Lek Monkol
Lizano Esther
MacArthur Daniel G
McCarthy Mark I
Meitinger Thomas
Monlong Jean
Montgomery Stephen B
Ongen Halit
Padioleau Ismael
Pirinen Matti
Pulyakhina Irina
Ribeca Paolo
Rivas Manuel A
Rosenstiel Philip
Sammeth Michael
Schreiber Stefan
Schwarzmayr Thomas
Stegle Oliver
Strom Tim M
Sudbrak Ralf
Sultan Marc
Syvänen Ann-Christine
Tikhonov Andrew
van Iterson Maarten
van Ommen Gert-Jan
Wieland Thomas
‘t Hoen Peter AC
Publication venue: Springer Science and Business Media LLC
Publication date: 11/04/2014

Summary: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting regulation of the majority of genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on cellular mechanisms of regulatory and loss-of-function variation, and allowed us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

Harvard University - DASH

A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Author: Abdelhamid R. F.
Alioto T.
Altshuler R. C.
Antonarakis S. E.
Battenhouse A.
Batut P.
Bell I.
Bell K. G.
Bernstein B. E.
Bhinge A. A.
Birney E.
Borel C.
Boyle A. P.
Carninci P.
Chakrabortty S.
Chrast J.
Coyne M. J.
Crawford G. E.
Davis C. A.
Djebali S.
Dobin A.
Drenkow J.
Dumais E.
Dumais J.
Dunham I.
Durham T.
Epstein C. B.
Ernst J.
Fejes-Toth K.
Furey T. S.
Gao H.
Gertz J.
Gingeras T. R.
Giresi P. G.
Gordon A.
Graison E. A. Y.
Grasfeder L. L.
Guigo R.
Hannon G.
Hardison R. C.
Hayashizaki Y.
Howald C.
Issner R.
Iyer V. R.
Jha S.
Kapranov P.
Keefe D.
Kellis M.
Kent W. J.
Kheradpour P.
Kim S. K. C.
Kingswood C.
Ku M. C.
Lagarde J.
Lassmann T.
Lee B. K.
Lieb J. D.
Lin M. F.
Lin W.
Liu Z.
London D.
McDaniell R. M.
Merkel A.
Mikkelsen T. S.
Myers R. M.
Pauli F.
Poh W. T.
Preall J.
Reddy T. E.
Reymond A.
Ribeca P.
Ruan X. A.
Ruan Y. J.
Sammeth M.
Schlesinger F.
Shahab A.
Sheffield N. C.
Shestak C.
Shibata Y.
Shoresh N.
Showers K. A.
Snyder M.
Song L. Y.
Sotirova V.
Stamatoyannopoulos J.
Takahashi H.
Tilgner H.
Truong T.
Ucla C.
Vales T.
Wang H. E.
Wang L.
Wang T. Y.
Ward L. D.
Wei C. L.
Winter D.
Wold B.
Zaleski C.
Zhang X. L.
Zhang Z. C.
Publication venue: Public Library of Science (PLoS)
Publication date: 01/09/2010

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome. National Human Genome Research Institute (U.S.); National Institutes of Health (U.S.)

DSpace@MIT

Cold Spring Harbor Laboratory Institutional Repository

Serveur académique lausannois

HAL-Inserm

Directory of Open Access Journals

HAL Descartes

Carolina Digital Repository

eScholarship - University of California

UPF Digital Repository

King's Research Portal

Archive ouverte UNIGE