79 research outputs found

PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Author: Arvestad
Blanchette
Brent
Butler
Clark
Goldman
Guttman
Holmes
I. Jungreis
Kellis
Lin
M. F. Lin
M. Kellis
Ota
Ozsolak
Stark
Whelan
Yang
Publication date: 17/08/2010

As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.

A gene expression atlas of embryonic neurogenesis in Drosophila reveals complex spatiotemporal regulation of lncRNAs

Author: Abreu R.L.P.
Irizarry R.A.
Jungreis I.
Kellis M.
McCorkindale A.L.
Menzel P.
Meyer I.M.
Shukla C.J.
Wahle P.
Werner S.
Zinzen R.P.
Publication venue: 'The Company of Biologists'
Publication date: 28/03/2019

Cell type specification during early nervous system development in Drosophila melanogaster requires precise regulation of gene expression in time and space. Resolving the programs driving neurogenesis has been a major challenge owing to the complexity and rapidity with which distinct cell populations arise. To resolve the cell type-specific gene expression dynamics in early nervous system development, we have sequenced the transcriptomes of purified neurogenic cell types across consecutive time points covering crucial events in neurogenesis. The resulting gene expression atlas comprises a detailed resource of global transcriptome dynamics that permits systematic analysis of how cells in the nervous system acquire distinct fates. We resolve known gene expression dynamics and uncover novel expression signatures for hundreds of genes among diverse neurogenic cell types, most of which remain unstudied. We also identified a set of conserved long noncoding RNAs (lncRNAs) that are regulated in a tissue-specific manner and exhibit spatiotemporal expression during neurogenesis with exquisite specificity. lncRNA expression is highly dynamic and demarcates specific subpopulations within neurogenic cell types. Our spatiotemporal transcriptome atlas provides a comprehensive resource for investigating the function of coding genes and noncoding RNAs during crucial stages of early neurogenesis.

MDC Repository

Heterologous Stop Codon Readthrough of Metazoan Readthrough Candidates in Yeast

Author: A Firth
B Bonetti
Clara S. Chan
G Stahl
I Jungreis
Irwin Jungreis
J Harger
J Salas-Marco
J Skuzeski
Joseph Schacherer
K Keeling
M Lin
Manolis Kellis
N Wills
O Namy
P Cimino
P Ferreira
P Steneberg
T Serio
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Recent analysis of genomic signatures in mammals, flies, and worms indicates that functional translational stop codon readthrough is considerably more abundant in metazoa than previously recognized, but this analysis provides only limited clues about the function or mechanism of readthrough. If an mRNA known to be read through in one species is also read through in another, perhaps these questions can be studied in a simpler setting. With this end in mind, we have investigated whether some of the readthrough genes in human, fly, and worm also exhibit readthrough when expressed in S. cerevisiae. We found that readthrough was highest in a gene with a post-stop hexamer known to trigger readthrough, while other metazoan readthrough genes exhibit borderline readthrough in S. cerevisiae. National Institutes of Health (U.S.) (5U54HG004555-03

CiteSeerX

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

Stop codon readthrough generates a C-terminally extended variant of the human vitamin D receptor with reduced calcitriol response

Author: Atkins John F.
Dmitriev Ruslan I.
Ivanov Ivaylo P.
Jungreis Irwin
Kellis Manolis
Loughran Gary
Power Michael
Tzani Ioanna
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2018

Although stop codon readthrough is used extensively by viruses to expand their gene expression, verified instances of mammalian readthrough have only recently been uncovered by systems biology and comparative genomics approaches. Previously our analysis of conserved protein coding signatures that extend beyond annotated stop codons predicted stop codon readthrough of several mammalian genes, all of which have been validated experimentally. Four mRNAs display highly efficient stop codon readthrough, and these mRNAs have a UGA stop codon immediately followed by CUAG (UGA_CUAG) that is conserved throughout vertebrates. Extending on the identification of this readthrough motif, we here investigated stop codon readthrough, using tissue culture reporter assays, for all previously untested human genes containing UGA_CUAG. The readthrough efficiency of the annotated stop codon for the sequence encoding vitamin D receptor (VDR) was 6.7%. It was the highest of those tested but all showed notable levels of readthrough. The VDR is a member of the nuclear receptor superfamily of ligand-inducible transcription factors and binds its major ligand, calcitriol, via its C-terminal ligand-binding domain. Readthrough of the annotated VDR mRNA results in a 67 amino-acid-long C-terminal extension that generates a VDR proteoform named VDRx. VDRx may form homodimers and heterodimers with VDR but, compared to VDR, VDRx displayed a reduced transcriptional response to calcitriol even in the presence of its partner retinoid X receptor.

DSpace@MIT

Crossref

Ghent University Academic Bibliography

Cork Open Research Archive

Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.

Author: Bruford E.
Choudhary J.S.
Davidson C.
Fitzgerald S.
Frankish A.
Gonzalez J.M.
He L.
Hunt T.
Jungreis I.
Kay M.
Kellis M.
Li Y.
Mudge J.M.
Seal R.
Tweedie S.
Waterhouse R.M.
Wright J.C.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/12/2019

The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.

Serveur académique lausannois

Institute of Cancer Research Repository

Genomic RNA Elements Drive Phase Separation of the SARS-CoV-2 Nucleocapsid

Author: Baric R.S.
Boerneke M.A.
Ekena J.
Fritch E.J.
Gladfelter A.S.
Hou Y.J.
Iserman C.
Jungreis I.
Kellis M.
McLaughlin G.A.
Roden C.A.
Sealfon R.S.G.
Sheahan T.P.
Theesfeld C.L.
Troyanskaya O.G.
Weeks K.M.
Weidmann C.A.
Publication venue: Cell Press
Publication date: 01/01/2020

We report that the SARS-CoV-2 nucleocapsid protein (N-protein) undergoes liquid-liquid phase separation (LLPS) with viral RNA. N-protein condenses with specific RNA genomic elements under physiological buffer conditions and condensation is enhanced at human body temperatures (33°C and 37°C) and reduced at room temperature (22°C). RNA sequence and structure in specific genomic regions regulate N-protein condensation while other genomic regions promote condensate dissolution, potentially preventing aggregation of the large genome. At low concentrations, N-protein preferentially crosslinks to specific regions characterized by single-stranded RNA flanked by structured elements and these features specify the location, number, and strength of N-protein binding sites (valency). Liquid-like N-protein condensates form in mammalian cells in a concentration-dependent manner and can be altered by small molecules. Condensation of N-protein is RNA sequence and structure specific, sensitive to human body temperature, and manipulatable with small molecules, and therefore presents a screenable process for identifying antiviral compounds effective against SARS-CoV-2.

Carolina Digital Repository

Evolution of enhanced innate immune evasion by SARS-CoV-2

Author: Batra J
Beltrao P
Bischof ML
Bonfanti P
Bouhaddou M
Braberg H
Chen KH
Fabius JM
Fossati A
García-Sastre A
Goodfellow IG
Harjai B
Hiatt J
Hosmillo M
Jahun A
Jolly C
Jungreis I
Jura N
Kellis M
Krogan NJ
McGovern BL
Memon D
Noursadeghi M
Obernier K
Pelin A
Polacco B
Ragazzini R
Reuschl AK
Richards A
Rojc A
Rosales R
Shokat K
Soucheray M
Swaney DL
Takeuchi Y
Thorne LG
Towers GJ
Turner J
Ummadi M
Verba K
Whelan MVX
White K
Zuliani-Alvarez L
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/02/2022

Emergence of SARS-CoV-2 variants of concern (VOCs) suggests viral adaptation to enhance human-to-human transmission [1,2]. Although much effort has focused on characterisation of spike changes in VOCs, mutations outside spike likely contribute to adaptation. Here we used unbiased abundance proteomics, phosphoproteomics, RNAseq and viral replication assays to show that isolates of the Alpha (B.1.1.7) variant [3] more effectively suppress innate immune responses in airway epithelial cells, compared to first wave isolates. We found that Alpha has dramatically increased subgenomic RNA and protein levels of N, Orf9b and Orf6, all known innate immune antagonists. Expression of Orf9b alone suppressed the innate immune response through interaction with TOM70, a mitochondrial protein required for RNA sensing adaptor MAVS activation. Moreover, the activity of Orf9b and its association with TOM70 was regulated by phosphorylation. We propose that more effective innate immune suppression, through enhanced expression of specific viral antagonist proteins, increases the likelihood of successful Alpha transmission, and may increase in vivo replication and duration of infection [4]. The importance of mutations outside Spike in adaptation of SARS-CoV-2 to humans is underscored by the observation that similar mutations exist in the Delta and Omicron N/Orf9b regulatory regions.

UCL Discovery

Extensive identification and analysis of conserved small ORFs in animals

Author: A Pauli
A Siepel
A-R Carvunis
AA Bazzini
AC Marques
AL Wolfe
AR Bassett
B Banfai
B Escobar
B Obermayer
B Pei
B Schwanhäusser
B Vanderperre
BA Wilson
Benedikt Obermayer
C Akimoto
C Gonzalez
CB Burge
Chris Bielow
D Grün
Denise Thiel
DK Gascoigne
DM Anderson
DS Kelkar
E Birney
E Ladoukakis
EG Magny
G Loughran
G-L Chew
GP Wagner
Guido Mastrobuoni
H Dinkel
H Hezroni
H Nielsen
Henrik Zauber
HJ Dyson
I Jungreis
I Ulitsky
IA Vergara
J Cox
J Crappé
J Ma
J Ruiz-Orera
J Savard
J Somers
JE Smith
JG Dunn
JL Aspden
JM Chick
JM Engreitz
JP Kastenmayer
K Hanyu-Nakamura
Kamila Kutz
KY Paek
L Wang
Lorenzo Calviello
M Cesana
M Eravci
M Guttman
M Kellis
M Rè
M Stadler
M Sturm
M Wilhelm
M-S Kim
Matthias Selbach
MB Gerstein
MC Frith
MD Sury
ME Dinger
MF Lin
MI Galindo
MK Iyer
ML Crowe
ML Nielsen
ML Senar
MM Savitski
MN Cabili
Nikolaus Rajewsky
NT Ingolia
P Tompa
RC Pink
RJ Allen
RS Young
S Heesch van
S Iuchi
S Lee
S Prabakaran
S Schleich
SA Slavoff
SB Azimifar
SE Calvo
Sebastian D. Mackowiak
SJ Andrews
Stefan Kempa
T Derrien
T Geiger
T Kondo
TP Miettinen
TR Mercer
V Ranwez
W-Y Chung
X Gao
X Xing
Y Shen
Z Dosztányi
Publication venue: 'Springer Science and Business Media LLC'

Crossref

GENCODE 2021

Author: Armstrong J
Barnes I
Berry A
Bignell A
Boix C
Carbonell Sala S
Choudhary JS
Cunningham F
Di Domenico T
Diekhans M
Donaldson S
Fiddes IT
Flicek P
Frankish A
García Girón C
Gerstein M
Gonzalez JM
Grego T
Guigó R
Hardy M
Hourlier T
Howe KL
Hubbard TJP
Hunt T
Izuogu OG
Johnson R
Jungreis I
Kellis M
Lagarde J
Loveland JE
Martin FJ
Martínez L
Mohanan S
Mudge JM
Muir P
Navarro FCP
Parker A
Paten B
Pei B
Pozo F
Riera FC
Ruffier M
Schmitt BM
Sisu C
Stapleton E
Suner MM
Sycheva I
Tress ML
Uszczynska-Ratajczak B
Wolf MY
Wright JC
Xu J
Yang YT
Yates A
Zerbino D
Zhang Y
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2021

© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org. National Human Genome Research Institute of the National Institutes of Health [U41HG007234]; the content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health; Wellcome Trust [WT108749/Z/15/Z, WT200990/Z/16/Z]; European Molecular Biology Laboratory; Swiss National Science Foundation through the National Center of Competence in Research ‘RNA & Disease’ (to R.J.); Medical Faculty of the University of Bern (to R.J.). Funding for open access charge: National Institutes of Health.

DSpace@MIT

UPF Digital Repository

King's Research Portal

Bern Open Repository and Information System (BORIS)

Institute of Cancer Research Repository

Brunel University Research Archive

GENCODE: reference annotation for the human and mouse genomes in 2023

Author: Arnan C
Banerjee A
Barnes I
Bennett R
Berry A
Bignell A
Boix C
Calvet F
Carbonell-Sala S
Cerdán-Vélez D
Choudhary JS
Cunningham F
Davidson C
Diekhans M
Donaldson S
Dursun C
Fatima R
Flicek P
Frankish A
Gerstein M
Giorgetti S
Giron CG
Gonzalez JM
Guigo R
Gómez LM
Hardy M
Harrison PW
Hollis Z
Hourlier T
Hubbard TJP
Hunt T
James B
Jiang Y
Johnson R
Jungreis I
Kay M
Kellis M
Kundaje A
Lagarde J
Loveland JE
Martin FJ
Mudge JM
Nair S
Ni P
Paten B
Pozo F
Ramalingam V
Ruffier M
Schmitt BM
Schreiber JM
Sisu C
Steed E
Sumathipala D
Suner M-M
Sycheva I
Tress ML
Uszczynska-Ratajczak B
Wass E
Wright JC
Yang YT
Yates A
Zafrulla Z
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/11/2022

Data availability: No new data were generated or analysed in support of this research. Copyright © The Author(s) 2022. GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org. National Human Genome Research Institute of the National Institutes of Health [U41HG007234, R01HG004037]; Wellcome Trust [WT222155/Z/20/Z]; European Molecular Biology Laboratory. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funding for open access charge: National Institutes of Health.

Brunel University Research Archive