63 research outputs found

Loose ends: almost one in five human genes still have unresolved coding status

Author: Abascal Federico
Juan David
Jungreis Irwin
Martinez Laura
Rigau Maria
Rodriguez Jose Manuel
Tress Michael L.
Vazquez Jesus
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases is annotated differently across the three sets. We have carried out an in-depth investigation of the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggest that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating large-scale biomedical experiments and adding noise to them. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task, since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects. Funding: National Institutes of Health [2 U41 HG007234 to I.J., L.M., J.M.R. and M.L.T.; R01 HG004037 to I.J.]. Funding for open access charge: NIH [2 U41 HG007234].
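
The core comparison in this study, genes called coding by some reference sets but not by others, is a set operation. A minimal Python sketch, assuming each reference set is given as the set of gene identifiers it annotates as coding and that a gene absent from a set counts as non-coding there; the identifiers and the function name are hypothetical, and matching the same gene across databases is a separate, harder problem not shown here.

```python
def split_by_agreement(coding_sets):
    """Separate genes that every reference set calls coding from disputed ones."""
    sets = list(coding_sets.values())
    called_by_any = set().union(*sets)      # coding in at least one set
    called_by_all = set.intersection(*sets)  # coding in every set
    return called_by_all, called_by_any - called_by_all


# Hypothetical identifiers, not real Ensembl/GENCODE, RefSeq or UniProtKB entries.
coding_sets = {
    "GENCODE": {"GENE_A", "GENE_B", "GENE_C"},
    "RefSeq": {"GENE_A", "GENE_B"},
    "UniProtKB": {"GENE_A", "GENE_D"},
}
agreed, disputed = split_by_agreement(coding_sets)
print(sorted(agreed))    # ['GENE_A']
print(sorted(disputed))  # ['GENE_B', 'GENE_C', 'GENE_D']
```

In the abstract's terms, the 2764 genes correspond to the disputed group, while the further 1470 genes sit in the agreed group and could only be flagged with other evidence such as selection and transcript or protein support.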

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

REPISALUD

Evidence of abundant stop codon readthrough in Drosophila and other Metazoa

Author: Chan Clara Sophia
Jungreis Irwin
Kellis Manolis
Lin Michael F.
Negre Nicolas
Spokony Rebecca
Victorsen Alec
White Kevin P.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/12/2010

While translational stop codon readthrough is often used by viral genomes, it has been observed for only a handful of eukaryotic genes. We previously used comparative genomics evidence to recognize protein-coding regions in 12 species of Drosophila and showed that for 149 genes, the open reading frame following the stop codon has a protein-coding conservation signature, hinting that stop codon readthrough might be common in Drosophila. We return to this observation armed with deep RNA sequence data from the modENCODE project, an improved higher-resolution comparative genomics metric for detecting protein-coding regions, comparative sequence information from additional species, and directed experimental evidence. We report an expanded set of 283 readthrough candidates, including 16 double-readthrough candidates; these were manually curated to rule out alternatives such as A-to-I editing, alternative splicing, dicistronic translation, and selenocysteine incorporation. We report experimental evidence of translation using GFP tagging and mass spectrometry for several readthrough regions. We find that the set of readthrough candidates differs from other genes in length, composition, conservation, stop codon context, and in some cases, conserved stem–loops, providing clues about readthrough regulation and potential mechanisms. Lastly, we expand our studies beyond Drosophila and find evidence of abundant readthrough in several other insect species and one crustacean, and several readthrough candidates in nematode and human, suggesting that functionally important translational stop codon readthrough is significantly more prevalent in Metazoa than previously recognized. Funding: National Institutes of Health (U.S.) (U54 HG00455-01); National Science Foundation (U.S.) (CAREER 0644282); Alfred P. Sloan Foundation.

DSpace@MIT

Crossref

PubMed Central

Evidence of efficient stop codon readthrough in four mammalian genes

Author: Atkins John F.
Baranov Pavel V.
Chou Ming-Yuan
Ivanov Ivaylo P.
Jungreis Irwin
Kellis Manolis
Kiran Anmol M.
Loughran Gary
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

Stop codon readthrough is used extensively by viruses to expand their gene expression. Until recent discoveries in Drosophila, only a very limited number of readthrough cases in chromosomal genes had been reported. Analysis of conserved protein coding signatures that extend beyond annotated stop codons identified potential stop codon readthrough of four mammalian genes. Here we use a modified targeted bioinformatic approach to identify a further three mammalian readthrough candidates. All seven genes were tested experimentally using reporter constructs transfected into HEK-293T cells. Four displayed efficient stop codon readthrough, and these have UGA immediately followed by CUAG. Comparative genomic analysis revealed that in the four readthrough candidates containing UGA-CUAG, this motif is conserved not only in mammals but throughout vertebrates, with the first six of the seven nucleotides being universally conserved. The importance of the CUAG motif was confirmed using a systematic mutagenesis approach. One gene, OPRL1, encoding an opiate receptor, displayed extremely efficient readthrough (∼31%) in HEK-293T cells. Signals both 5′ and 3′ of the OPRL1 stop codon contribute to this high level of readthrough. The sequence UGA-CUA alone can support 1.5% readthrough, underlining its importance. Funding: National Institutes of Health (U.S.) (NIH-1-R01-HG004037-07); National Institutes of Health (U.S.) (NSF-DBI-0644282); National Institutes of Health (U.S.) (NIH-U41-HG007234).
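
The UGA-CUAG stop context can be checked mechanically once a transcript's coding start is known. A minimal Python sketch, assuming the 0-based CDS start is already known; the function name and the toy sequences are hypothetical, not from the study. It walks codons in frame, treats the first stop as the annotated stop, and tests whether that stop is UGA immediately followed by CUAG. It only flags sequence context and says nothing about actual readthrough efficiency, which the study measured with reporter assays.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_uga_cuag(transcript, cds_start):
    """True if the first in-frame stop after cds_start is UGA followed by CUAG.

    Accepts RNA (U) or DNA (T) letters; comparison is done in DNA alphabet.
    """
    seq = transcript.upper().replace("U", "T")
    for i in range(cds_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            # Only the annotated (first in-frame) stop is examined.
            return codon == "TGA" and seq[i + 3:i + 7] == "CTAG"
    return False  # no in-frame stop found


print(has_uga_cuag("AUGGCCUGACUAGGG", 0))  # True
print(has_uga_cuag("AUGGCCUAACUAGGG", 0))  # False (UAA stop)
```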

Cork Open Research Archive

Heterologous Stop Codon Readthrough of Metazoan Readthrough Candidates in Yeast

Author: A Firth
B Bonetti
Clara S. Chan
G Stahl
I Jungreis
Irwin Jungreis
J Harger
J Salas-Marco
J Skuzeski
Joseph Schacherer
K Keeling
M Lin
Manolis Kellis
N Wills
O Namy
P Cimino
P Ferreira
P Steneberg
T Serio
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Recent analysis of genomic signatures in mammals, flies, and worms indicates that functional translational stop codon readthrough is considerably more abundant in metazoa than previously recognized, but this analysis provides only limited clues about the function or mechanism of readthrough. If an mRNA known to be read through in one species is also read through in another, perhaps these questions can be studied in a simpler setting. With this end in mind, we have investigated whether some of the readthrough genes in human, fly, and worm also exhibit readthrough when expressed in S. cerevisiae. We found that readthrough was highest in a gene with a post-stop hexamer known to trigger readthrough, while other metazoan readthrough genes exhibit borderline readthrough in S. cerevisiae. Funding: National Institutes of Health (U.S.) (5U54HG004555-03).

CiteSeerX

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution.

Author: Ardern Zachary
Finkel Yaara
Firth Andrew E
Gorbalenya Alexander E
Jungreis Irwin
Kellis Manolis
Krogan Nevan J
Nelson Chase W
Pavesi Angelo
Sato Kei
Stern-Ginossar Noam
Ziebuhr John
Publication venue: Virology
Publication date: 01/02/2021

At least six small alternative-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for these ORFs and their shorter isoforms, developed in consultation with the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. We recommend calling the 39-codon Spike-overlapping ORF ORF2b; the 41-, 57-, and 22-codon ORF3a-overlapping ORFs ORF3c, ORF3d, and ORF3b; the 33-codon ORF3d isoform ORF3d-2; and the 97- and 73-codon Nucleocapsid-overlapping ORFs ORF9b and ORF9c. Finally, we document conflicting usage of the name ORF3b in 32 studies, and consequent erroneous inferences, stressing the importance of reserving identical names for homologs. We recommend that authors referring to these ORFs provide lengths and coordinates to minimize ambiguity caused by prior usage of alternative names.
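
Reporting an ORF by frame, coordinates and length, as recommended here, is straightforward to automate. A minimal Python sketch, assuming AUG-initiated ORFs on the forward strand and a hypothetical toy sequence (not SARS-CoV-2); the function name, the `min_codons` threshold, 1-based inclusive coordinates and counting codons without the stop are all illustrative assumptions, not conventions from the naming proposal.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, frame, min_codons=20):
    """Return (start, end, codons) for AUG-initiated ORFs in one forward frame.

    start/end are 1-based and inclusive, with end covering the stop codon;
    `codons` counts the ORF's codons excluding the stop.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # open an ORF at the first ATG in this frame
        elif start is not None and codon in STOP_CODONS:
            n_codons = (i - start) // 3
            if n_codons >= min_codons:
                orfs.append((start + 1, i + 3, n_codons))
            start = None  # close the ORF at the first in-frame stop
    return orfs


toy = "CCATGAAATTTTAGCC"  # hypothetical sequence for illustration only
print(find_orfs(toy, 2, min_codons=1))  # [(3, 14, 3)]
print(find_orfs(toy, 0, min_codons=1))  # []
```

Because the codon count and coordinate conventions vary between tools, stating them explicitly alongside the ORF name is what removes ambiguity.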

DSpace@MIT

Leiden University Scholarly Publications

Apollo (Cambridge)

Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon

Author: Choudhary Jyoti S.
Firth Andrew E.
Jungreis Irwin
Kellis Manolis
Khan Yousuf A.
Mudge Jonathan M.
Wright James C.
Publication venue: BMC Genetics
Publication date: 06/03/2020

Background: POLG, located on nuclear chromosome 15, encodes DNA polymerase γ (Pol γ). Pol γ is responsible for the replication and repair of mitochondrial DNA (mtDNA) and is the only DNA polymerase found in mitochondria in most animal cells. Mutations in POLG are the most common single-gene cause of mitochondrial diseases and have been mapped across the coding region of the POLG ORF. Results: Using PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG; this ORF, herein referred to as ORF-Y, arose de novo in placental mammals. Using the synplot2 program, we found synonymous site conservation among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed ClinVar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z. Conclusions: We provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analyses show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y gives this finding potential clinical significance.
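
The ClinVar observation rests on the fact that a substitution can leave the main protein unchanged while altering a codon in an overlapping frame. A minimal Python sketch on a hypothetical toy sequence (not POLG), using the standard genetic code; the +1 frame offset and the function names are illustrative assumptions, not details of ORF-Y.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame):
    """Translate every complete codon of seq starting at offset `frame`."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def changes_protein(seq, pos, alt, frames=(0, 1)):
    """Report, per frame, whether substituting `alt` at 0-based `pos` alters the translation."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return {f: translate(seq, f) != translate(mutant, f) for f in frames}


seq = "CTGCTGCTG"  # hypothetical toy sequence, not taken from POLG
print(translate(seq, 0), translate(seq, 1))  # LLL CC
print(changes_protein(seq, 2, "A"))  # {0: False, 1: True}
```

Here the G→A change at position 2 turns CTG into CTA (both Leu) in the main frame, but turns TGC (Cys) into TAC (Tyr) in the +1 frame: synonymous for one ORF, missense for the other.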

DSpace@MIT

Apollo (Cambridge)

Institute of Cancer Research Repository

Stop codon readthrough generates a C-terminally extended variant of the human vitamin D receptor with reduced calcitriol response

Author: Atkins John F.
Dmitriev Ruslan I.
Ivanov Ivaylo P.
Jungreis Irwin
Kellis Manolis
Loughran Gary
Power Michael
Tzani Ioanna
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2018

Although stop codon readthrough is used extensively by viruses to expand their gene expression, verified instances of mammalian readthrough have only recently been uncovered by systems biology and comparative genomics approaches. Previously, our analysis of conserved protein coding signatures that extend beyond annotated stop codons predicted stop codon readthrough of several mammalian genes, all of which have been validated experimentally. Four mRNAs display highly efficient stop codon readthrough, and these mRNAs have a UGA stop codon immediately followed by CUAG (UGA_CUAG) that is conserved throughout vertebrates. Building on the identification of this readthrough motif, we here investigated stop codon readthrough, using tissue culture reporter assays, for all previously untested human genes containing UGA_CUAG. The readthrough efficiency of the annotated stop codon for the sequence encoding vitamin D receptor (VDR) was 6.7%. It was the highest of those tested, but all showed notable levels of readthrough. The VDR is a member of the nuclear receptor superfamily of ligand-inducible transcription factors and binds its major ligand, calcitriol, via its C-terminal ligand-binding domain. Readthrough of the annotated VDR mRNA results in a 67-amino-acid-long C-terminal extension that generates a VDR proteoform named VDRx. VDRx may form homodimers and heterodimers with VDR but, compared with VDR, VDRx displayed a reduced transcriptional response to calcitriol even in the presence of its partner retinoid X receptor.
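
Readthrough continues translation to the next in-frame stop codon, so the length of a C-terminal extension such as the VDRx tail follows from the positions of the two stops. A minimal Python sketch, assuming a known 0-based CDS start and a hypothetical toy transcript (not the VDR sequence); the function names are hypothetical, and counting only the codons strictly between the two stops is an assumed convention (adding one would also count the residue decoded at the read-through stop itself).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, cds_start):
    """Yield 0-based positions of in-frame stop codons from cds_start onward."""
    for i in range(cds_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            yield i


def extension_length(transcript, cds_start):
    """Codons between the annotated stop and the next in-frame stop, or None."""
    seq = transcript.upper().replace("U", "T")
    stops = in_frame_stops(seq, cds_start)
    annotated = next(stops, None)   # first in-frame stop = annotated stop
    downstream = next(stops, None)  # where translation ends after readthrough
    if annotated is None or downstream is None:
        return None
    return (downstream - annotated) // 3 - 1


# Toy transcript: AUG GCC UGA | CUA GGG AAA | UAA
print(extension_length("AUGGCCUGACUAGGGAAAUAA", 0))  # 3
```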

DSpace@MIT

Crossref

Ghent University Academic Bibliography

Cork Open Research Archive

Discovery of Human sORF-Encoded Polypeptides (SEPs) in Cell Lines and Tissue

Author: Budnik Bogdan A.
Jungreis Irwin
Kellis Manolis
Ma Jiao
Neveu John
Saghatelian Alan
Schwaid Adam G.
Slavoff Sarah A.
Ward Carl C.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 14/02/2014

The existence of nonannotated protein-coding human short open reading frames (sORFs) has been revealed through the direct detection of their sORF-encoded polypeptide (SEP) products. The discovery of novel SEPs increases the size of the genome and the proteome and provides insights into the molecular biology of mammalian cells, such as the prevalent usage of non-AUG start codons. Through modifications of the existing SEP-discovery workflow, we discover an additional 195 SEPs in K562 cells and extend this methodology to identify novel human SEPs in additional cell lines and human tissue, for a final tally of 237 new SEPs. These results continue to expand the human genome and proteome and demonstrate that SEPs are a ubiquitous class of nonannotated polypeptides that require further investigation.

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

FigShare

Recommended from our members

A high-resolution map of human evolutionary constraint using 29 mammals.

Author: Alföldi Jessica
Baldwin Jen
Baylor College of Medicine Human Genome Sequencing Center Sequencing Team
Beal Kathryn
Birney Ewan
Bloom Toby
Broad Institute Sequencing Platform and Whole Genome Assembly Team
Chang Jean
Chin Chee Whye
Clamp Michele
Clawson Hiram
Cree Andrew
Cuff James
Delehaunty Kim
Di Palma Federica
Dihn Huyen H
Dooling David
Ernst Jason
Fitzgerald Stephen
Flicek Paul
Fowler Gerald
Fronik Catrina
Fulton Bob
Fulton Lucinda
Garber Manuel
Genome Institute at Washington University
Gibbs Richard A
Gnerre Sante
Goldman Nick
Graves Tina
Green Eric D
Guttman Mitchell
Haussler David
Heiman Dave
Herrero Javier
Holloway Alisha K
Hubisz Melissa J
Jaffe David B
Jhangiani Shalili
Jordan Gregory
Joshi Vandita
Jungreis Irwin
Kellis Manolis
Kent W James
Kheradpour Pouya
Kostka Dennis
Kovar Christie L
Lander Eric S
Lara Marcia
Lee Sandra
Lewis Lora R
Lin Michael F
Lindblad-Toh Kerstin
Lowe Craig B
Mardis Elaine R
Margulies Elliott H
Martins Andre L
Massingham Tim
Mauceli Evan
Minx Patrick
Moltke Ida
Muzny Donna M
Nazareth Lynne V
Nicol Robert
Nusbaum Chad
Okwuonu Geoffrey
Parker Brian J
Pedersen Jakob S
Pollard Katherine S
Raney Brian J
Rasmussen Matthew D
Robinson Jim
Santibanez Jireh
Siepel Adam
Sodergren Erica
Stark Alexander
Vilella Albert J
Ward Lucas D
Warren Wesley C
Washietl Stefan
Weinstock George M
Wen Jiayu
Wilkinson Jane
Wilson Richard K
Worley Kim C
Xie Xiaohui
Young Sarah
Zody Michael C
Zuk Or
Publication venue: eScholarship, University of California
Publication date: 01/10/2011

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

eScholarship - University of California

GENCODE: reference annotation for the human and mouse genomes in 2023.

Author: Arnan Carme
Banerjee Abhimanyu
Barnes If
Bennett Ruth
Berry Andrew
Bignell Alexandra
Boix Carles
Calvet Ferriol
Carbonell-Sala Sílvia
Cerdán-Vélez Daniel
Choudhary Jyoti S
Cunningham Fiona
Davidson Claire
Diekhans Mark
Donaldson Sarah
Dursun Cagatay
Fatima Reham
Flicek Paul
Frankish Adam
Gerstein Mark
Giorgetti Stefano
Giron Carlos García
Gonzalez Jose Manuel
Guigo Roderic
Gómez Laura Martínez
Hardy Matthew
Harrison Peter W
Hollis Zoe
Hourlier Thibaut
Hubbard Tim J P
Hunt Toby
James Benjamin
Jiang Yunzhe
Johnson Rory
Jungreis Irwin
Kay Mike
Kellis Manolis
Kundaje Anshul
Lagarde Julien
Loveland Jane E
Martin Fergal J
Mudge Jonathan M
Nair Surag
Ni Pengyu
Paten Benedict
Pozo Fernando
Ramalingam Vivek
Ruffier Magali
Schmitt Bianca M
Schreiber Jacob M
Sisu Cristina
Steed Emily
Sumathipala Dulika
Suner Marie-Marthe
Sycheva Irina
Tress Michael L
Uszczynska-Ratajczak Barbara
Wass Elizabeth
Wright James C
Yang Yucheng T
Yates Andrew
Zafrulla Zahoor
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/11/2022

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org

Bern Open Repository and Information System (BORIS)