127 research outputs found
Heterologous Stop Codon Readthrough of Metazoan Readthrough Candidates in Yeast
Recent analysis of genomic signatures in mammals, flies, and worms indicates that functional translational stop codon readthrough is considerably more abundant in metazoa than previously recognized, but this analysis provides only limited clues about the function or mechanism of readthrough. If an mRNA known to be read through in one species is also read through in another, perhaps these questions can be studied in a simpler setting. With this end in mind, we have investigated whether some of the readthrough genes in human, fly, and worm also exhibit readthrough when expressed in S. cerevisiae. We found that readthrough was highest in a gene with a post-stop hexamer known to trigger readthrough, while other metazoan readthrough genes exhibit borderline readthrough in S. cerevisiae.National Institutes of Health (U.S.) (5U54HG004555-03
Finite covers of random 3-manifolds
A 3-manifold is Haken if it contains a topologically essential surface. The
Virtual Haken Conjecture posits that every irreducible 3-manifold with infinite
fundamental group has a finite cover which is Haken. In this paper, we study
random 3-manifolds and their finite covers in an attempt to shed light on this
difficult question. In particular, we consider random Heegaard splittings by
gluing two handlebodies by the result of a random walk in the mapping class
group of a surface. For this model of random 3-manifold, we are able to compute
the probabilities that the resulting manifolds have finite covers of particular
kinds. Our results contrast with the analogous probabilities for groups coming
from random balanced presentations, giving quantitative theorems to the effect
that 3-manifold groups have many more finite quotients than random groups. The
next natural question is whether these covers have positive betti number. For
abelian covers of a fixed type over 3-manifolds of Heegaard genus 2, we show
that the probability of positive betti number is 0.
In fact, many of these questions boil down to questions about the mapping
class group. We are lead to consider the action of mapping class group of a
surface S on the set of quotients pi_1(S) -> Q. If Q is a simple group, we show
that if the genus of S is large, then this action is very mixing. In
particular, the action factors through the alternating group of each orbit.
This is analogous to Goldman's theorem that the action of the mapping class
group on the SU(2) character variety is ergodic.Comment: 60 pages; v2: minor changes. v3: minor changes; final versio
Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon
Abstract: Background: POLG, located on nuclear chromosome 15, encodes the DNA polymerase γ(Pol γ). Pol γ is responsible for the replication and repair of mitochondrial DNA (mtDNA). Pol γ is the only DNA polymerase found in mitochondria for most animal cells. Mutations in POLG are the most common single-gene cause of diseases of mitochondria and have been mapped over the coding region of the POLG ORF. Results: Using PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG, herein referred to as ORF-Y that arose de novo in placental mammals. Using the synplot2 program, synonymous site conservation was found among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed Clinvar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z. Conclusions: We provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analysis show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y provides potential clinical significance to this finding
Evolution of enhanced innate immune evasion by SARS-CoV-2
Emergence of SARS-CoV-2 variants of concern (VOCs) suggests viral adaptation to enhance human-to-human transmission1,2. Although much effort has focused on characterisation of spike changes in VOCs, mutations outside spike likely contribute to adaptation. Here we used unbiased abundance proteomics, phosphoproteomics, RNAseq and viral replication assays to show that isolates of the Alpha (B.1.1.7) variant3 more effectively suppress innate immune responses in airway epithelial cells, compared to first wave isolates. We found that Alpha has dramatically increased subgenomic RNA and protein levels of N, Orf9b and Orf6, all known innate immune antagonists. Expression of Orf9b alone suppressed the innate immune response through interaction with TOM70, a mitochondrial protein required for RNA sensing adaptor MAVS activation. Moreover, the activity of Orf9b and its association with TOM70 was regulated by phosphorylation. We propose that more effective innate immune suppression, through enhanced expression of specific viral antagonist proteins, increases the likelihood of successful Alpha transmission, and may increase in vivo replication and duration of infection4. The importance of mutations outside Spike in adaptation of SARS-CoV-2 to humans is underscored by the observation that similar mutations exist in the Delta and Omicron N/Orf9b regulatory regions
Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.
The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization
Recommended from our members
A high-resolution map of human evolutionary constraint using 29 mammals.
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease
A High-Resolution Map of Human Evolutionary Constraint Using 29 Mammals
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.National Human Genome Research Institute (U.S.)National Institute of General Medical Sciences (U.S.) (Grant number GM82901)National Science Foundation (U.S.). Postdoctural Fellowship (Award 0905968)National Science Foundation (U.S.). Career (0644282)National Institutes of Health (U.S.) (R01-HG004037)Alfred P. Sloan Foundation.Austrian Science Fund. Erwin Schrodinger Fellowshi
Ebola virus epidemiology, transmission, and evolution during seven months in Sierra Leone
The 2013-2015 Ebola virus disease (EVD) epidemic is caused by the Makona variant of Ebola virus (EBOV). Early in the epidemic, genome sequencing provided insights into virus evolution and transmission and offered important information for outbreak response. Here, we analyze sequences from 232 patients sampled over 7 months in Sierra Leone, along with 86 previously released genomes from earlier in the epidemic. We confirm sustained human-to-human transmission within Sierra Leone and find no evidence for import or export of EBOV across national borders after its initial introduction. Using high-depth replicate sequencing, we observe both host-to-host transmission and recurrent emergence of intrahost genetic variants. We trace the increasing impact of purifying selection in suppressing the accumulation of nonsynonymous mutations over time. Finally, we note changes in the mucin-like domain of EBOV glycoprotein that merit further investigation. These findings clarify the movement of EBOV within the region and describe viral evolution during prolonged human-to-human transmission
Recommended from our members
GENCODE: reference annotation for the human and mouse genomes in 2023
Data availability: No new data were generated or analysed in support of this research.Copyright © The Author(s) 2022. GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.National Human Genome Research Institute of the National Institutes of Health [U41HG007234, R01HG004037]; Wellcome Trust [WT222155/Z/20/Z]; European Molecular Biology Laboratory. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Funding for open access charge: National Institutes of Health
- …