Bogs, Bugs, Borgs, and Bacteriophages: Metagenomic and Biochemical Insights into the Enigmatic World of Extrachromosomal Genetic Elements

Abstract

As a Ph.D. Candidate and National Science Foundation Predoctoral Fellow at the University of California, Berkeley, working in the labs of Dr. Jillian Banfield and Dr. Jennifer Doudna, I have dedicated my Ph.D. to the discovery and investigation of novel extrachromosomal elements and tools for biotechnological applications through a combination of genomics and biochemistry. The first chapter of this thesis uncovers 10 new clades of the largest bacteriophages ever found, recovered from many ecosystems worldwide, with genome sizes rivaling those of the smallest bacteria. We found that these phages are equipped with a wide variety of features typically associated with cellular life, such as ribosomal proteins, tRNA synthetases, and initiation and elongation factors, and that some of them, intriguingly, use alternative genetic codes to translate their proteins. Notably, I discovered that the huge phage genomes encode CRISPR-Cas systems that may be used for inter-viral warfare. Some of these are miniature, previously undescribed CRISPR-Cas systems that are about half the size of Cas9. This work was published in Nature. The second chapter describes the analysis and testing of one of the novel phage CRISPR-Cas systems, CRISPR-CasΦ. We show that, despite its small size, this system can exclude mobile elements such as plasmids from infecting the same host cell and can be applied to programmable genome editing in bacterial, plant, and mammalian cells. As the most compact functional CRISPR-Cas system described to date, it may circumvent the cell delivery barriers encountered with CRISPR-Cas9 gene editing. Intriguingly, the CRISPR-CasΦ system exhibits a previously undescribed consolidation of chemistries in a Cas nuclease: the RuvC active site mediates both double-stranded DNA cleavage and RNA processing in a metal-dependent manner. This work was published in Science.
The third chapter examines the discovery of enigmatic giant linear extrachromosomal elements, which we refer to as “Borgs”, inhabiting archaea. These elements, about 1 Mbp long, were recovered from multiple environments and may play a previously unrecognized role in controlling greenhouse gas emissions. Their genomes comprise two uneven replichores, with inverted repeats >1.5 kbp long at either end and dozens of tandem repeats throughout. They contain no obvious hallmarks of previously reported viruses or plasmids, and ~80% of their genes encode novel and uncharacterized proteins. Our analysis of horizontal gene transfer suggests that many ribosomal, metabolic, and extracellular electron transfer genes and operons were recently transferred from their hosts, including the nif operon for nitrogen fixation and the MCR complex, which was recently proposed to be involved in the oxidation of methane. Evidence also suggests recent recombination events between different Borgs, presumably within the same host cell. This work is currently in review at Nature. The fourth chapter describes an open-science effort, driven by the COVID-19 pandemic, toward robust computational pipelines for viral discovery. Carried out with a collaborative global team of bioinformaticians, this work describes the discovery of over 100,000 species of viruses, to which I contributed novel huge phage genomes. This manuscript was published in Nature. The final chapter examines the discovery of thousands of viruses encoding CRISPR-Cas systems, many of which target competing cryptic mobile elements that are predicted to infect the same bacterial hosts. Drawing on approaches ranging from genome-resolved metagenomics and bioinformatics-enabled phylogenetics to biochemistry, structural biology, and eukaryotic genome editing, I describe hundreds of novel hypercompact and divergent CRISPR-Cas systems, with particular attention to the novel Casλ family.
Casλ possesses an aberrant RNA structure reminiscent of a naturally occurring sgRNA and processes its own crRNA at the 3’ end, unlike any previously described single-RNA CRISPR-Cas system. The tertiary structure determined via cryo-EM reveals the machinery for PAM recognition, hybrid assembly, and DNA cleavage. RNA-targeting systems encoded by viruses lack crucial residues or accessory proteins that, in their bacterial counterparts, would result in acute abortive infection, suggesting a potential strategy by which phage systems maintain host viability while preventing superinfection. In addition to being streamlined, which is advantageous for cellular delivery, hypercompact phage systems can efficiently edit endogenous genes in mammalian and plant cells at levels on par with, or in some cases exceeding, gold-standard Cas12a editing, demonstrating significant utility for biotechnological applications. Overall, this dissertation describes the use of a combination of bioinformatics and biochemistry to shed light on gigantic bacterial viruses, the proteins they encode, and elements such as Borgs, which we are only beginning to understand. Huge phages and Borgs represent little-known biology, the platforms for which are distinct from previously known systems, and they significantly broaden our overall understanding of “non-living” selfish genetic entities. The metagenomic discovery and the biochemical and structural characterization of hypercompact CRISPR-Cas systems, together with analyses of their genome editing utility in eukaryotic cells, pave the way for efficacious delivery of treatments to human cells in the near future.
