14 research outputs found
Research: A comprehensive and quantitative exploration of thousands of viral genomes
The complete assembly of viral genomes from metagenomic datasets (short genomic sequences gathered from environmental samples) has proven to be challenging, so there are significant blind spots when we view viral genomes through the lens of metagenomics. One approach to overcoming this problem is to leverage the thousands of complete viral genomes that are publicly available. Here we describe our efforts to assemble a comprehensive resource that provides a quantitative snapshot of viral genomic trends – such as gene density, noncoding percentage, and abundances of functional gene categories – across thousands of viral genomes. We have also developed a coarse-grained method for visualizing viral genome organization for hundreds of genomes at once, and have explored the extent of the overlap between bacterial and bacteriophage gene pools. Existing viral classification systems were developed prior to the sequencing era, so we present our analysis in a way that allows us to assess the utility of the different classification systems for capturing genomic trends
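As an illustration of two of the metrics named above, here is a minimal sketch of how gene density and noncoding percentage could be computed for one annotated genome. The function name `genome_summary`, the coordinate convention (1-based, inclusive, linear genome) and the choice to merge overlapping genes before counting coding bases are assumptions made for this example, not details taken from the study.

```python
from typing import Dict, List, Tuple


def genome_summary(genome_length: int, genes: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize one genome (hypothetical helper, not the authors' pipeline).

    genes holds 1-based inclusive (start, end) coordinates on a linear genome.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    # Merge overlapping gene intervals so shared bases are counted once.
    merged: List[List[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    coding_bp = sum(end - start + 1 for start, end in merged)
    return {
        # Number of annotated genes per kilobase of genome.
        "gene_density_per_kb": len(genes) / (genome_length / 1000),
        # Share of the genome not covered by any annotated gene.
        "noncoding_percent": 100 * (genome_length - coding_bp) / genome_length,
    }


if __name__ == "__main__":
    # Toy coordinates, invented purely for demonstration.
    print(genome_summary(10_000, [(1, 1000), (901, 2000), (5001, 6000)]))
```

Merging the intervals first matters for viral genomes, where overlapping genes are common and would otherwise inflate the coding fraction.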
Virology By The Numbers: A Quantitative Exploration of Viral Energetics, Genomics, and Ecology
Over the past couple of decades, technological advancements in sequencing and imaging have unequivocally proven that the world of viruses is far bigger and more consequential than previously imagined. An estimated 10^31 viruses inhabit our planet, outnumbering even bacteria. Despite their astronomical numbers and staggering sequence diversity, environmental viruses are poorly characterized. In this thesis we will demonstrate our three-pronged exploration of viruses through the lenses of energetics (Chapters 2 and 3), genomics (Chapter 4) and ecology (Chapter 5). We will first focus on one of the defining features of viruses, namely their reliance on their host for energy, and demonstrate the energetic cost of building a virus and mounting an infection. In our second study, we present one of the largest surveys of complete viral genomes, providing a comprehensive and quantitative snapshot of viral genomic trends for thousands of viruses. In our third study, we shift our focus towards ecological questions surrounding the large number of commensal phages inhabiting the human body. We discovered that phage community composition could serve as a fingerprint, or a "phageprint", that is highly personal and stable over time. To our knowledge, this study is one of the largest studies of human phages and the first to demonstrate the feasibility of human identification based on phage sequences.
The Energetic Cost of Building a Virus
Viruses are incapable of autonomous energy production. Although many
experimental studies make it clear that viruses are parasitic entities that
hijack the host's molecular resources, a detailed estimate for the energetic
cost of viral synthesis is largely lacking. To quantify the energetic cost of
viruses to their hosts, we enumerated the costs associated with two very
distinct but representative DNA and RNA viruses, namely T4 and influenza. We
found that for these viruses, translation of viral proteins is the most
energetically expensive process. Interestingly, the costs of building a T4 phage
and a single influenza virus are nearly the same. Due to influenza's higher
burst size, however, the overall cost of a T4 phage infection is only 2-3% of
the cost of an influenza infection. The costs of these infections relative to
their host's estimated energy budget during the infection reveal that a T4
infection consumes about a third of its host's energy budget, whereas an
influenza infection consumes only 1%. Building on our estimates for T4, we show
how the energetic costs of double-stranded DNA viruses scale with virus size,
revealing that the dominant cost of building a virus can switch from
translation to genome replication above a critical virus size. Lastly, using
our predictions for the energetic cost of viruses, we provide estimates for the
strengths of selection and genetic drift acting on newly incorporated genetic
elements in viral genomes, under conditions of energy limitation.
Defining the Energetic Costs of Cellular Structures
All cellular structures are assembled from molecular building blocks, and molecular building blocks incur energetic costs to the cell. In an energy-limited environment, the energetic cost of a cellular structure imposes a fitness cost and impacts a cell's evolutionary trajectory. While the importance of energetic considerations has been recognized for decades, the distinction between direct energetic costs expended by the cell and potential energy that the cell diverts into cellular biomass components, which we define as the opportunity cost, has not been made explicit, leading to large differences in the values for energetic costs of molecular building blocks used in the literature. We describe a framework that defines and separates various components relevant for estimating the energetic costs of molecular building blocks and the resulting cellular structures. This distinction among energetic costs is an essential step towards discussing the conversion of an energetic cost to a corresponding fitness cost.
Human Phageprints: A high-resolution exploration of oral phages reveals globally-distributed phage families with individual-specific and temporally-stable community compositions
Metagenomic studies have revolutionized the study of novel phages. However, these studies trade depth of coverage for breadth. In this study we show that the targeted sequencing of a phage genomic region as small as 200-300 base pairs can provide sufficient sequence diversity to serve as an individual-specific barcode, or Phageprint. The targeted approach reveals a high-resolution view of phage communities that is not available through metagenomic datasets. By creating instructional videos and collection kits, we enabled citizen scientists to gather ~700 oral samples spanning ~100 individuals residing in different parts of the world. In examining phage communities at 6 different oral sites, and by comparing phage communities of individuals living across the globe, we were able to study the effect of spatial separation, ranging from several millimeters to thousands of kilometers. We found that a spatial separation of just a few centimeters (the distance between two oral sites) can already result in highly distinct phage community compositions. For larger distances, spanning the phage communities of different individuals living in different parts of the world, we did not observe any correlation between spatial distance and phage community composition: individuals residing in the same city did not have phage communities that were any more similar than those of individuals living on different continents. Additionally, we found that neither genetics nor cohabitation seems to play a role in the relatedness of phage community compositions across individuals. Cohabitating siblings and even identical twins did not have phage community compositions that were any more similar than those of unrelated individuals. The primary factor contributing to phage community composition relatedness is direct contact between two habitats, as is demonstrated by the similarity between oral phage community compositions of partners.
Furthermore, by exploring phage communities across the span of a month, and in some cases several years, we observed highly stable community compositions. These studies consistently point to the existence of remarkably diverse and personal phage families that are stable in time and apparently present in people around the world.
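To make the idea of comparing community compositions concrete, the sketch below computes a Bray-Curtis dissimilarity between two samples after converting counts to relative abundances. The abstract does not say which measure the study used, so the metric, the function name `bray_curtis` and the variant labels are all assumptions chosen for illustration.

```python
from typing import Dict


def bray_curtis(sample_a: Dict[str, float], sample_b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity on relative abundances (illustrative choice only).

    Each sample maps a sequence-variant label to its read count.
    Returns 0.0 for identical compositions and 1.0 for no shared variants.
    """
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs a positive total count")

    variants = set(sample_a) | set(sample_b)
    # For profiles that each sum to 1, Bray-Curtis reduces to
    # 1 minus the summed per-variant minimum abundance.
    shared = sum(
        min(sample_a.get(v, 0) / total_a, sample_b.get(v, 0) / total_b)
        for v in variants
    )
    return 1.0 - shared


if __name__ == "__main__":
    # Invented counts for two hypothetical samples.
    day_1 = {"variant_1": 30, "variant_2": 10}
    day_30 = {"variant_1": 28, "variant_2": 12}
    print(bray_curtis(day_1, day_30))
```

Under this assumed measure, low values between samples from the same person over time and high values between different people would correspond to the stable, individual-specific pattern the abstract describes.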
Energetic cost of building a virus
Viruses are incapable of autonomous energy production. Although many experimental studies make it clear that viruses are parasitic entities that hijack the molecular resources of the host, a detailed estimate for the energetic cost of viral synthesis is largely lacking. To quantify the energetic cost of viruses to their hosts, we enumerated the costs associated with two very distinct but representative DNA and RNA viruses, namely, T4 and influenza. We found that, for these viruses, translation of viral proteins is the most energetically expensive process. Interestingly, the costs of building a T4 phage and a single influenza virus are nearly the same. Due to influenza’s higher burst size, however, the overall cost of a T4 phage infection is only 2–3% of the cost of an influenza infection. The costs of these infections relative to their host’s estimated energy budget during the infection reveal that a T4 infection consumes about a third of its host’s energy budget, whereas an influenza infection consumes only ≈ 1%. Building on our estimates for T4, we show how the energetic costs of double-stranded DNA phages scale with the capsid size, revealing that the dominant cost of building a virus can switch from translation to genome replication above a critical size. Last, using our predictions for the energetic cost of viruses, we provide estimates for the strengths of selection and genetic drift acting on newly incorporated genetic elements in viral genomes, under conditions of energy limitation
Intrinsically disordered proteins and conformational noise: Implications in cancer
Intrinsically disordered proteins (IDPs) are proteins that lack a rigid 3D structure under physiological conditions, at least in vitro. Despite the lack of structure, IDPs play important roles in biological processes and transition from disorder to order upon binding to their targets. With multiple conformational states and rapid conformational dynamics, they engage in myriad and often “promiscuous” interactions. These stochastic interactions between IDPs and their partners, defined here as conformational noise, are an inherent characteristic of IDP interactions. The collective effect of conformational noise is an ensemble of protein network configurations, from which the most suitable can be explored in response to perturbations, conferring protein networks with remarkable flexibility and resilience. Moreover, the ubiquitous presence of IDPs as transcription factors and, more generally, as hubs in protein networks, is indicative of their role in the propagation of transcriptional (genetic) noise. As effectors of transcriptional and conformational noise, IDPs rewire protein networks and unmask latent interactions in response to perturbations. Thus, noise-driven activation of latent pathways could underlie state-switching events such as cellular transformation in cancer. To test this hypothesis, we created a model of a protein network with the topological characteristics of a cancer protein network and tested its response to a perturbation in the presence of IDP hubs and conformational noise. Because numerous IDPs are found to be epigenetic modifiers and chromatin remodelers, we hypothesize that they could further channel noise into stable, heritable genotypic changes.