ORFs in families and cassettes.
<p>The number and proportion of ORFs predicted in each dataset that belong to protein-coding families (i.e. are not unique) and/or belong to cassettes (groups of protein-coding families that are found on the same set of contigs).</p>
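The cassette definition above (families found on the same set of contigs) can be illustrated with a minimal sketch. The exact matching rule used in the study is not stated here, so requiring identical contig sets, the `min_families` threshold, and all family and contig names below are assumptions made only for illustration.

```python
from collections import defaultdict

def find_cassettes(family_to_contigs, min_families=2):
    """Group protein families that occur on exactly the same set of contigs.

    Assumption: a cassette is any group of at least `min_families` families
    whose contig sets are identical. The study's criterion may differ.
    """
    groups = defaultdict(list)
    for family, contigs in family_to_contigs.items():
        # frozenset makes the contig set hashable so it can serve as a key
        groups[frozenset(contigs)].append(family)
    return {
        contigs: sorted(families)
        for contigs, families in groups.items()
        if len(families) >= min_families
    }

# Hypothetical toy data: family name -> contigs carrying a member ORF
family_to_contigs = {
    "fam_A": {"contig1", "contig2"},
    "fam_B": {"contig1", "contig2"},
    "fam_C": {"contig3"},
}
cassettes = find_cassettes(family_to_contigs)
```

Here `fam_A` and `fam_B` form one candidate cassette because they share the same two contigs, while `fam_C` stays ungrouped.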
Conservation of Gene Cassettes among Diverse Viruses of the Human Gut
<div><p>Viruses are a crucial component of the human microbiome, but large population sizes, high sequence diversity, and high frequencies of novel genes have hindered genomic analysis by high-throughput sequencing. Here we investigate approaches to metagenomic assembly to probe genome structure in a sample of 5.6 Gb of gut viral DNA sequence from six individuals. Tests showed that a new pipeline based on de Bruijn graph assembly yielded longer contigs that were able to recruit more reads than the equivalent non-optimized, single-pass approach. To characterize gene content, the database of viral RefSeq proteins was compared to the assembled viral contigs, generating a bipartite graph with functional cassettes linking together viral contigs, which revealed a high degree of connectivity between diverse genomes involving multiple genes of the same functional class. In a second step, open reading frames were grouped by their co-occurrence on contigs in a database-independent manner, revealing conserved cassettes of co-oriented ORFs. These methods reveal that free-living bacteriophages, while usually dissimilar at the nucleotide level, often have significant similarity at the level of encoded amino acid motifs, gene order, and gene orientation. These findings thus connect contemporary metagenomic analysis with classical studies of bacteriophage genomic cassettes. Software is available at <a href="https://sourceforge.net/projects/optitdba/">https://sourceforge.net/projects/optitdba/</a>.</p></div>
Variation in the Fitness Effects of Mutations with Population Density and Size in <i>Escherichia coli</i>
<div><p>The fitness effects of mutations are context specific and depend on both external (e.g., environment) and internal (e.g., cellular stress, genetic background) factors. The influence of population size and density on fitness effects is unknown, despite the central role population size plays in the supply and fixation of mutations. We addressed this issue by comparing the fitness of 92 Keio strains (<i>Escherichia coli</i> K12 single gene knockouts) at comparatively high (1.2×10<sup>7</sup> CFUs/mL) and low (2.5×10<sup>2</sup> CFUs/mL) densities, which also differed in population size (high: 1.2×10<sup>8</sup>; low: 1.25×10<sup>3</sup>). Twenty-eight gene deletions (30%) exhibited a fitness difference, ranging from 5 to 174% (median: 35%), between the high and low densities. Of the gene properties we examined (genomic position, length, orientation, and function), our analyses suggest that this variation in fitness responses among gene deletions reflected, in part, both gene orientation and function. Although we could not determine the relative effects of population density and size, our results suggest fitness effects of mutations vary with these two factors, and this variation is gene-specific. Besides being a mechanism for density-dependent selection (<i>r</i>-<i>K</i> selection), the dependence of fitness effects on population density and size has implications for any population that varies in size over time, including populations undergoing evolutionary rescue, species invasions into novel habitats, and cancer progression and metastasis. Further, combined with recent advances in understanding the roles of other context-specific factors in the fitness effects of mutations, our results will help address theoretical and applied biological questions more realistically.</p></div>
The characteristics of the deleted genes used in this study.
<p>This Circos map was plotted based on the genome of <i>Escherichia coli</i> K12 MG1655, which is a relative of BW25113, the progenitor of all Keio knockouts. The gene labels in bold indicate the eight mutants tested but not included in final fitness analyses due to slow growth during conditioning. Blue bars indicate genes on the coding strand and red bars genes on the template strand. The thickness of the bars represents gene length; the numbers on the outer ticks show the scale of the genome in megabases.</p>
Two examples of phage cassettes.
<p>Contigs are shown as horizontal black lines, ORFs on those contigs are shown by black arrows above and below those lines, and the organization of those ORFs into protein-coding families is shown with colored boxes. The subject that each contig was assembled from is shown on the left of each panel. When a protein-coding family was functionally annotated according to its similarity with the CDD, that annotation is listed in the legend. Otherwise a unique identification number is shown (e.g. Family 591). The co-orientation score describes the proportion of gene pairs that, when occurring together on multiple contigs, do so in the same relative orientation.</p>
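One possible reading of the co-orientation score described above is sketched below. The input layout, the use of +1/-1 for strand, the restriction to one ORF per family per contig, and all names are assumptions for illustration; the study's actual calculation may differ in detail.

```python
from itertools import combinations

def co_orientation_score(contigs):
    """Fraction of family pairs with a consistent relative orientation.

    `contigs` maps contig name -> {family: strand}, with strand +1 or -1.
    Assumption: each family appears at most once per contig.
    Only pairs that co-occur on at least two contigs are scored.
    """
    relative = {}  # (family_a, family_b) -> relative orientation per contig
    for orfs in contigs.values():
        for a, b in combinations(sorted(orfs), 2):
            # The product is +1 for same strand and -1 for opposite strands,
            # and is unchanged if the whole contig is reverse-complemented.
            relative.setdefault((a, b), []).append(orfs[a] * orfs[b])
    scored = [rels for rels in relative.values() if len(rels) >= 2]
    if not scored:
        return None  # no pair co-occurs on multiple contigs
    consistent = sum(1 for rels in scored if len(set(rels)) == 1)
    return consistent / len(scored)

# Hypothetical toy contigs
toy_contigs = {
    "c1": {"f1": 1, "f2": 1, "f3": -1},
    "c2": {"f1": -1, "f2": -1, "f3": -1},
    "c3": {"f1": 1, "f2": 1},
}
score = co_orientation_score(toy_contigs)
```

In this toy input only the pair (`f1`, `f2`) keeps the same relative orientation on every contig where it occurs, so one of the three scored pairs is consistent.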
The fitness distribution of the studied Keio strains in the High and Low treatments.
<p>The fitness distribution of the studied Keio strains in the High and Low treatments.</p>
The de Bruijn graph assembly method and the influence of genomic variation on de Bruijn graph complexity.
<p>A) Shotgun sequences are produced from two different genomes (shown in blue and red at the top). Those sequences are used to construct a de Bruijn graph, where nodes are formed by all possible sequences of length k-1 (in this case 4 bases), which are connected by edges of length k (5 bases). Since there are no 4mers shared between these two example genomes, the resulting de Bruijn subgraphs are separate. B) Nucleotide polymorphisms are better resolved by short kmers. We consider a mixture of four genomes, each with three polymorphic positions separated by 25 bp. The identity at each polymorphic position is represented by either blue or red to indicate different nucleotides. At all other positions the genomes are identical. The de Bruijn graph that is constructed from this mixture of genomes using a kmer of 23 is shown on the left, where three independent bubbles form around each polymorphic position. The equivalent graph at k = 27 is shown on the right, where three independent sets of bubbles overlap, forming a more complex and suboptimal graph structure. C) Short regions of similarity are better resolved by long kmers. We consider a mixture of two genomes which are entirely different except for a 25 bp region of sequence identity (shown in black). The de Bruijn graph that is constructed from this mixture at k = 23 is shown on the left, where the two resulting subgraphs intersect at the 23mer of similarity. The de Bruijn graph at k = 27 is shown on the right, where the two resulting subgraphs (corresponding to the two genomes) do not intersect, since they have no 26mer in common. The examples in B and C together illustrate how different kmers can be optimal for assembling graphs with different types of polymorphisms.</p>
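The effect described in panel C can be reproduced with a small sketch that builds de Bruijn graphs with (k-1)-mer nodes and k-mer edges. The sequences below are invented toy genomes, not data from the study: the flanks of one genome use only A/C and the other only G/T, so the 25 bp `SHARED` region is the only identity between them. The sketch also works on whole genomes rather than reads and ignores reverse complements, both simplifying assumptions.

```python
from collections import defaultdict

def de_bruijn(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]  # each k-mer is one edge
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def nodes(graph):
    """All nodes of the graph, including those with no outgoing edge."""
    found = set(graph)
    for successors in graph.values():
        found |= successors
    return found

# Hypothetical toy genomes sharing only a 25 bp region
SHARED = "GTTGGGTGTTTGGTGTGGGTTTGTG"
genome_1 = "ACCACAACCCACAACACCAA" + SHARED + "CAACCACACCCAAACACCAC"
genome_2 = "GTTGTGGTTTGTGGTGTTGG" + SHARED + "TGGTTGTGTTTGGGTGTTGT"

def shared_nodes(k):
    """Nodes present in the subgraphs of both genomes at a given k."""
    return nodes(de_bruijn([genome_1], k)) & nodes(de_bruijn([genome_2], k))

shared_k23 = shared_nodes(23)  # 22-mer nodes inside the shared region
shared_k27 = shared_nodes(27)  # 26-mer nodes cannot fit in 25 bp
```

At k = 23 the two subgraphs meet at the 22-mer nodes that lie inside the shared region, whereas at k = 27 no 26-mer fits within 25 bp of identity, so the subgraphs stay separate.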
The fitness change in the studied Keio strains between the High and Low treatments.
<p>The changes were either significantly different (filled circles), equivalent with high standard deviation (open triangles), or equivalent with low standard deviation (open circles).</p>
The functional Clusters of Orthologous Groups (COGs) of the deleted genes in all 100 Keio strains studied and the proportions of genes showing treatment-dependent fitness difference in each functional group.
a<p>Two genes are dual- and triple-functional.</p>
Contigs and reads that form cassettes.
<p>The number of contigs, and the number of reads that align to those contigs, that contain at least 1 ORF, more than 1 ORF, at least 1 ORF family, and/or at least 1 cassette. The percentage of the total number of reads that align to contigs with at least 1 ORF is shown in parentheses.</p>