
    Bias in third position.

    <p>The bias in third codon position is visualized for each of the 6 complete genomes. The bias is defined as −1 for 100% A or T in the third position and +1 for 100% G or C.</p>
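
The caption fixes only the two endpoints of the scale (−1 and +1). A minimal sketch of one way to score a coding sequence, assuming the values in between are interpolated linearly as (GC − AT) / (GC + AT); the function name and the interpolation are illustrative assumptions, not the published method:

```python
def third_position_bias(cds: str) -> float:
    """Score the third codon positions of one coding sequence in [-1, +1].

    Assumption: linear interpolation between the two endpoints given in
    the caption (-1 = 100% A/T, +1 = 100% G/C).
    """
    # Index 2, 5, 8, ... are the third positions of complete codons.
    third = cds.upper()[2::3]
    gc = sum(base in "GC" for base in third)
    at = sum(base in "AT" for base in third)
    if gc + at == 0:
        raise ValueError("no unambiguous third-position bases")
    return (gc - at) / (gc + at)
```

Ambiguous bases such as N are skipped rather than counted toward either side.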

    BLAST matrix.

    <p>An all-against-all protein comparison was performed using BLAST to define homologs. A BLAST hit is considered significant if at least 50% of the alignment consists of identical matches and the length of the alignment is at least 50% of the longest gene. Internal homology (paralogs) is defined as proteins within a genome matching the same 50–50 requirement as for between-proteome comparisons. Self-matches are ignored here. A comparison of 31 <i>Negativicutes</i> genomes was performed on the CMG-biotools system (9 hours). A high resolution figure can be found as supplemental <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0060120#pone.0060120.s002" target="_blank">Figure S2</a>.</p>
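
The 50–50 requirement can be sketched as a simple filter over BLAST hits. This is an illustration under stated assumptions: both thresholds are read as "at least 50%", and "the longest gene" is taken to mean the longer of the two sequences in the pair. The function and parameter names are hypothetical.

```python
def is_significant_hit(identities: int, alignment_length: int,
                       query_length: int, subject_length: int) -> bool:
    """Apply the 50-50 rule to one BLAST hit.

    Assumptions: thresholds are inclusive, and coverage is measured
    against the longer of the two sequences.
    """
    if alignment_length <= 0:
        return False
    identity_fraction = identities / alignment_length
    coverage = alignment_length / max(query_length, subject_length)
    return identity_fraction >= 0.5 and coverage >= 0.5
```

Self-matches would need to be dropped before this filter is applied, since a protein always passes against itself.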

    Genome information.

    <p>Table listing the genomes used in the analysis. Data was downloaded from the NCBI GenBank database. Abbreviations: <i>Tax</i>: NCBI taxonomy id number, <i>Organism</i>: Name of organism, <i>INSDC</i>: NCBI GenBank Accession number, <i>WGS</i>: NCBI Whole Genome Sequence Project number, <i>Status</i>: status of sequencing project. The WGS number can be used for downloading whole genome sequencing projects by removing the last two digits and appending six zeros (ACGB01 is downloaded using the number ACGB000000).</p>
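
The conversion rule in the caption is mechanical enough to sketch in a few lines. The function name is hypothetical, and the check that the last two characters are digits is an added safeguard rather than part of the caption:

```python
def wgs_download_number(project_id: str) -> str:
    """Turn a WGS project number (e.g. ACGB01) into its download number."""
    prefix, version = project_id[:-2], project_id[-2:]
    if not prefix or not version.isdigit():
        raise ValueError(f"unexpected WGS project number: {project_id!r}")
    # Drop the two trailing digits, then append six zeros.
    return prefix + "000000"
```

For the caption's own example, `wgs_download_number("ACGB01")` gives `"ACGB000000"`.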

    Genome statistics.

    <p>Basic genome statistics for genome DNA sequences. Values of zero are marked by “−”. Abbreviations: <i>Organism</i>: Name of organism. <i>Status</i>: sequencing status of published project. <i>bp</i>: total number of base pairs in all DNA. <i>AT</i>: Percent of AT in DNA. <i>Std. AT</i>: Standard deviation in AT across DNA fragments. <i>Contig</i>: number of DNA fragments corresponding to replicons or contigs. <i>Unknown</i>: percentage of unknown bases (not A, T, C or G). <i>Largest</i>: size of largest contig as a percentage of total length. <i>N50</i>: weighted median statistic such that 50% of the entire assembly is contained in contigs or scaffolds equal to or larger than this value.</p>
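
Two of the columns, <i>N50</i> and <i>Largest</i>, follow directly from the list of contig lengths. A minimal sketch based on the definitions in the caption; the function names are illustrative and this is not the code used to build the table:

```python
def n50(lengths):
    """Return the contig length at which the running total, taken from
    the largest contig downwards, first reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def largest_percent(lengths):
    """Size of the largest contig as a percentage of the total length."""
    return 100 * max(lengths) / sum(lengths)
```

With contigs of 80, 70, 50, 40, 30 and 20 bases (290 in total), the two largest contigs already cover 150 bases, so the N50 is 70.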

    16S rRNA tree.

    <p>Each genome sequence was searched for 16S rRNA patterns and candidate sequences were extracted. The best sequence from each genome was selected. For two genomes, <i>Centipeda periodontii</i> DSM 2778 and <i>Megamonas hypermegale</i> ART12 1, no sequences were found. For 6 additional genomes, the located sequences were shorter than the default acceptable length. The short sequences are marked with a “*”. The minimum length criterion was lowered from 1 400 to 1 100; the maximum of 1 800 was left unchanged. The distance tree was made with 1 000 bootstraps.</p>

    Genome atlases, DNA structures.

    <p>A DNA structural atlas was generated for each of the 6 complete genomes. DNA, RNA and gene annotations are from the published GenBank data. Each lane of the circular atlas shows a different DNA feature. From the innermost circle: size of genome (axis), percent AT (red = high AT), GC skew (blue = most G’s), inverted and direct repeats (color = repeat), position preference, stacking energy and intrinsic curvature. Orange arrows indicate changes in the skew of G and C, which frequently indicate the origin and terminus of replication. Blue arrows show the location of rRNA operons, as annotated in the GenBank file. Dark red arrows highlight areas of the genome that show significantly different DNA structures from the rest of the genome. A high resolution figure can be found as supplemental <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0060120#pone.0060120.s001" target="_blank">Figure S1</a>.</p>
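
The GC skew lane can be approximated with the commonly used formula (G − C) / (G + C) computed over consecutive windows. The sketch below assumes non-overlapping windows and an arbitrary default window size; the atlas software's own window size and smoothing are not stated in the caption.

```python
def gc_skew(sequence: str, window: int = 1000) -> list:
    """Return (G - C) / (G + C) for each non-overlapping window.

    Assumption: windows without any G or C are reported as 0.0.
    """
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```

A change of sign between neighbouring windows is the kind of shift the orange arrows point at.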

    Genefinding and published genes.

    <p>Table listing genome name, number of published proteins (<i>GenBank</i>) and number of proteins found using Prodigal for genefinding (<i>Prodigal</i>). The column labeled <i>“ID”</i> refers to the INSDC or WGS id number as described in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0060120#pone-0060120-t001" target="_blank">Table 1</a>.</p>

    Ribosomal RNA analysis using RNAmmer.

    <p>The total number of identified 16S rRNA sequences is shown for each genome sequence. The length of the highest scoring sequence and the corresponding RNAmmer score are given. The default setting is to select the sequence with the highest RNAmmer score and a length between 1 400–1 800 bases. For this analysis the criteria were changed to a length range of 1 100–1 800, to include sequences from all genomes with 16S rRNA matches. Sequences with lengths shorter than the default acceptance threshold are marked with a “*”. Two organisms did not have any hits to the RNAmmer models. Values of zero are marked by “−”.</p>
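
The selection rule described in this caption can be sketched as follows. The `Candidate` record and the function name are hypothetical, and the sketch does not parse real RNAmmer output; it only shows the length filter, the score-based choice and the “*” marker for sequences below the default minimum.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    length: int   # length of the predicted 16S rRNA sequence in bases
    score: float  # RNAmmer score of the prediction


def pick_16s(candidates, min_length=1100, max_length=1800, default_min=1400):
    """Return (best candidate, marker), or None when nothing qualifies.

    The marker is "*" when the chosen sequence is shorter than the
    default minimum length, otherwise an empty string.
    """
    in_range = [c for c in candidates if min_length <= c.length <= max_length]
    if not in_range:
        return None
    best = max(in_range, key=lambda c: c.score)
    marker = "*" if best.length < default_min else ""
    return best, marker
```

Calling it with `min_length=1400` reproduces the default range, under which the shorter sequences would be rejected rather than marked.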

    Additional file 1: of NucDiff: in-depth characterization and annotation of differences between two sets of DNA sequences

    Figure S1. Reference fragments placement order depending on query fragment orientations during detection of local differences. Figure S2. Circular genome alignment alternatives. Figure S3. Number of differences in each category obtained by NucDiff with the default parameter settings for all assemblers. Figure S4. Comparison of multiple assemblies against one reference using NucDiff. Figure S5. Examples of detection of long deletions located in all assemblies at the same place in the reference sequence. Table S1. Alignment fragmentation cases caused by simple differences. Table S2. Genome modifications implemented during the simulation process. Table S3. List of E. coli genomes used in the Comparison of genomes from different strains of the same species section. Table S4. Parameter values used for each parameter settings. Table S5. Correspondence between the QUAST difference types and the simulated difference types. Table S6. Correspondence between the QUAST, dnadiff and NucDiff difference types and the expected difference types. (PDF 989 kb)