14 research outputs found

    Additional file 2: Figure S2. of cis-regulatory analysis of the Drosophila pdm locus reveals a diversity of neural enhancers

    The pdm2-26 enhancer contains ultraconserved sequences detected in multiple Diptera. A) Shown is a 12-drosophilid EvoPrint of the pdm2-26 enhancer sequence together with colored highlights that indicate conserved D. melanogaster sequences shared with both the housefly and medfly (Musca domestica and Ceratitis capitata, respectively; orange) or with the medfly only (blue). Black capital letters represent D. melanogaster bases conserved in D. simulans, D. sechellia, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. pseudoobscura, D. virilis, and D. mojavensis. B) and C) Raw BLAST results of the pdm2-26 conserved elements aligned to the housefly (Musca domestica) and medfly (Ceratitis capitata) genomes, respectively. The colored underlines correspond to the color-highlighted conserved sequences in panel A). (TIFF 7785 kb)

    Additional file 1: Figure S1. of cis-regulatory analysis of the Drosophila pdm locus reveals a diversity of neural enhancers

    Three-way alignment of ultraconserved sequences in conserved sequence clusters identified in Drosophila, housefly, and medfly. Shown are conserved sequences shared in Drosophila conserved sequence clusters (nub-35, nub-37, pdm2-6, pdm2-12, and pdm2-21) and detected in the housefly (Musca domestica) and medfly (Ceratitis capitata). Vertical lines indicate agreement among all three Diptera. (TIFF 6544 kb)

    Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution

    Background: Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest.
    Methodology/Principal finding: We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses; and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome.
    Conclusions/Significance: EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks and the identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter’s ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.

    A multi-genome Ebola virus EvoPrint reveals conserved gene regulatory elements, essential amino acid codons and sequence variability within the glycoprotein surface domain ORF.

    The Zaire_lin6_Kissidougou_GIN_C15_KJ660346.2_2014 genome was EvoPrinted with 271 non-redundant Ebola virus genomes including 269 Zaire isolates and the TaïForest_lin1_Cote_dIvoire_FJ217162.1_1994 and Bundibugyo_lin1_Uga_FJ217161.1_2008 strains (EvoPrint database Zaire strains used to generate the print are available upon request). The EvoPrint highlights sequences within the input that are shared by all database genomes included in the analysis (bold black) and those bases that are different in one or more of the aligning genomes (gray). Shown are 5,475 bases of the full genome EvoPrint, starting in the 3’UTR of the NP gene and ending within the 5’UTR of VP30 and covering the VP35, VP40 and GP genes. Blue vertical bars indicate protein encoding ORFs. To highlight conserved codons and their variable wobble positions, the readout uses 75 bases per line. Conserved transcription start and stop sites are noted with blue and red underlining, respectively. The EvoPrint also identified a third conserved repeat element positioned 3’ of the transcription start signals (yellow underlined). Secondary structure predictions indicate that the sequence may form a stem-loop structure by base pairing to its reverse complement sequence within the transcription start signal (indicated by yellow over-lines) (reviewed in [47]). The conserved GP mRNA translational editing sequence is underlined green and its mucin-like domain coding sequence is boxed in red. ORF translational start and stop codons are boxed green and red, respectively.
    While the initiation ATG methionine codons for the VP40 and GP genes are conserved in all genomes, expanding the readout line that contains the translation start for VP35 reveals that both the Bundibugyo and Taï Forest species differ from the Zaire strains by base substitutions that generate start codons flanking the 5’ end of the Zaire ORF start (the positions of both are indicated by the elongated green box). Expanding sequence lines that contain termination codons reveals that while their positions are conserved, the three species use different stop codon combinations: for VP35, Zaire strains have TGA, Bundibugyo has TAA, and Taï Forest has TAG; for VP40, Zaire strains have TAA, while both Bundibugyo and Taï Forest have TGA; for the GP gene, Zaire strains have TAG, Bundibugyo has TAA, and Taï Forest has TGA. Underlined sequence line numbers indicate that sequence gaps were inserted in one or more of the genomes to optimize alignments: these can be viewed by clicking on the line number and then selecting the underlined genomes to view one-on-one alignments with the input sequence.
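
    The conserved-versus-variable marking described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EvoPrinter implementation: the function name `evoprint` is hypothetical, the database sequences are assumed to be already aligned to the input (same length, `-` for gaps), and upper and lower case stand in for the bold black and gray of the real readout.

```python
def evoprint(reference, aligned, width=75):
    """Return an EvoPrint-style readout of `reference` as a list of lines.

    `aligned` holds database sequences already aligned to the reference
    (same length, '-' for gaps). Bases shared by every sequence are upper
    case; bases that differ in one or more sequences are lower case.
    """
    ref = reference.upper()
    for seq in aligned:
        if len(seq) != len(ref):
            raise ValueError("aligned sequences must match the reference length")

    marked = []
    for i, base in enumerate(ref):
        conserved = all(seq[i].upper() == base for seq in aligned)
        marked.append(base if conserved else base.lower())
    text = "".join(marked)

    # 75 is a multiple of 3, so codon positions stack vertically from
    # one line to the next when the readout starts in frame.
    return [text[i:i + width] for i in range(0, len(text), width)]


if __name__ == "__main__":
    # Toy sequences (made up): only the codon wobble position varies.
    print(evoprint("ATGGCCTAA", ["ATGGCTTAA", "ATGGCATAA"]))
```

    Because each database genome is compared with the input independently, adding more genomes only adds more one-on-one comparisons, which is the property the caption relies on when superimposing 271 alignments.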

    ZIKV evolutionary divergence inferred from shared base substitutions at multiple ancestral nucleotide positions.

    An EvoDifference print of the Zika_KX832731.1_Florida_2016 strain with 71 ZIKV strains from 24 different countries identifies nucleotide positions that differ only in a subset of isolates, while the other strains included in the analysis have maintained an ancestral base at these positions. Shown are 11 of 19 ancestral base positions that have undergone identical substitutions in subsets of genomes. For example, at position 3508 in the reference genome, there was an A->G substitution within all members of the Brazil Br1 sublineage and isolates from Ecuador, Guadeloupe, Dominican Republic and Florida. The readout also revealed other positions at which progressively fewer strains share the same identity SNP, as well as base changes that are unique to the KX832731.1_Florida_2016 strain and are found neither in other Florida strains nor in any of the other Asian lineage strains. Genomic position designations within the input reference sequences are shown above the horizontal KX832731.1_Florida_2016 sequences. Strains are grouped according to their countries of isolation and the different Brazil sublineages (Br1-4) are grouped with vertical bars. Only database sequences that differ from the reference sequence are shown.

    Filovirus A-to-I host cell hyper-editing detected with EvoDifference prints.

    One-on-one alignments that highlight clusters of T/U->C base changes within Marburg and Ebola (Zaire and Bundibugyo) genomes. (A) The Marburg_lin2_Popp_Cercopithecus_Human_Z29337.1_1967 reference sequence from bases 2,282 to 2,838 of the NP gene 3’ UTR and flanking sequence is aligned to the orthologous region of Marburg_lin2_LakeVictoria_GQ433353.1_2011. Note that all of the 40 base differences are T/U->C transformations, indicating that the Lake Victoria genome was most likely modified by host-cell RNA adenosine deaminases. (B) The Ebola Zaire_lin6_Kissidougou_GIN_C15_KJ660346_2014 VP40 3’UTR reference sequence, from bases 2,742 to 2,894, is aligned with the orthologous region from three different Zaire/Makona strains (listed in panel B). Given that the three lin6 strains have the same T/U->C base changes, host cell editing most likely occurred in an earlier member of this lineage. (C) Host cell A-to-I editing may contribute to the antigenic diversity of Filovirus spike proteins. Shown is the glycoprotein encoding mucin-like domain ORF of the Bundibugyo_lin2_DRC_112_KC545393.1_2012 isolate (bases 7,375 to 7,516) aligned to the orthologous region of the Bundibugyo_lin1_Uga_FJ217161.1_2008 genome. Note that four of the 13 T/U->C transitions result in amino acid changes (shown below the base substitutions). Color-coding is as described in Fig 1 (http://www.plosntds.org/article/info:doi/10.1371/journal.pntd.0005673#pntd.0005673.g001).
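
    The signal this caption describes, a region where essentially every difference between two genomes is a T/U->C change, can be checked by tallying substitution types across a pairwise alignment. The sketch below is a hedged illustration rather than the EvoPrinter method: the function names are hypothetical, the two sequences are assumed to be pre-aligned and of equal length, U is treated as T, and gap or ambiguous positions are skipped.

```python
from collections import Counter


def substitution_profile(reference, other):
    """Count each (reference_base, other_base) substitution type."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    ref = reference.upper().replace("U", "T")
    oth = other.upper().replace("U", "T")

    counts = Counter()
    for r, o in zip(ref, oth):
        # Skip matches, gaps ('-') and ambiguity codes such as 'N'.
        if r != o and r in "ACGT" and o in "ACGT":
            counts[(r, o)] += 1
    return counts


def t_to_c_fraction(reference, other):
    """Fraction of all base differences that are T->C (0.0 if none differ)."""
    counts = substitution_profile(reference, other)
    total = sum(counts.values())
    return counts[("T", "C")] / total if total else 0.0


if __name__ == "__main__":
    # Toy sequences (made up): both differences are T->C.
    print(t_to_c_fraction("ATTGT", "ACCGT"))
```

    A fraction close to 1 over a window with many differences, as in panel A where all 40 differences are T/U->C, is the pattern the authors attribute to host-cell adenosine deaminase activity; what threshold or window size counts as hyper-editing is left open here.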

    Putative recombination events within multiple African Zika virus strains identified by one-on-one EvoDifference prints.

    Shown are six pairwise polyprotein ORF alignments between four different African strains. Starting with the first codon, each alignment covers 9,975 bases (3,325 codons). Gray bases represent alignment identity and red highlighted bases identify sequence differences. The input reference sequence is listed first followed by the aligning database genome: (A) Zika_KF383119.1_Senegal_2001 aligned with Zika_LC002520.1_Uganda_S.M._1947; (B) Zika_KF383119.1_Senegal_2001 aligned with Zika_KF383118.1_Senegal_2001; (C) Zika_KF383118.1_Senegal_2001 aligned with Zika_LC002520.1_Uganda_S.M._1947; (D) Zika_KF383119.1_Senegal_2001 aligned with Zika_KF383116.1_Senegal_1968; (E) Zika_KF383119.1_Senegal_2001 aligned with Zika_KF383120.1_Senegal_2000; (F) Zika_KF383116.1_Senegal_1968 aligned with Zika_KF383120.1_Senegal_2000. Vertical black bars to the right of each panel highlight regions with significant changes in SNP density indicating putative recombinant exchanges. Flanking vertical color bars indicate ORF positions of the encoded proteins (Capsid, green bases 1–366; Pre-Membrane, yellow 367–900; Envelope, dark blue 901–2400; NS1, gray 2401–3426; NS2A, brown 3427–4494; NS2B, green 4494–5885; NS3, red 5886–6345; NS4A, light blue 6345–6726; 2K, black 6727–6795; NS4B, tan 6796–7501; NS5, orange 7502–9975).

    SNP patterns resolve different China and Brazil ZIKV sublineages that share identity SNPs with isolates from different countries.

    Shown are 10 identity SNPs within the Zika_KX447510.1_FrenchPolynesia_2014 genome that were identified from an EvoDifference print of 39 strains from China, Brazil, Ecuador, Florida, Dominican Republic, Puerto Rico, Suriname and French Guiana. The genomic positions of the identity SNPs are indicated above the French Polynesian reference sequences (horizontal sequence line). Gray-colored bases indicate that all genomes agree with the reference sequence. Only database genome sequences that differ from the input sequence are shown, color-coded to match the font color of the database genome name. Vertical bars highlight different China (Ch1 and Ch2) and Brazil (Br1-3) sublineages. Database genomes are grouped according to their shared SNPs. Note that the KX66028_Dominican Republic_2016 strain differs from members of the Chinese Ch2 sublineage by only 14 to 17 bases.

    EvoDifference prints identify a recombinant exchange between New Guinea and Puerto Rico Dengue2 viral strains.

    Differential SNP patterns reveal that the Dengue2_GQ398269.1_PuertoRico_1994 isolate is a recombinant made up of genomic fragments from different parental sublineages. Starting with their 5’ ends, each alignment covers 10,724 bases. Gray-colored bases indicate sequence identity and red highlighted sequences identify base differences. The input reference sequence is listed first followed by aligning database genomes. (A) Alignment of Dengue2_KF955363_PuertoRico_1986 (major parental lineage) and Dengue2_GQ398269.1_PuertoRico_1994 (recombinant); (B) Dengue2_AF038403.1_NewGuinea_1988 (minor parental lineage) and GQ398269.1_PuertoRico (recombinant); (C) Pairwise alignment of the major and minor parental lineage members, the KF955363_PuertoRico and AF038403.1_NewGuinea isolates, respectively. Horizontal lines serve as approximate guides for recombination boundaries. Flanking vertical color bars indicate the ORF encoding positions of the polyprotein.

    Zika virus EvoDifference prints highlight conserved bases and sequence polymorphisms within Asian and African lineages.

    (A) An EvoDifference printout of the first 2,400 bases of the Zika_KU321639.1_Brazil_2015 polyprotein ORF that spans the Capsid, Pre-Membrane and Envelope encoding regions was generated with 22 Asian/Western Hemisphere and seven African isolates (listed in panel B). Pair-wise alignments between the input sequence (KU321639.1_Brazil) and the database genomes are superimposed to identify: 1) bases identical in all examined genomes (gray); 2) bases that differed in only one of the genomes (color-coded to match the font color of that genome name listed in panel B); 3) bases that differ in two or more database genomes (black); and 4) bases that are unique to the input sequence (red highlighted, black). Line numbers indicate the last base of each line. Seventy-five bases per line were selected to vertically stack codons to highlight the frequent codon wobble position differences for essential amino acids. The boxed sequence (bases 826 to 1,125) is shown in panel B. (B) To reveal alignment details, sequence line number 975 was expanded by clicking on the number. Database genomes are ordered by their total number of base differences from the input sequence (least to greatest). Base differences are shown for each pair-wise alignment. Note that the more evolutionarily divergent African isolates (positioned below the horizontal line) have the highest number of SNP differences with the Brazilian reference sequence. Individual one-on-one alignments of the input reference sequence with database genomes can be accessed by double-clicking on the genome name of interest.
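
    The four display categories and the genome ordering described in this caption amount to a per-position tally over the superimposed pairwise alignments. The following Python sketch shows one way to express that tally; it is an assumption-laden illustration, not the EvoPrinter code. The function names and label strings are hypothetical, sequences are assumed to be pre-aligned to the input, and "unique to the input sequence" is interpreted here as a position where every database genome differs from the input.

```python
def classify_positions(reference, database):
    """Label each reference position by how many database genomes differ.

    `database` maps genome names to sequences aligned to `reference`.
    Labels: "identical", "only:<name>" (one genome differs), "multiple"
    (two or more differ), "unique_to_input" (assumed: all genomes differ).
    """
    for name, seq in database.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")

    labels = []
    for i, base in enumerate(reference):
        differing = [name for name, seq in database.items() if seq[i] != base]
        if not differing:
            labels.append("identical")
        elif len(differing) == len(database):
            labels.append("unique_to_input")
        elif len(differing) == 1:
            labels.append("only:" + differing[0])
        else:
            labels.append("multiple")
    return labels


def order_by_differences(reference, database):
    """Genome names sorted by total base differences, least to greatest."""
    def n_diff(name):
        return sum(a != b for a, b in zip(reference, database[name]))
    return sorted(database, key=n_diff)


if __name__ == "__main__":
    # Toy alignment (made up) covering all four categories.
    db = {"g1": "ACGA", "g2": "ATGA", "g3": "ATCA"}
    print(classify_positions("ACGT", db))
    print(order_by_differences("ACGT", db))
```

    With a single database genome, "differs in only one genome" and "unique to the input" coincide; the sketch reports such positions as unique to the input, which is a choice made here and not something the caption specifies.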