8 research outputs found

    Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing

    Background: Soybean, Glycine max (L.) Merr., is a well-documented paleopolyploid. What remains relatively undercharacterized is the level of sequence identity in retained homeologous regions of the genome. Recently, the Department of Energy Joint Genome Institute and United States Department of Agriculture jointly announced the sequencing of the soybean genome. One of the initial concerns is what effect sequence identity in homeologous regions would have on whole genome shotgun sequence assembly.
    Results: Seventeen BACs representing ~2.03 Mb were sequenced as representative potential homeologous regions from the soybean genome. Genetic mapping of each BAC shows that 11 of the 20 chromosomes are represented. Sequence comparisons between homeologous BACs show that the soybean genome is a mosaic of retained paleopolyploid regions. Some regions appear to be highly conserved while other regions have diverged significantly. Large-scale "batch" reassembly of all 17 BACs combined showed that even the most homeologous BACs with upwards of 95% sequence identity resolve into their respective homeologous sequences. Potential assembly errors were generated by tandemly duplicated pentatricopeptide repeat-containing genes and long simple sequence repeats. Analysis of a whole-genome shotgun assembly of 80,000 randomly chosen JGI-DOE sequence traces reveals some new soybean-specific repeat sequences.
    Conclusion: This analysis investigated both the structure of the paleopolyploid soybean genome and the potential effects retained homeology will have on assembling the whole genome shotgun sequence. Based upon these results, homeologous regions similar to those characterized here will not cause major assembly issues.

    Identification and Characterization of Nucleotide-Binding Site-Leucine-Rich Repeat Genes in the Model Plant Medicago truncatula

    The nucleotide-binding site (NBS)-leucine-rich repeat (LRR) gene family accounts for the largest number of known disease resistance genes, and is one of the largest gene families in plant genomes. We have identified 333 nonredundant NBS-LRRs in the current Medicago truncatula draft genome (Mt1.0), likely representing 400 to 500 NBS-LRRs in the full genome, or roughly 3 times the number present in Arabidopsis (Arabidopsis thaliana). Although many characteristics of the gene family are similar to those described in other plant genomes, several evolutionary features are particularly pronounced in M. truncatula, including a high degree of clustering, evidence of significant numbers of ectopic translocations from clusters to other parts of the genome, a small number of more evolutionarily stable NBS-LRRs, and numerous truncations and fusions leading to novel domain compositions. The gene family clearly has had a large impact on the structure of the genome, both through ectopic translocations (potentially a means of seeding new NBS-LRR clusters) and through two extraordinarily large superclusters. Chromosome 6 encodes approximately 34% of all TIR-NBS-LRRs, while chromosome 3 encodes approximately 40% of all coiled-coil-NBS-LRRs. Almost all atypical domain combinations are in the TIR-NBS-LRR subfamily, with many occurring within one genomic cluster. This analysis shows that the gene family not only is important functionally and agronomically, but also plays a structural role in the genome.

    Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing-1

    Copyright information: Taken from "Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing", http://www.biomedcentral.com/1471-2164/8/330. BMC Genomics 2007;8:330. Published online 19 Sep 2007. PMCID: PMC2077340.
    Consed. Grey boxes represent the assembled contigs and are scaled in base pairs across each contig. Contig numbers are shown in pink boxes and are arbitrarily assigned by Phred/Phrap during sequence assembly. The blue and green boxes above each assembly show the predicted gene positions for gmw1-15k6 and gmw1-105h23, respectively. The green line-plot above each contig shows the average clone pair consistency. Sequence matches within and between contigs were determined with Cross-Match as part of Consed. Black lines within and between contigs show sequence matches that are in reverse orientation, while the orange lines show sequence matches in the same orientation. The bars between sequence matches correspond to the length of the match. Purple peak-shaped lines between contigs show clone pairs that span a gap. Below each contig is a purple line containing either blue (gmw1-15k6) or green (gmw1-105h23) tick marks; these are the tags that distinguish between traces from each BAC.

    Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing-3

    Copyright information: Taken from "Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing", http://www.biomedcentral.com/1471-2164/8/330. BMC Genomics 2007;8:330. Published online 19 Sep 2007. PMCID: PMC2077340.
    me shotgun trace files. BAC corresponds to any contig that showed greatest identity to already assembled soybean BAC sequence. Mdh refers to a previously sequenced region of soybean containing repetitive sequence. No hit means that there was no blast-based match to the nonredundant database. Other was a best match to a sequence (BAC or genomic) from another organism that was not characterized. Satellite refers to known Sb92 or Str120 centromeric repeat sequences. The rest of the categories are as described in the figure legend.

    Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing-2

    Copyright information: Taken from "Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing", http://www.biomedcentral.com/1471-2164/8/330. BMC Genomics 2007;8:330. Published online 19 Sep 2007. PMCID: PMC2077340.
    e11. Predicted gene structures are shown as green boxes and arrows, with the boxes representing exons and lines being introns. Black tick marks on a gene show the start position of a repeated PPR domain within the gene. The blue boxes show the repetitive sequences identified by Vmatch. Orange gene alignments reflect the realignment of predicted gene structures back to the genomic sequence.

    Differential Accumulation of Retroelements and Diversification of NB-LRR Disease Resistance Genes in Duplicated Regions following Polyploidy in the Ancestor of Soybean

    The genomes of most, if not all, flowering plants have undergone whole genome duplication events during their evolution. The impact of such polyploidy events is poorly understood, as is the fate of most duplicated genes. We sequenced an approximately 1 million-bp region in soybean (Glycine max) centered on the Rpg1-b disease resistance gene and compared this region with a region duplicated 10 to 14 million years ago. These two regions were also compared with homologous regions in several related legume species (a second soybean genotype, Glycine tomentella, Phaseolus vulgaris, and Medicago truncatula), which enabled us to determine how each of the duplicated regions (homoeologues) in soybean has changed following polyploidy. The biggest change was in retroelement content, with homoeologue 2 having expanded to 3-fold the size of homoeologue 1. Despite this accumulation of retroelements, over 77% of the duplicated low-copy genes have been retained in the same order and appear to be functional. This finding contrasts with recent analyses of the maize (Zea mays) genome, in which only about one-third of duplicated genes appear to have been retained over a similar time period. Fluorescent in situ hybridization revealed that the homoeologue 2 region is located very near a centromere. Thus, pericentromeric localization, per se, does not result in a high rate of gene inactivation, despite greatly accelerated retrotransposon accumulation. In contrast to low-copy genes, nucleotide-binding-leucine-rich repeat disease resistance gene clusters have undergone dramatic species/homoeologue-specific duplications and losses, with some evidence for partitioning of subfamilies between homoeologues.