Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains
<p>Abstract</p> <p>Background</p> <p>Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends.</p> <p>Results</p> <p>A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including sequences from MSLL clones exceeding 100 kb. The paired-end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, MSLL clones were shown to be useful for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream of annotated genes. The ability to precisely map long stretches of fully methylated DNA is a unique outcome of MSLL analysis, and this mapping also provided evidence of errors in gene identification. MSLL clones were significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were substantially gene-enriched: the <it>Sal</it>I MSLL libraries were the most highly enriched (31% of reads align to an EST contig), while the HMPR clones showed exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods and shown to be complementary.</p> <p>Conclusion</p> <p>MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. 
Although the types and nature of epigenetic boundaries are still poorly understood, MSLL technology flags both approximate boundaries and methylated genes that deserve additional investigation. MSLL and HMPR sequences provide a valuable resource for maize genome annotation, and are a uniquely valuable complement to any plant genome sequencing project. To make these results fully accessible to the community, a web display was developed that shows the alignment of MSLL, HMPR, and other gene-rich sequences to the BACs; this display is continually updated with the latest ESTs and BAC sequences.</p>
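The abstract reports that paired-end reads from MSLL and HMPR clones link existing gene-rich sequences into scaffolds, but does not describe the linking procedure. The following is a minimal sketch of the core idea, assuming each clone's two end reads have already been aligned to contigs: contigs joined by at least `min_links` clones are merged into one cluster using a union-find (disjoint-set) structure. The names `UnionFind`, `scaffold_clusters`, and the `min_links` threshold are hypothetical. The sketch only groups contigs; it does not resolve the order, orientation, or gap sizes that a real scaffolder must also determine.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set structure used to merge contigs that are linked."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # register unseen contigs as their own root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def scaffold_clusters(pairs: Iterable[Tuple[str, str]], min_links: int = 2) -> List[List[str]]:
    """Group contigs into clusters joined by >= min_links clone end-read pairs.

    Each element of `pairs` names the two contigs hit by one clone's end reads
    (hypothetical input format). Order and orientation are NOT resolved.
    """
    links: Counter = Counter()
    contigs = set()
    for a, b in pairs:
        contigs.update((a, b))
        if a != b:
            links[tuple(sorted((a, b)))] += 1  # count each contig pair once per clone

    uf = UnionFind()
    for contig in contigs:
        uf.find(contig)  # make sure singleton contigs appear in the output
    for (a, b), n in links.items():
        if n >= min_links:  # require several supporting clones before merging
            uf.union(a, b)

    groups: Dict[str, List[str]] = {}
    for contig in sorted(contigs):
        groups.setdefault(uf.find(contig), []).append(contig)
    return sorted(groups.values())
```

For example, with the default `min_links=2`, the pairs `[("ctgA", "ctgB"), ("ctgB", "ctgA"), ("ctgB", "ctgC")]` yield `[["ctgA", "ctgB"], ["ctgC"]]`, because only the A-B link is supported by two clones. Requiring more than one supporting pair is a common guard against chimeric clones and mis-mapped reads.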
Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries
<p>Abstract</p> <p>Background</p> <p><it>Eucalyptus </it>species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing.</p> <p>Results</p> <p>We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of <it>E. grandis </it>(clone BRASUZ1) digested with <it>Hind</it>III and <it>BstY</it>I, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes of 135 kb (EG_Bb) and 157 kb (EG_Ba), and very low extra-nuclear genome contamination; the probability of finding a single-copy gene is ≥ 99.99%. Libraries were screened for the presence of several genes of interest <it>via </it>hybridization to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using Roche GS FLX technology, providing the whole sequence of the <it>E. grandis </it>chloroplast genome and complete genomic sequences of important lignin biosynthesis genes.</p> <p>Conclusions</p> <p>The two <it>E. grandis </it>BAC libraries described in this study represent an important milestone for the advancement of <it>Eucalyptus </it>genomics and forest tree research. These BAC resources have highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. 
These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in <it>Eucalyptus </it>and possibly in related species of <it>Myrtaceae</it>, including genome sequencing, gene isolation, and functional and comparative genomics. Because they were constructed from the same tree (<it>E. grandis </it>BRASUZ1) whose full genome is being sequenced, they should prove instrumental for assembly and gap filling of the upcoming <it>Eucalyptus </it>reference genome sequence.</p>
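The ≥ 99.99% probability of finding a single-copy gene follows from the deep genome coverage. A standard way to estimate it is the Clarke-Carbon formula, $P = 1 - (1 - I/G)^N$, where $I$ is the average insert size, $G$ the haploid genome size and $N$ the number of clones. Because coverage is $c = NI/G$, this reduces to approximately $P \approx 1 - e^{-c}$ when $I \ll G$. The abstract does not state which formula the authors used, so this Python sketch is an illustration under that assumption. The coverage and insert values come from the abstract; the genome size `hypothetical_genome` is a placeholder rather than a measured value, and the function names are hypothetical.

```python
import math


def clarke_carbon_probability(num_clones: int, insert_size: float, genome_size: float) -> float:
    """Clarke-Carbon estimate: probability that a given single-copy locus
    is contained in at least one of `num_clones` randomly sampled clones."""
    fraction = insert_size / genome_size  # fraction of the genome carried by one insert
    return 1.0 - (1.0 - fraction) ** num_clones


def probability_from_coverage(coverage: float) -> float:
    """Poisson approximation P ~= 1 - exp(-c), accurate when insert << genome."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Coverage and average insert sizes from the abstract; the genome size is a
    # hypothetical placeholder used only to derive a clone count N = c * G / I.
    hypothetical_genome = 600e6  # bp, illustrative only
    libraries = {"EG_Ba": (17.0, 157e3), "EG_Bb": (15.0, 135e3)}
    for name, (coverage, insert) in libraries.items():
        clones = round(coverage * hypothetical_genome / insert)
        exact = clarke_carbon_probability(clones, insert, hypothetical_genome)
        approx = probability_from_coverage(coverage)
        # Report the probability of *missing* a locus, which is easier to read.
        print(f"{name}: {clones} clones, P(miss) exact={1 - exact:.2e}, approx={1 - approx:.2e}")
```

At 15× coverage the Poisson approximation gives a miss probability of about 3 × 10⁻⁷, consistent with the ≥ 99.99% reported for both libraries. Because only the ratio $I/G$ enters the exact formula, the result depends on coverage rather than on the placeholder genome size.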