14 research outputs found

    BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules

    Integrated physical map of bread wheat chromosome arm 7DS to facilitate gene cloning and comparative studies

    Bread wheat (Triticum aestivum L.) is a staple food for a significant part of the world’s population. The growing demand on its production can be satisfied by improving yield and resistance to biotic and abiotic stress. Knowledge of the genome sequence would aid in discovering genes and QTLs underlying these traits and provide a basis for genomics-assisted breeding. Physical maps and BAC clones associated with them have been valuable resources from which to generate a reference genome of bread wheat and to assist map-based gene cloning. As a part of a joint effort coordinated by the International Wheat Genome Sequencing Consortium, we have constructed a BAC-based physical map of bread wheat chromosome arm 7DS consisting of 895 contigs and covering 94% of its estimated length. By anchoring BAC contigs to one radiation hybrid map and three high resolution genetic maps, we assigned 73% of the assembly to a distinct genomic position. This map integration, interconnecting a total of 1713 markers with ordered and sequenced BAC clones from a minimal tiling path, provides a tool to speed up gene cloning in wheat. The process of physical map assembly included the integration of the 7DS physical map with a whole-genome physical map of Aegilops tauschii and a 7DS Bionano genome map, which together enabled efficient scaffolding of physical-map contigs, even in the non-recombining region of the genetic centromere. Moreover, this approach facilitated a comparison of bread wheat and its ancestor at BAC-contig level and revealed a reconstructed region in the 7DS pericentromere

    Shifting the limits in wheat research and breeding using a fully annotated reference genome

    Introduction: Wheat (Triticum aestivum L.) is the most widely cultivated crop on Earth, contributing about a fifth of the total calories consumed by humans. Consequently, wheat yields and production affect the global economy, and failed harvests can lead to social unrest. Breeders continuously strive to develop improved varieties by fine-tuning genetically complex yield and end-use quality parameters while maintaining stable yields and adapting the crop to regionally specific biotic and abiotic stresses. Rationale: Breeding efforts are limited by insufficient knowledge and understanding of wheat biology and the molecular basis of central agronomic traits. To meet the demands of human population growth, there is an urgent need for wheat research and breeding to accelerate genetic gain as well as to increase and protect wheat yield and quality traits. In other plant and animal species, access to a fully annotated and ordered genome sequence, including regulatory sequences and genome-diversity information, has promoted the development of systematic and more time-efficient approaches for the selection and understanding of important traits. Wheat has lagged behind, primarily owing to the challenges of assembling a genome that is more than five times as large as the human genome, polyploid, and complex, containing more than 85% repetitive DNA. To provide a foundation for improvement through molecular breeding, in 2005, the International Wheat Genome Sequencing Consortium set out to deliver a high-quality annotated reference genome sequence of bread wheat. Results: An annotated reference sequence representing the hexaploid bread wheat genome in the form of 21 chromosome-like sequence assemblies has now been delivered, giving access to 107,891 high-confidence genes, including their genomic context of regulatory sequences. This assembly enabled the discovery of tissue- and developmental stage–related gene coexpression networks using a transcriptome atlas representing all stages of wheat development. The dynamics of change in complex gene families involved in environmental adaptation and end-use quality were revealed at subgenome resolution and contextualized to known agronomic single-gene or quantitative trait loci. Aspects of the future value of the annotated assembly for molecular breeding and research were exemplarily illustrated by resolving the genetic basis of a quantitative trait locus conferring resistance to abiotic stress and insect damage as well as by serving as the basis for genome editing of the flowering-time trait. Conclusion: This annotated reference sequence of wheat is a resource that can now drive disruptive innovation in wheat improvement, as this community resource establishes the foundation for accelerating wheat research and application through improved understanding of wheat biology and genomics-assisted breeding. Importantly, the bioinformatics capacity developed for model-organism genomes will facilitate a better understanding of the wheat genome as a result of the high-quality chromosome-based genome assembly. By necessity, breeders work with the genome at the whole chromosome level, as each new cross involves the modification of genome-wide gene networks that control the expression of complex traits such as yield. With the annotated and ordered reference genome sequence in place, researchers and breeders can now easily access sequence-level information to precisely define the necessary changes in the genomes for breeding programs. This will be realized through the implementation of new DNA marker platforms and targeted breeding technologies, including genome editing

    The Dark Matter of Large Cereal Genomes: Long Tandem Repeats

    Reference genomes of important cereals, including barley, emmer wheat and bread wheat, were released recently. Their comparison with genome size estimates obtained by flow cytometry indicated that the assemblies represent not more than 88−98% of the complete genome. This work is aimed at identifying the missing parts in two cereal genomes and proposing techniques to make the assemblies more complete. We focused on tandemly organised repetitive sequences, known to be underrepresented in genome assemblies generated from short-read sequence data. Our study found arrays of three tandem repeats with unit sizes of 1242 to 2726 bp present in the bread wheat reference genome generated from short reads. However, this and another wheat genome assembly employing long PacBio reads failed in integrating correctly the 2726-bp repeat in the pseudomolecule context. This suggests that tandem repeats of this size, frequently incorporated in unassigned scaffolds, may contribute to shrinking of pseudomolecules without reducing size of the entire assembly. We demonstrate how this missing information may be added to the pseudomolecules with the aid of nanopore sequencing of individual BAC clones and optical mapping. Using the latter technique, we identified and localised a 470-kb long array of 45S ribosomal DNA absent from the reference genome of barley

    Prospects of telomere-to-telomere assembly in barley: analysis of sequence gaps in the MorexV3 reference genome

    The first gapless, telomere-to-telomere (T2T) sequence assemblies of plant chromosomes were reported recently. However, sequence assemblies of most plant genomes remain fragmented. Only recent breakthroughs in accurate long-read sequencing have made it possible to achieve highly contiguous sequence assemblies with a few tens of contigs per chromosome, i.e. a number small enough to allow for a systematic inquiry into the causes of the remaining sequence gaps and the approaches and resources needed to close them. Here, we analyze sequence gaps in the current reference genome sequence of barley cv. Morex (MorexV3). Optical map and sequence raw data, complemented by ChIP-seq data for centromeric histone variant CENH3, were used to estimate the abundance of centromeric, ribosomal DNA and subtelomeric repeats in the barley genome. These estimates were compared with copy numbers in the MorexV3 pseudomolecule sequence. We found that almost all centromeric sequences and 45S ribosomal DNA repeat arrays were absent from the MorexV3 pseudomolecules and that the majority of sequence gaps can be attributed to assembly breakdown in long stretches of satellite repeats. However, missing sequences cannot fully account for the difference between assembly size and flow cytometric genome size estimates. We discuss the prospects of gap closure with ultra-long sequence reads

    Long-read sequence assembly: a technical evaluation in barley

    Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives