11 research outputs found

    Genome Resources for Climate‐Resilient Cowpea, an Essential Crop for Food Security

    Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought‐prone climates, and a primary source of protein in sub‐Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K‐499‐35 include a whole‐genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi‐parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited‐input small‐holder farming and climate stress.

    The genome of cowpea (Vigna unguiculata [L.] Walp.)

    Cowpea (Vigna unguiculata [L.] Walp.) is a major crop for worldwide food and nutritional security, especially in sub-Saharan Africa, that is resilient to hot and drought-prone environments. An assembly of the single-haplotype inbred genome of cowpea IT97K-499-35 was developed by exploiting the synergies between single-molecule real-time sequencing, optical and genetic mapping, and an assembly reconciliation algorithm. A total of 519 Mb is included in the assembled sequences. Nearly half of the assembled sequence is composed of repetitive elements, which are enriched within recombination-poor pericentromeric regions. A comparative analysis of these elements suggests that genome size differences between Vigna species are mainly attributable to changes in the amount of Gypsy retrotransposons. Conversely, genes are more abundant in more distal, high-recombination regions of the chromosomes; there appears to be more duplication of genes within the NBS-LRR and the SAUR-like auxin superfamilies compared with other warm-season legumes that have been sequenced. A surprising outcome is the identification of an inversion of 4.2 Mb among landraces and cultivars, which includes a gene that has been associated in other plants with interactions with the parasitic weed Striga gesnerioides. The genome sequence facilitated the identification of a putative syntelog for multiple organ gigantism in legumes. A revised numbering system has been adopted for cowpea chromosomes based on synteny with common bean (Phaseolus vulgaris). An estimate of nuclear genome size of 640.6 Mbp based on cytometry is presented.

    Alhakami/calculate-gene-percentage v1.0

    A Perl script to calculate gene percentage in a genome assembly.

    Algorithms and Data Structures for de novo Sequence Assembly

    Despite the prodigious throughput of the sequencing instruments currently on the market, the assembly problem remains very challenging, mainly due to the repetitive content of large genomes, uneven sequencing coverage, and the presence of (non-uniform) sequencing errors and chimeric reads. Third-generation sequencing technologies such as Pacific Biosciences and Oxford Nanopore offer very long reads at a higher cost per base, but with a much higher sequencing error rate. As a consequence, the final assembly is very rarely entirely finished, with one solid sequence per chromosome. Instead, the typical output is an unordered and unoriented set of contiguous regions called contigs. We examine two different but related problems in this study: merging multiple assemblies produced using different assemblers or parameters, and stitching assembled BACs to create a genome-wide assembly. The contribution of this dissertation is twofold. First, compact encoding of finite sets of strings is a classic problem. The manipulation of large sets requires compact data structures that allow for efficient set operations. We defined sequence decision diagrams (SeqDDs), which can encode arbitrary finite sets of strings over an alphabet. Second, reassembling existing overlapping contigs, or merging multiple assemblies, to produce a higher-quality genome-wide consensus assembly is a compelling problem. We conducted a comparative study of state-of-the-art assembly reconciliation tools, with the intent to use them in assembling a set of approximately four thousand Vigna unguiculata (cowpea) assembled BACs. To accomplish this task, we developed the colored-positioned de Bruijn graph, a variant of the classic de Bruijn graph, to stitch overlapping assemblies. In this dissertation we studied and developed data structures and algorithms to merge overlapping assemblies.
In particular, we (1) introduced sequence decision diagrams (SeqDDs) to enable a compact encoding of finite sets of strings that allows for efficient set operations, among them overlap detection; (2) carried out a comparative study of state-of-the-art assembly reconciliation tools; and (3) developed tools to cluster overlapping BACs and assemble the resulting clusters. Our assembler implements the colored-positioned de Bruijn graph, an augmented variant of the classic de Bruijn graph defined in this study.

    Additional file 1 of A comparative evaluation of genome assembly reconciliation tools

    Contains Supplementary Notes 1–7, Supplementary Tables 1–19 and Supplementary Figures 1–13. (PDF 1250 kb)
