225 research outputs found

    Genome sequence and analysis of the tuber crop potato

    Get PDF
    Potato (Solanum tuberosum L.) is the world’s most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital cro

    Genome sequence and analysis of the tuber crop potato

    Get PDF
    Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.La lista completa de autores que integran el documento puede consultarse en el archivo.Facultad de Ciencias Exacta

    Genome sequence and analysis of the tuber crop potato

    Get PDF
    Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.La lista completa de autores que integran el documento puede consultarse en el archivo.Facultad de Ciencias Exacta

    Sequencing of 6.7 Mb of the melon genome using a BAC pooling strategy

    Get PDF
    <p>Abstract</p> <p>Background</p> <p><it>Cucumis melo </it>(melon) belongs to the Cucurbitaceae family, whose economic importance among horticulture crops is second only to Solanaceae. Melon has a high intra-specific genetic variation, morphologic diversity and a small genome size (454 Mb), which make it suitable for a great variety of molecular and genetic studies. A number of genetic and genomic resources have already been developed, such as several genetic maps, BAC genomic libraries, a BAC-based physical map and EST collections. Sequence information would be invaluable to complete the picture of the melon genomic landscape, furthering our understanding of this species' evolution from its relatives and providing an important genetic tool. However, to this day there is little sequence data available, only a few melon genes and genomic regions are deposited in public databases. The development of massively parallel sequencing methods allows envisaging new strategies to obtain long fragments of genomic sequence at higher speed and lower cost than previous Sanger-based methods.</p> <p>Results</p> <p>In order to gain insight into the structure of a significant portion of the melon genome we set out to perform massive sequencing of pools of BAC clones. For this, a set of 57 BAC clones from a double haploid line was sequenced in two pools with the 454 system using both shotgun and paired-end approaches. The final assembly consists of an estimated 95% of the actual size of the melon BAC clones, with most likely complete sequences for 50 of the BACs, and a total sequence coverage of 39x. The accuracy of the assembly was assessed by comparing the previously available Sanger sequence of one of the BACs against its 454 sequence, and the polymorphisms found involved only 1.7 differences every 10,000 bp that were localized in 15 homopolymeric regions and two dinucleotide tandem repeats. Overall, the study provides approximately 6.7 Mb or 1.5% of the melon genome. The analysis of this new data has allowed us to gain further insight into characteristics of the melon genome such as gene density, average protein length, or microsatellite and transposon content. The annotation of the BAC sequences revealed a high degree of collinearity and protein sequence identity between melon and its close relative <it>Cucumis sativus </it>(cucumber). Transposon content analysis of the syntenic regions suggests that transposition activity after the split of both cucurbit species has been low in cucumber but very high in melon.</p> <p>Conclusions</p> <p>The results presented here show that the strategy followed, which combines shotgun and BAC-end sequencing together with anchored marker information, is an excellent method for sequencing specific genomic regions, especially from relatively compact genomes such as that of melon. However, in agreement with other results, this map-based, BAC approach is confirmed to be an expensive way of sequencing a whole plant genome. Our results also provide a partial description of the melon genome's structure. Namely, our analysis shows that the melon genome is highly collinear with the smaller one of cucumber, the size difference being mainly due to the expansion of intergenic regions and proliferation of transposable elements.</p

    A haploid pseudo-chromosome genome assembly for a keystone sagebrush species of western North American rangelands

    Get PDF
    Increased ecological disturbances, species invasions, and climate change are creating severe conservation problems for several plant species that are widespread and foundational. Understanding the genetic diversity of these species and how it relates to adaptation to these stressors are necessary for guiding conservation and restoration efforts. This need is particularly acute for big sagebrush (Artemisia tridentata; Asteraceae), which was once the dominant shrub over 1,000,000 km2 in western North America but has since retracted by half and thus has become the target of one of the largest restoration seeding efforts globally. Here, we present the first reference-quality genome assembly for an ecologically important subspecies of big sagebrush (A. tridentata subsp. tridentata) based on short and long reads, as well as chromatin proximity ligation data analyzed using the HiRise pipeline. The final 4.2-Gb assembly consists of 5,492 scaffolds, with nine pseudo-chromosomal scaffolds (nine scaffolds comprising at least 90% of the assembled genome; n = 9). The assembly contains an estimated 43,377 genes based on ab initio gene discovery and transcriptional data analyzed using the MAKER pipeline, with 91.37% of BUSCOs being completely assembled. The final assembly was highly repetitive, with repeat elements comprising 77.99% of the genome, making the Artemisia tridentata subsp. tridentata genome one of the most highly repetitive plant genomes to be sequenced and assembled. This genome assembly advances studies on plant adaptation to drought and heat stress and provides a valuable tool for future genomic research.This research was made possible by 2 NSF Idaho EPSCoR grants (award numbers OIA-1757324 and OIA-1826801), as well as a Dovetail Genomics Tree of Life Award.Introduction Materials and methods Sample collection, in vitro tissue propagation, and biomass production Flow cytometry and genome complexity analysis PacBio and Omni-C sequence data generation PacBio long-read de novo assembly and validation Pseudomolecule construction with HiRise Genome annotation RNA sequencing Repeat identification Functional annotation Results and discussion Validation of genome assembly and annotation Genome complexity and evidence of past polyploidization Comparing the A. tridentata and A. annua genome assemblies Applications of the sagebrush reference genome Data availability Acknowledgments Literature cite

    A Haploid Pseudo-Chromosome Genome Assembly for a Keystone Sagebrush Species of Western North American Rangelands

    Get PDF
    Increased ecological disturbances, species invasions, and climate change are creating severe conservation problems for several plant species that are widespread and foundational. Understanding the genetic diversity of these species and how it relates to adaptation to these stressors are necessary for guiding conservation and restoration efforts. This need is particularly acute for big sagebrush (Artemisia tridentata; Asteraceae), which was once the dominant shrub over 1,000,000 km2 in western North America but has since retracted by half and thus has become the target of one of the largest restoration seeding efforts globally. Here, we present the first reference-quality genome assembly for an ecologically important subspecies of big sagebrush (A. tridentata subsp. tridentata) based on short and long reads, as well as chromatin proximity ligation data analyzed using the HiRise pipeline. The final 4.2-Gb assembly consists of 5,492 scaffolds, with nine pseudo-chromosomal scaffolds (nine scaffolds comprising at least 90% of the assembled genome; n = 9). The assembly contains an estimated 43,377 genes based on ab initio gene discovery and transcriptional data analyzed using the MAKER pipeline, with 91.37% of BUSCOs being completely assembled. The final assembly was highly repetitive, with repeat elements comprising 77.99% of the genome, making the Artemisia tridentata subsp. tridentata genome one of the most highly repetitive plant genomes to be sequenced and assembled. This genome assembly advances studies on plant adaptation to drought and heat stress and provides a valuable tool for future genomic research

    Empirical comparison of ab initio repeat finding programs

    Get PDF
    Identification of dispersed repetitive elements can be difficult, especially when elements share little or no homology with previously described repeats. Consequently, a growing number of computational tools have been designed to identify repetitive elements in an ab initio manner, i.e. without using prior sequence data. Here we present the results of side-by-side evaluations of six of the most widely used ab initio repeat finding programs. Using sequence from rice chromosome 12, tools were compared with regard to time requirements, ability to find known repeats, utility in identifying potential novel repeats, number and types of repeat elements recognized and compactness of family descriptions. The study reveals profound differences in the utility of the tools with some identifying virtually their entire substrate as repetitive, others making reasonable estimates of repetition, and some missing almost all repeats. Of note, even when tools recognized similar numbers of repeats they often showed marked differences in the nature and number of repeat families identified. Within the context of this comparative study, ReAS and RepeatScout showed the most promise in analysis of sequence reads and assembled genomic regions, respectively. Our results should help biologists identify the program(s), if any, that is best suited for their needs

    Are we there yet? : reliably estimating the completeness of plant genome sequences

    Get PDF
    Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation

    Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

    Get PDF
    <p>Abstract</p> <p>Background</p> <p><it>Tetrahymena thermophila</it>, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of <it>Tetrahymena</it>'s coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing.</p> <p>Results</p> <p>We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified.</p> <p>Conclusion</p> <p>We report here significant progress in genome closure and reannotation of <it>Tetrahymena thermophila</it>. Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the over 24,000 gene models, which will be valuable to researchers studying this model organism as well as for comparative genomics purposes.</p
    corecore