26 research outputs found

    Development of Genomic Markers and Mapping Tools for Assembling the Allotetraploid Gossypium hirsutum L. Draft Genome Sequence

    Cotton (Gossypium spp.) is the largest producer of natural textile fibers. Most worldwide and domestic cotton fiber production is based on cultivars of G. hirsutum L., an allotetraploid. Genetic improvement of cotton remains constrained by alarmingly low levels of genetic diversity, inadequate genomic tools for genetic analysis and manipulation, and the difficulty of effectively harnessing the vastly greater genetic diversity harbored by other Gossypium species. Development of large numbers of single nucleotide polymorphisms (SNPs) for use in intraspecific and interspecific populations will allow for cotton germplasm diversity characterization, high-throughput genotyping, marker-assisted breeding, germplasm introgression of advantageous traits from wild species, and high-density genetic mapping. My research has been focused on utilizing next-generation sequencing data for intraspecific and interspecific SNP marker development, validation, and creation of high-throughput genotyping methods to advance cotton research. I used transcriptome sequencing to develop and map the first gene-associated SNPs for five species, G. barbadense (Pima cotton), G. tomentosum, G. mustelinum, G. armourianum, and G. longicalyx. A total of 62,832 non-redundant SNPs were developed. These can be utilized for interspecific germplasm introgression into cultivated G. hirsutum, as well as for subsequent genetic analysis and manipulation. To create SNP-based resources for integrated physical mapping, I used BAC-end sequences (BESs) and resequencing data for 12 G. hirsutum lines, a Pima line and G. longicalyx to derive 132,262 intraspecific and 693,769 interspecific SNPs located in BESs. These SNP data sets were used to help build the first high-throughput genotyping array for cotton, the CottonSNP63K, which now provides a standardized platform for global cotton research.
I applied the array to two F2 populations and produced the first two high-density SNP maps for cotton, one intraspecific and one interspecific. By resequencing two interspecific F1 hypo-aneuploids, I also demonstrated that the chromosome-wide changes in SNP genotypes enable highly effective mass-localization of BACs to individual cotton chromosomes. These efforts provide additional validation and placement methods that can be directly integrated with the physical map being constructed for G. hirsutum and enable the production of a high-quality draft genome sequence for cultivated cotton.

    Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement

    Polyploidy is an evolutionary innovation for many animals and all flowering plants, but its impact on selection and domestication remains elusive. Here we analyze genome evolution and diversification for all five allopolyploid cotton species, including economically important Upland and Pima cottons. Although these polyploid genomes are conserved in gene content and synteny, they have diversified by subgenomic transposon exchanges that equilibrate genome size, evolutionary rate heterogeneities and positive selection between homoeologs within and among lineages. These differential evolutionary trajectories are accompanied by gene-family diversification and homoeolog expression divergence among polyploid lineages. Selection and domestication drive parallel gene expression similarities in fibers of two cultivated cottons, involving coexpression networks and N6-methyladenosine RNA modifications. Furthermore, polyploidy induces recombination suppression, which correlates with altered epigenetic landscapes and can be overcome by wild introgression. These genomic insights will empower efforts to manipulate genetic recombination and modify epigenetic landscapes and target genes for crop improvement.

    An anchored chromosome-scale genome assembly of spinach improves annotation and reveals extensive gene rearrangements in euasterids.

    Spinach (Spinacia oleracea L.) is a member of the Caryophyllales family, a basal eudicot asterid that consists of sugar beet (Beta vulgaris L. subsp. vulgaris), quinoa (Chenopodium quinoa Willd.), and amaranth (Amaranthus hypochondriacus L.). With the introduction of baby leaf types, spinach has become a staple food in many homes. Production issues focus on yield, nitrogen-use efficiency and resistance to downy mildew (Peronospora effusa). Although genomes are available for the above species, a chromosome-level assembly exists only for quinoa, allowing for proper annotation and structural analyses to enhance crop improvement. We independently assembled and annotated genomes of the cultivar Viroflay using a short-read strategy (Illumina) and a long-read strategy (Pacific Biosciences) to develop a chromosome-level, genetically anchored assembly for spinach. Scaffold N50 for the Illumina assembly was 389 kb, whereas that for Pacific Biosciences was 4.43 Mb, representing 911 Mb (93% of the genome) in 221 scaffolds, 80% of which are anchored and oriented on a sequence-based genetic map, also described within this work. The two assemblies were 99.5% collinear. Independent annotation of the two assemblies with the same comprehensive transcriptome dataset shows that the quality of the assembly directly affects the annotation, with significantly more genes predicted (26,862 vs. 34,877) in the long-read assembly. Analysis of resistance genes confirms a bias in resistant gene motifs more typical of monocots. Evolutionary analysis indicates that Spinacia is a paleohexaploid with a whole-genome triplication followed by extensive gene rearrangements identified in this work. Diversity analysis of 75 lines indicates that variation in genes is ample for hypothesis-driven, genomic-assisted breeding enabled by this work.
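
The abstract above compares assemblies by scaffold N50. As a minimal sketch of how that statistic is usually computed (the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the paper): sort the scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not the spinach data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
```

In this toy case the total is 300, and the two longest scaffolds (80 + 70) already reach 150, so the N50 is the length of the second scaffold. A larger N50 at a similar total size, as reported for the long-read assembly, means the same sequence is held in fewer, longer pieces.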

    There and back again: historical perspective and future directions for Vaccinium breeding and research studies

    The genus Vaccinium L. (Ericaceae) contains a wide diversity of culturally and economically important berry crop species. Consumer demand and scientific research in blueberry (Vaccinium spp.) and cranberry (Vaccinium macrocarpon) have increased worldwide over the crops' relatively short domestication history (~100 years). Other species, including bilberry (Vaccinium myrtillus), lingonberry (Vaccinium vitis-idaea), and ohelo berry (Vaccinium reticulatum) are largely still harvested from the wild but with crop improvement efforts underway. Here, we present a review article on these Vaccinium berry crops on topics that span taxonomy to genetics and genomics to breeding. We highlight the accomplishments made thus far for each of these crops, along their journey from the wild, and propose research areas and questions that will require investments by the community over the coming decades to guide future crop improvement efforts. New tools and resources are needed to underpin the development of superior cultivars that are not only more resilient to various environmental stresses and higher yielding, but also produce fruit that continue to meet a variety of consumer preferences, including fruit quality and health-related traits.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
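
One of the baselines named in the abstract above is Term Frequency-Inverse Document Frequency. The following is a minimal sketch of that general idea only, assuming whitespace tokenization, a length-normalized term frequency, and an unsmoothed `log(N / df)` weight; the function names and toy documents are hypothetical, and the consortium's actual implementation may differ in every one of these choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts of term -> weight) for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "titles", not drawn from the benchmark.
toy_docs = [
    "cotton genome assembly and snp markers",
    "snp markers for cotton breeding",
    "document similarity in literature search",
]
vecs = tfidf_vectors(toy_docs)
sim_related = cosine(vecs[0], vecs[1])    # shares "cotton", "snp", "markers"
sim_unrelated = cosine(vecs[0], vecs[2])  # shares no terms
```

Ranking candidate articles by this score against a seed article gives a simple recommender. Because it only rewards exact shared terms, it is plausible that it would surface a different set of articles than BM25 or PubMed Related Articles, which fits the abstract's observation that the baselines return distinct collections.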

    Outlook for Implementation of Genomics-Based Selection in Public Cotton Breeding Programs

    Researchers have used quantitative genetics to map cotton fiber quality and agronomic performance loci, but many alleles may be population or environment-specific, limiting their usefulness in a pedigree selection, inbreeding-based system. Here, we utilized genotypic and phenotypic data on a panel of 80 important historical Upland cotton (Gossypium hirsutum L.) lines to investigate the potential for genomics-based selection within a cotton breeding program’s relatively closed gene pool. We performed a genome-wide association study (GWAS) to identify alleles correlated to 20 fiber quality, seed composition, and yield traits and looked for a consistent detection of GWAS hits across 14 individual field trials. We also explored the potential for genomic prediction to capture genotypic variation for these quantitative traits and tested the incorporation of GWAS hits into the prediction model. Overall, we found that genomic selection programs for fiber quality can begin immediately, and the prediction ability for most other traits is lower but commensurate with heritability. Stably detected GWAS hits can improve prediction accuracy, although a significance threshold must be carefully chosen to include a marker as a fixed effect. We place these results in the context of modern public cotton line-breeding and highlight the need for a community-based approach to amass the data and expertise necessary to launch US public-sector cotton breeders into the genomics-based selection era.