A sorghum practical haplotype graph facilitates genome‐wide imputation and cost‐effective genomic prediction
Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to accelerate cultivar development. To help with this, we developed a Sorghum bicolor Practical Haplotype Graph (PHG) pangenome database that stores haplotypes and variant information. We developed two PHGs in sorghum that were used to identify genome-wide variants for 24 founders of the Chibas sorghum breeding program from 0.01x sequence coverage. The PHG called single nucleotide polymorphisms (SNPs) with 5.9% error at 0.01x coverage, only 3% higher than the PHG error when calling SNPs from 8x coverage sequence. Additionally, 207 progenies from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes were imputed from PHG parental haplotypes and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 and are similar to prediction accuracies obtained with genotyping-by-sequencing or targeted amplicon sequencing (rhAmpSeq) markers. This study demonstrates the use of a sorghum PHG to impute SNPs from low-coverage sequence data and shows that the PHG can unify genotype calls across multiple sequencing platforms. By reducing input sequence requirements, the PHG can decrease the cost of genotyping, make GS more feasible, and facilitate larger breeding populations. Our results demonstrate that the PHG is a useful research and breeding tool that maintains variant information from a diverse group of taxa, stores sequence data in a condensed but readily accessible format, unifies genotypes across genotyping platforms, and provides a cost-effective option for genomic selection.
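The abstract says missing progeny genotypes were imputed from PHG parental haplotypes but does not detail how. The PHG's own method finds paths through the haplotype graph and is not reproduced here. The toy Python sketch below only builds intuition, under several assumptions: biallelic 0/1 coding, fixed-size windows, and a "closest parental haplotype per window" rule. The function names and founder labels (`impute`, `impute_window`, `founder_A`) are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical coding: 0/1 = reference/alternate allele on a haplotype;
# None = no call at that SNP (common at ~0.01x coverage).
Calls = List[Optional[int]]


def mismatches(observed: Calls, haplotype: List[int]) -> int:
    """Count disagreements at the sites where the progeny has a call."""
    return sum(1 for o, h in zip(observed, haplotype) if o is not None and o != h)


def impute_window(observed: Calls, parents: Dict[str, List[int]]) -> List[int]:
    """Fill missing calls from the best-matching parental haplotype.

    Observed calls are kept as-is; ties go to the parent listed first.
    """
    best = min(parents, key=lambda name: mismatches(observed, parents[name]))
    return [h if o is None else o for o, h in zip(observed, parents[best])]


def impute(observed: Calls, parents: Dict[str, List[int]], window: int) -> List[int]:
    """Impute window by window, so the best parent may switch between windows."""
    imputed: List[int] = []
    for start in range(0, len(observed), window):
        end = start + window
        segment = {name: hap[start:end] for name, hap in parents.items()}
        imputed.extend(impute_window(observed[start:end], segment))
    return imputed


parents = {
    "founder_A": [0, 1, 0, 1, 1, 0, 1, 0],
    "founder_B": [1, 0, 1, 0, 0, 1, 0, 1],
}
progeny = [0, None, 0, None, None, 1, None, 1]
print(impute(progeny, parents, window=4))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

In this toy case the progeny matches founder_A in the first window and founder_B in the second, which mimics a single recombination. Fixed windows are a crude stand-in for the path inference a real haplotype graph performs.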
Floral gene resources from basal angiosperms for comparative genomics research
BACKGROUND: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. RESULTS: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. CONCLUSION: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. 
These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways.
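The abstract reports that ESTs were sorted into putative gene families by comparison with the whole Arabidopsis/rice proteome, without giving the procedure. The minimal Python sketch below shows one simple way to do this: assign each EST to the family of its best-scoring protein hit. It assumes BLAST tabular output (`-outfmt 6`: 12 columns, with the e-value in column 11 and the bit score in column 12). The protein-to-family table, the identifiers, and the e-value cutoff are all hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Highest-bit-score subject per query from BLAST tabular (-outfmt 6) lines."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # assumed cutoff: hit too weak to suggest homology
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def sort_into_families(lines: Iterable[str], protein_to_family: Dict[str, str],
                       max_evalue: float = 1e-10) -> Dict[str, List[str]]:
    """Group ESTs by the (hypothetical) gene family of their best-hit protein."""
    families: Dict[str, List[str]] = defaultdict(list)
    for est, protein in best_hits(lines, max_evalue).items():
        families[protein_to_family.get(protein, "unassigned")].append(est)
    return dict(families)


# Toy hits: est1 hits an Arabidopsis and a rice protein; est3's hit is too weak.
hits = [
    "est1\tAt_prot1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t150",
    "est1\tOs_prot1\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-20\t120",
    "est2\tAt_prot2\t90.0\t90\t9\t0\t1\t270\t1\t90\t1e-40\t180",
    "est3\tAt_prot1\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30",
]
family_of = {"At_prot1": "family_X", "Os_prot1": "family_X", "At_prot2": "family_Y"}
print(sort_into_families(hits, family_of))
# {'family_X': ['est1'], 'family_Y': ['est2']}
```

Best-hit assignment is deliberately simple. A real pipeline may instead cluster against whole protein families or use several hits per EST.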
Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering
Next-generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence-difference clustering thresholds than are typically used and demonstrate the utility of our protocol for efficiently identifying an optimal threshold from the data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data.
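The two clustering errors described above can be reproduced in a few lines. This toy Python illustration is not the authors' protocol. It assumes equal-length reads, p-distance (proportion of differing sites) as the sequence difference, and a simple greedy seed-based clustering; `greedy_cluster` is a hypothetical name. The two alleles differ at 5% of sites, mirroring the ~0.05 heterozygosity reported for Ciona savignyi. The 8% paralog divergence is an assumed value.

```python
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: List[str], threshold: float) -> List[List[str]]:
    """Add each read to the first cluster whose seed (first read) is within
    `threshold` sequence difference; otherwise start a new cluster."""
    clusters: List[List[str]] = []
    for read in reads:
        for cluster in clusters:
            if p_distance(read, cluster[0]) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


allele_1 = "A" * 100
allele_2 = "C" * 5 + "A" * 95  # 5% divergent allele at the same locus
paralog = "G" * 8 + "A" * 92   # 8% divergent paralogous copy (assumed)
reads = [allele_1, allele_2, paralog]

for threshold in (0.03, 0.06, 0.10):
    print(threshold, len(greedy_cluster(reads, threshold)))
# 0.03 3  -> alleles split into separate clusters (false homozygosity)
# 0.06 2  -> alleles together, paralog separate
# 0.1 1   -> paralog merged with the alleles (false heterozygosity)
```

A threshold below the allelic divergence splits the locus, while one above the paralog divergence merges paralogs. This is the trade-off the authors' threshold-selection protocol is designed to navigate.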
Transcriptomic and evolutionary analysis of the mechanisms by which P. argentatum, a rubber producing perennial, responds to drought
Background: Guayule (Parthenium argentatum Gray) is a drought-tolerant, rubber-producing perennial shrub native to northern Mexico and the US Southwest. Hevea brasiliensis, currently the world's only source of natural rubber, is grown as a monoculture, leaving it vulnerable to both biotic and abiotic stressors. Isolation of rubber from guayule occurs by mechanical harvesting of the entire plant. It has been reported that environmental conditions leading up to harvest have a profound impact on rubber yield. The link between rubber biosynthesis and drought, a common environmental condition in guayule's native habitat, is currently unclear. Results: We took a transcriptomic and comparative genomic approach to determine how drought impacts rubber biosynthesis in guayule. We compared transcriptional profiles of stem tissue, the location of guayule rubber biosynthesis, collected from field-grown plants subjected to water-deficit (drought) and well-watered (control) conditions. Plants subjected to the imposed drought conditions displayed an increase in transcripts associated with defense responses and water homeostasis, and a decrease in transcripts associated with rubber biosynthesis. An evolutionary and comparative analysis of stress-response transcripts suggests that more anciently duplicated transcripts shared among the Asteraceae, rather than recently derived duplicates, contribute to the drought response observed in guayule. In addition, we identified several deeply conserved long non-coding RNAs (lncRNAs) containing microRNA (miRNA) binding motifs. One lncRNA in particular, with origins at the base of the Asteraceae, may regulate the vegetative-to-reproductive transition observed in water-stressed guayule by acting as a miRNA sponge for miR166. Conclusions: These data represent the first genomic analyses of how guayule responds to drought-like conditions in agricultural production settings.
We identified an inverse relationship between stress-responsive transcripts and those associated with precursor pathways to rubber biosynthesis, suggesting a physiological trade-off between maintaining homeostasis and plant productivity. We also identified a number of regulators of abiotic stress responses, including transcription factors and lncRNAs, that are strong candidates for future projects aimed at modulating rubber biosynthesis under the water-limiting conditions common to guayule's native production environment.
USDA Biomass Research and Development Initiative (BRDI) [USDA-NIFA 201210006-19391 OH]; USDA-NIFA Coordinated Agricultural Program; Sustainable Bioeconomy for Arid Regions (SBAR) [2017-68005-26867]; National Science Foundation (NSF) [IOS-1758532]; University of Arizona; SBAR.
Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.
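The abstract mentions lncRNAs containing miRNA binding motifs but does not say how those motifs were found. The minimal Python sketch below shows one simple approach: slide along a target sequence and report windows that pair with the full-length miRNA with at most a few mismatches (no gaps; G:U wobbles count as mismatches). The sequences are toy values, not miR166 or the guayule lncRNA, and the mismatch cutoff is an assumption. Plant target-mimic sites are often described as carrying a short unpaired bulge, which this ungapped scan would miss; a gapped alignment would be needed for those.

```python
from typing import List, Tuple

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Uppercase and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in reversed(to_rna(rna)))


def complementary_sites(mirna: str, target: str,
                        max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatches) for target windows that pair with the
    full-length miRNA with at most `max_mismatches` mismatches (no gaps)."""
    site = reverse_complement(mirna)
    target = to_rna(target)
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mm = sum(a != b for a, b in zip(window, site))
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


toy_mirna = "AUGCUAGCAUCG"  # toy sequence, not miR166
toy_lncrna = "GGG" + "CGAUGCUAGCAU" + "GGG" + "CGATGCAAGCAT"  # DNA-style T accepted
print(complementary_sites(toy_mirna, toy_lncrna, max_mismatches=1))
# [(3, 0), (18, 1)]
```

An lncRNA acting as a sponge would be expected to carry one or more such sites. Finding a site is only a first filter, not evidence of sponge activity.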
EST database for early flower development in California poppy (Eschscholzia californica Cham., Papaveraceae) tags over 6000 genes from a basal eudicot
The Floral Genome Project (FGP) selected California poppy (Eschscholzia californica Cham. ssp. californica) to help identify new florally expressed genes related to floral diversity in basal eudicots. A large, non-normalized cDNA library was constructed from premeiotic and meiotic floral buds and sequenced to generate a database of 9079 high-quality Expressed Sequence Tags (ESTs). These sequences clustered into 5713 unigenes, including 1414 contigs and 4299 singletons. Homologs of genes regulating many aspects of flower development were identified, including those for organ identity and development, cell and tissue differentiation, cell cycle control, and secondary metabolism. Over 5% of the transcriptome consisted of homologs of known floral gene families. Most are the first representatives of their respective gene families in basal eudicots, and their conservation suggests they are important for floral development and/or function. Approximately 10% of the transcripts encoded transcription factors and other regulatory genes, including nine genes from the seven major lineages of the important MADS-box family of developmental regulators. Homologs of alkaloid pathway genes were also recovered, providing opportunities to explore adaptive evolution in secondary products. Furthermore, comparison of the poppy ESTs with the Arabidopsis genome provided support for putative Arabidopsis genes that previously lacked annotation. Finally, over 1800 unique sequences had no observable homology in the public databases. The California poppy EST database and library will help bridge our understanding of flower initiation and development among higher eudicot and monocot model plants and provide new opportunities for comparative analysis of gene families across angiosperm species.
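The EST counts reported above are internally consistent, and a short calculation shows how they relate. By definition, each unigene is either a contig (overlapping ESTs assembled together) or a singleton (a single EST). The sketch additionally assumes that every high-quality EST belongs to exactly one unigene, which the abstract implies but does not state.

```python
# Counts reported in the California poppy EST abstract.
high_quality_ests = 9079
contigs = 1414
singletons = 4299

# Each unigene is either a contig or a singleton.
unigenes = contigs + singletons
# Assumption: every EST belongs to exactly one unigene, so the ESTs
# not left as singletons are the ones assembled into contigs.
ests_in_contigs = high_quality_ests - singletons
mean_ests_per_contig = ests_in_contigs / contigs

print(unigenes)                        # 5713
print(ests_in_contigs)                 # 4780
print(round(mean_ests_per_contig, 2))  # 3.38
```

The contig and singleton counts sum to the reported 5713 unigenes. On average, each contig would then contain between three and four ESTs.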