40 research outputs found
Recommended from our members
Genomic Investigations of Diversity within the Milkweed Genus Asclepias, at Multiple Scales
At a time when the biodiversity on Earth is being rapidly lost, new technologies and methods in genomic analysis are fortunately allowing scientists to catalog and explore the diversity that remains more efficiently and precisely. The studies in this dissertation investigate genomic diversity within the milkweed genus, Asclepias, at multiple scales, including diversity found within a single individual, diversity within and among populations in a species, and diversity across the entire genus. These investigations contribute to our understanding of the genomic content and architecture within Asclepias and the Gentianales, patterns of population diversification in the western United States, and the evolutionary history of select loci within Asclepias, with implications across flowering plants.
Chapter 2 investigates patterns of polymorphisms among paralogous copies of nuclear ribosomal DNA (nrDNA) within individual genomes, and presents a bioinformatic pipeline for characterizing polymorphisms among copies of a high-copy locus. Results are presented for intragenomic nrDNA polymorphisms across Asclepias. The 18S-26S portion of the nrDNA cistron of Asclepias syriaca served as a reference for assembly of the region from 124 samples representing 90 species of Asclepias. Reads were mapped back to each individual’s consensus and at each position reads differing from the consensus were tallied using a custom Perl script. Low frequency polymorphisms existed in all individuals (mean = 5.8%). Most nrDNA positions (91%) were polymorphic in at least one individual, with polymorphic sites being less frequent in subunit regions and loops. Highly polymorphic sites existed in each individual, with highest abundance in the “noncoding” ITS regions. Phylogenetic signal was present in the distribution of intragenomic polymorphisms across the genus. Intragenomic polymorphisms in nrDNA are common in Asclepias, being found at higher frequency than in any other study to date. The high and variable frequency of polymorphisms across species highlights concerns that phylogenetic applications of nrDNA may be error-prone. The new analytical approach provided in this chapter is applicable to other taxa and other high-copy regions characterized by low coverage genome sequencing (genome skimming).
Chapter 3 presents Hyb-Seq, a new method combining target enrichment and genome skimming to allow simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. A program is presented that takes genome and transcriptome assemblies and locates loci likely to be low copy and phylogenetically informative, to be used for probe development and enrichment in sequence libraries. A workflow is presented for processing data, from raw sequence reads to assembled exons and reconstructed trees.
Genome and transcriptome assemblies for Asclepias syriaca were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nrDNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics.
Chapter 4 presents an assembly of the genome of the common milkweed, Asclepias syriaca. It uses principles from Chapter 3 to target SNPs and reconstruct linkage groups, enabling an analysis of chromosomal evolution within Gentianales, the order containing Asclepias. Asclepias syriaca is the first species in Apocynaceae with reconstructions of the nuclear, chloroplast, and mitochondrial genomes, and the first to have linkage group information incorporated into the nuclear assembly.
The final assembly of Asclepias syriaca contains 54,266 scaffolds ≥1 kbp, with N50 = 3415 bp, representing 37% (156.6 Mbp) of the estimated 420 Mbp genome. Scaffolds ≥200 bp sum to 229.7 Mbp, with N50 = 1904 bp. A total of 14,474 protein coding genes were identified based on transcript evidence, closely related proteins, and ab initio models, and 95% of genes were annotated based on genes from Coffea canephora and Catharanthus roseus. A large proportion of gene space is represented in the assembly, with 96.7% of Asclepias transcripts, 88.4% of transcripts from the related genus Calotropis, and 90.6% of proteins from Coffea mapping to the assembly. Analyses were performed for three gene families, involved in rubber production, light sensing, and cardenolide production, with the finding that the cardenolide related progesterone 5β-reductase gene family is likely reduced in Asclepias relative to other Apocynaceae. Scaffolds covering 75 Mbp of the Asclepias assembly were grouped into eleven linkage groups. Comparisons of these groups with pseudochromosomes in Coffea found that six chromosomes show consistent stability in gene content, while one may have a long history of fragmentation and rearrangement.
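The N50 values quoted above follow a standard definition: the scaffold length L such that scaffolds of length L or longer together hold at least half of the assembled bases. A minimal Python sketch of that calculation is given below. The function name `n50` and the `min_length` parameter are illustrative assumptions, not part of the authors' workflow; `min_length` mirrors the ≥1 kbp and ≥200 bp cutoffs mentioned in the abstract.

```python
def n50(lengths, min_length=0):
    """Return the N50 of a collection of scaffold lengths.

    `min_length` (hypothetical parameter) drops scaffolds shorter than a
    cutoff before the statistic is computed.
    """
    # Keep scaffolds at or above the cutoff, longest first.
    ordered = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not ordered:
        raise ValueError("no scaffolds at or above min_length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to half the
        # total assembly size defines N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_lengths))
```

Because short scaffolds add to the total while contributing little to the running sum, lowering the cutoff usually lowers N50, which is consistent with the two values reported for the ≥1 kbp and ≥200 bp scaffold sets.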
Finally, in Chapter 5, diversity within a species across its entire range is investigated with a phylogeographic study of the jewel milkweed, Asclepias cryptoceras. This study applies the SNP targets developed from Chapter 4 to populations of A. cryptoceras, asking whether two recognized subspecies are genetically distinct, and searching for the origin of populations that are morphologically intermediate between the two. A total of 54,673 SNPs were found on 7372 contigs, across 96 individuals from ten populations. Principal component analysis and measures of allelic differentiation indicate a clear disjunction between subspecies cryptoceras and davisii (F_ST = 0.092 between geographic regions). For intermediate populations, estimates of hybrid index below 0.25, together with measures of allelic diversity and private alleles, argue against a hybrid origin due to secondary contact, and instead support their origin as stepping stone populations during expansion along a southern corridor from east to west.
Intragenomic polymorphisms among high-copy loci: a genus-wide study of nuclear ribosomal DNA in Asclepias (Apocynaceae)
Despite knowledge that concerted evolution of high-copy loci is often imperfect,
studies that investigate the extent of intragenomic polymorphisms and comparisons
across a large number of species are rarely made. We present a bioinformatic pipeline
for characterizing polymorphisms within an individual among copies of a high-copy
locus. Results are presented for nuclear ribosomal DNA (nrDNA) across the milkweed
genus, Asclepias. The 18S-26S portion of the nrDNA cistron of Asclepias syriaca
served as a reference for assembly of the region from 124 samples representing 90
species of Asclepias. Reads were mapped back to each individual’s consensus and at
each position reads differing from the consensus were tallied using a custom Perl
script. Low frequency polymorphisms existed in all individuals (mean = 5.8%). Most
nrDNA positions (91%) were polymorphic in at least one individual, with polymorphic
sites being less frequent in subunit regions and loops. Highly polymorphic sites
existed in each individual, with highest abundance in the “noncoding” ITS regions.
Phylogenetic signal was present in the distribution of intragenomic polymorphisms
across the genus. Intragenomic polymorphisms in nrDNA are common in Asclepias,
being found at higher frequency than in any other study to date. The high and variable
frequency of polymorphisms across species highlights concerns that phylogenetic
applications of nrDNA may be error-prone. The new analytical approach provided
here is applicable to other taxa and other high-copy regions characterized by low
coverage genome sequencing (genome skimming).
This is the publisher’s final PDF. The published article is copyrighted by the author(s) and published by PeerJ. The published article can be found at: https://peerj.com/.
Keywords: Evolutionary studies, Bioinformatics, Intragenomic polymorphism, Plant science, Asclepias, Genomics, High-copy, Partial SNP (pSNP), ITS, Genetics, Concerted evolution, Nuclear ribosomal DNA (nrDNA), Intra-individual site polymorphism, Genome skimming, 2IS
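The tallying step described in this abstract (reads mapped back to an individual's consensus, with differing reads counted at each position) can be sketched in a few lines of Python. This is an illustration only and not the authors' Perl script: the function name `tally_polymorphisms`, the simplified input of ungapped `(start, sequence)` alignments with 0-based non-negative starts, and the `min_fraction` reporting threshold of 0.02 are all assumptions made for the example.

```python
from collections import Counter


def tally_polymorphisms(consensus, aligned_reads, min_fraction=0.02):
    """Return {position: fraction of reads differing from the consensus}.

    `aligned_reads` is a list of (start, sequence) pairs: ungapped
    alignments with a 0-based, non-negative start on the consensus
    (a simplifying assumption; real mappings carry indels and clipping).
    `min_fraction` is a hypothetical reporting threshold.
    """
    depth = [0] * len(consensus)
    variants = [Counter() for _ in consensus]

    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if position >= len(consensus):
                break  # ignore read bases hanging off the consensus end
            depth[position] += 1
            if base != consensus[position]:
                variants[position][base] += 1

    polymorphic = {}
    for position, counts in enumerate(variants):
        if depth[position] == 0:
            continue  # no coverage, nothing to report
        fraction = sum(counts.values()) / depth[position]
        if fraction >= min_fraction:
            polymorphic[position] = fraction
    return polymorphic


if __name__ == "__main__":
    reads = [(0, "ACGT"), (0, "ACGA"), (1, "CGT"), (2, "GT")]
    print(tally_polymorphisms("ACGT", reads))
```

In practice a threshold of this kind would be chosen with the sequencing error rate in mind, since low-frequency differences at a high-copy locus can be hard to separate from read errors.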
Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics
PREMISE OF THE STUDY: Hyb-Seq, the combination of target enrichment and genome skimming,
allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets
for plant systematics and evolution studies.
METHODS AND RESULTS: Genome and transcriptome assemblies for milkweed (Asclepias syriaca)
were utilized to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed
by Illumina sequencing of enriched libraries. Hyb-Seq of twelve individuals (ten Asclepias
species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7%
of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nrDNA
cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal
conflict between genomes.
CONCLUSIONS: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy
nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and
organellar genomes, to efficiently produce genome-scale datasets for phylogenomics.
This is an author's peer-reviewed final manuscript, as accepted by the publisher. The article is in press and will be published in Applications in Plant Sciences, Vol. 2, no. 9, September 2014.
Keywords: Genome skimming, Hyb-Seq, Target enrichment, Phylogenomics, Species tree, Nuclear loci
eDNA as a tool for identifying freshwater species in sustainable forestry: A critical review and potential future applications
Environmental DNA (eDNA) is an emerging biological monitoring tool that can aid in assessing the effects of forestry and forest manufacturing activities on biota. Monitoring taxa across broad spatial and temporal scales is necessary to ensure forest management and forest manufacturing activities meet their environmental goals of maintaining biodiversity. Our objectives are to describe potential applications of eDNA across the wood products supply chain extending from regenerating forests, harvesting, and wood transport, to manufacturing facilities, and to review the current state of the science in this context. To meet our second objective, we summarize the taxa examined with targeted (PCR, qPCR or ddPCR) or metagenomic eDNA methods (eDNA metabarcoding), evaluate how estimated species richness compares between traditional field sampling and eDNA metabarcoding approaches, and compare the geographical representation of prior eDNA studies in freshwater ecosystems to global wood baskets. Potential applications of eDNA include evaluating the effects of forestry and forest manufacturing activities on aquatic biota, delineating fish-bearing versus non-fish-bearing reaches, evaluating effectiveness of constructed road crossings for freshwater organism passage, and determining the presence of at-risk species. Studies using targeted eDNA approaches focused on fish, amphibians, and invertebrates, while metagenomic studies focused on fish, invertebrates, and microorganisms. Rare, threatened, or endangered species received the least attention in targeted eDNA research, but are arguably of greatest interest to sustainable forestry and forest manufacturing that seek to preserve freshwater biodiversity. Ultimately, using eDNA methods will enable forestry and forest manufacturing managers to have data-driven prioritization for conservation actions for all freshwater species.
Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae)
Phylogenetics benefits from using a large number of putatively independent nuclear loci and their combination with other sources of information, such as the plastid and mitochondrial genomes. To facilitate the selection of orthologous low‐copy nuclear (LCN) loci for phylogenetics in nonmodel organisms, we created an automated and interactive script to select hundreds of LCN loci by a comparison between transcriptome and genome skim data. We used our script to obtain LCN genes for southern African Oxalis (Oxalidaceae), a speciose plant lineage in the Greater Cape Floristic Region. This resulted in 1164 LCN genes greater than 600 bp. Using target enrichment combined with genome skimming (Hyb‐Seq), we obtained on average 1141 LCN loci, nearly the whole plastid genome and the nrDNA cistron from 23 southern African Oxalis species. Despite a wide range of gene trees, the phylogeny based on the LCN genes was very robust, as retrieved through various gene and species tree reconstruction methods as well as concatenation. Cytonuclear discordance was strong. This indicates that organellar phylogenies alone are unlikely to represent the species tree and stresses the utility of Hyb‐Seq in phylogenetics.
Reconciling Conflicting Phylogenies in the Origin of Sweet Potato and Dispersal to Polynesia
The sweet potato is one of the world’s most widely consumed crops, yet its evolutionary history is poorly understood. In this paper, we present a comprehensive phylogenetic study of all species closely related to the sweet potato and address several questions pertaining to the sweet potato that remained unanswered. Our research combined genome skimming and target DNA capture to sequence whole chloroplasts and 605 single-copy nuclear regions from 199 specimens representing the sweet potato and all of its crop wild relatives (CWRs). We present strongly supported nuclear and chloroplast phylogenies demonstrating that the sweet potato had an autopolyploid origin and that Ipomoea trifida is its closest relative, confirming that no other extant species were involved in its origin. Phylogenetic analysis of nuclear and chloroplast genomes shows conflicting topologies regarding the monophyly of the sweet potato. The process of chloroplast capture explains these conflicting patterns, showing that I. trifida had a dual role in the origin of the sweet potato, first as its progenitor and second as the species with which the sweet potato introgressed so one of its lineages could capture an I. trifida chloroplast. In addition, we provide evidence that the sweet potato was present in Polynesia in pre-human times. This, together with several other examples of long-distance dispersal in Ipomoea, negates the need to invoke ancient human-mediated transport as an explanation for its presence in Polynesia. These results have important implications for understanding the origin and evolution of a major global food crop and question the existence of pre-Columbian contacts between Polynesia and the American continent.
Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing
BACKGROUND: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution.
RESULTS: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed.
CONCLUSIONS: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.
Supplemental data to "A draft genome and transcriptome of common milkweed (Asclepias syriaca) as resources for evolutionary, ecological, and molecular studies in milkweeds and Apocynaceae"
This dataset contains a draft assembly of the common milkweed (Asclepias syriaca) nuclear genome, linkage group information, and gene family counts for Asclepias and related species. The genome assembly is accompanied by annotation of gene models, repeat models, transfer RNAs, and open reading frames, and mapping information of Asclepias transcripts, Calotropis transcripts, and Coffea proteins onto the assembled scaffolds. The linkage group information includes data input into the linkage group analysis, R scripts for processing, and a final list of scaffolds assigned to linkage groups. Additional data includes the coding sequence alignment of P5βR paralogs described in the article and a table of gene family counts in Asclepias, other Apocynaceae, coffee (Coffea), and grape (Vitis).
Phylogeographic Patterns and Intervarietal Relationships within Lupinus lepidus: Morphological Differences, Genetic Similarities
Lupinus lepidus (Fabaceae) contains many morphologically divergent varieties and was restricted in its range during the last period of glaciation. A combination of phylogenetic (with the trnDT and LEGCYC1A loci) and population genetics approaches (with microsatellites and LEGCYC1A) are used here to characterize intervarietal relationships and examine hypotheses of recolonization of areas in the Pacific Northwest affected by glaciation. Sequenced loci are not found to form a clade exclusive to L. lepidus, nor are any of the varieties found to form clades. Population genetics analyses reveal only negligible genetic structure within L. lepidus, with the majority of variation being found within populations. Isolation-by-distance analysis reveals some correlation between population genetic distances and geographic distance. Microsatellite and sequence results are consistent with a scenario whereby the Oregon and Washington regions were rapidly colonized from the south, with independent invasions along the eastern and western sides of the Cascade Mountains. A predicted disjunction between northern and southern populations is found within the microsatellite data but not the sequence data, suggesting that northern populations were recolonized via a process involving the spread of novel microsatellite mutations, perhaps through the persistence of a glacial refuge isolated from southern populations. Varieties are not shown to be genetically isolated, and are interpreted as representing ecotypes, with local selection outpacing the effects of migration.
Supplemental data to "Estimating the Genetic Diversity of Pacific salmon and trout using Multi-gene eDNA Metabarcoding"
This dataset contains DNA sequence data of Oncorhynchus species, isolated from environmental DNA (eDNA) from Pacific Northwest streams via microfluidic eDNA metabarcoding and high-throughput (Illumina) sequencing (samples collected from 2017-09-22 to 2017-10-10). It is accompanied by scripts and commands for data analyses including: sequence denoising, calculation of entropy values by codon position, and calculation of diversity statistics and haplotype mapping. Intermediate outputs include denoised haplotypes, entropy calculations, and haplotype summaries following chimera removal.
Keywords: Environmental monitoring, species diversity, DNA, freshwater biodiversity conservation, stream health, environmental DNA, metabarcoding, genetic diversity, Oncorhynchus, biodiversity, aquatic communit
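One way to picture "entropy values by codon position" is sketched below in Python. This is not the dataset's own script: it assumes Shannon entropy in bits, equal-length haplotypes that are already aligned and in frame (the first base is codon position 1), and equal weighting of haplotypes. The function names `site_entropy` and `entropy_by_codon_position` are hypothetical.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (bits) of the bases observed at one alignment site."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_by_codon_position(haplotypes):
    """Mean per-site entropy for codon positions 1, 2, and 3.

    Assumes aligned, in-frame haplotypes of equal length, each counted
    once (abundance weighting is not modeled in this sketch).
    """
    if not haplotypes:
        raise ValueError("no haplotypes supplied")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to equal length")

    sums = [0.0, 0.0, 0.0]
    sites = [0, 0, 0]
    for index in range(length):
        column = [h[index] for h in haplotypes]
        sums[index % 3] += site_entropy(column)
        sites[index % 3] += 1
    # Average over the sites that fall at each codon position.
    return [s / n if n else 0.0 for s, n in zip(sums, sites)]


if __name__ == "__main__":
    print(entropy_by_codon_position(["ATGAAA", "ATGAAG"]))
```

In protein-coding markers, true biological variation tends to concentrate at third codon positions, so a comparison of the three averages is sometimes used as a sanity check on denoised haplotypes; whether the dataset's scripts use it that way is not stated in the description.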