10 research outputs found

    A High-Quality Reference Genome for the Invasive Mosquitofish Gambusia affinis Using a Chicago Library

    The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes that have shorter total, exon, and intron length on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to those of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species.

    Adapterama IV: Sequence Capture of Dual-digest RADseq Libraries with Identifiable Duplicates (RADcap)

    Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods such as restriction site associated DNA sequencing (RADseq) and sequence capture are constrained by costs associated with inefficient use of sequencing data and sample preparation, respectively. Here, we demonstrate RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. The RADcap approach uses a new version of dual-digest RADseq (3RAD) to identify candidate SNP loci for capture bait design, and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96 to 384 Wisteria plants. Our results demonstrate that our RADcap method: (1) can methodologically reduce (to <5%) and computationally remove PCR duplicate reads from data; (2) achieves 80-90% reads-on-target in 11 of 12 enrichments; (3) returns consistent coverage (≥4x) across >90% of individuals at up to 99.9% of the targeted loci; (4) produces consistently high occupancy matrices of genotypes across hundreds of individuals; and (5) is inexpensive, with reagent and sequencing costs totaling <$6/sample and adapter and primer costs of only a few hundred dollars.

    Adapterama II: universal amplicon sequencing on Illumina platforms (TaggiMatrix)

    Next-generation sequencing (NGS) of amplicons is used in a wide variety of contexts. In many cases, NGS amplicon sequencing remains overly expensive and inflexible, with library preparation strategies relying upon the fusion of locus-specific primers to full-length adapter sequences with a single identifying sequence or ligating adapters onto PCR products. In Adapterama I, we presented universal stubs and primers to produce thousands of unique index combinations and a modifiable system for incorporating them into Illumina libraries. Here, we describe multiple ways to use the Adapterama system and other approaches for amplicon sequencing on Illumina instruments. In the variant we use most frequently for large-scale projects, we fuse partial adapter sequences (TruSeq or Nextera) onto the 5′ end of locus-specific PCR primers with variable-length tag sequences between the adapter and locus-specific sequences. These fusion primers can be used combinatorially to amplify samples within a 96-well plate (8 forward primers + 12 reverse primers yield 8 × 12 = 96 combinations), and the resulting amplicons can be pooled. The initial PCR products then serve as template for a second round of PCR with dual-indexed iTru or iNext primers (also used combinatorially) to make full-length libraries. The resulting quadruple-indexed amplicons have diversity at most base positions and can be pooled with any standard Illumina library for sequencing. The number of sequencing reads from the amplicon pools can be adjusted, facilitating deep sequencing when required or reducing sequencing costs per sample to an economically trivial amount when deep coverage is not needed. We demonstrate the utility and versatility of our approaches with results from six projects using different implementations of our protocols. Thus, we show that these methods facilitate amplicon library construction for Illumina instruments at reduced cost with increased flexibility.
A simple web page to design fusion primers compatible with iTru primers is available at: http://baddna.uga.edu/tools-taggi.html. A fast and easy to use program to demultiplex amplicon pools with internal indexes is available at: https://github.com/lefeverde/Mr_Demuxy

    Adapterama III: Quadruple-indexed, double/triple-enzyme RADseq libraries (2RAD/3RAD)

    Molecular ecologists frequently use genome reduction strategies that rely upon restriction enzyme digestion of genomic DNA to sample consistent portions of the genome from many individuals (e.g., RADseq, GBS). However, researchers often find the existing methods expensive to initiate and/or difficult to implement consistently, especially because it is difficult to multiplex sufficient numbers of samples to fill entire sequencing lanes. Here, we introduce a low-cost and highly robust approach for the construction of dual-digest RADseq libraries that build on adapters and primers designed in Adapterama I. Major features of our method include: (1) minimizing the number of processing steps; (2) focusing on a single strand of sample DNA for library construction, allowing the use of a non-phosphorylated adapter on one end; (3) ligating adapters in the presence of active restriction enzymes, thereby reducing chimeras; (4) including an optional third restriction enzyme to cut apart adapter-dimers formed by the phosphorylated adapter, thus increasing the efficiency of adapter ligation to sample DNA, which is particularly effective when only low quantity/quality DNA samples are available; (5) interchangeable adapter designs; (6) incorporating variable-length internal indexes within the adapters to increase the scope of sample indexing, facilitate pooling, and increase sequence diversity; (7) maintaining compatibility with universal dual-indexed primers and thus, Illumina sequencing reagents and libraries; and, (8) easy modification for the identification of PCR duplicates. We present eight adapter designs that work with 72 restriction enzyme combinations. We demonstrate the efficiency of our approach by comparing it with existing methods, and we validate its utility through the discovery of many variable loci in a variety of non-model organisms.
Our 2RAD/3RAD method is easy to perform, has low startup costs, has increased utility with low-concentration input DNA, and produces libraries that can be highly-multiplexed and pooled with other Illumina libraries.

    Supplemental Material for Hoffberg et al., 2018

    Figure S1: Comparison of the size distribution of library inserts in the Meraculous and HiRise assemblies.
    Figure S2: The frequency of kmers at each kmer length.
    Figure S3: The distribution of scaffold lengths in the HiRise assembly.
    Figure S4: The cumulative percent of the assembly for a given scaffold size in the Meraculous and HiRise assemblies.
    Table S1: A detailed list of the number of copies and percent of the assembly of transposons and repeatable elements.
    File S1: Submission script for MAKER.
    File S2: MAKER executable file (maker_exe.ctl).
    File S3: Specifications for downstream filtering of BLAST and Exonerate alignments (maker_bopts.ctl).
    File S4: Primary configuration of MAKER specific options (maker_opts.ctl).
    File S5: Commands for training SNAP.
    File S6: Submission script for BLAST comparing Gambusia affinis with related fish.
    File S7: Submission script for BUSCO.
    File S8: Submission script for predicting ncRNAs.
    File S9: Illumina reads mapped to the reference in BAM format.
    File S10: Sequence of tRNAs.
    File S11: Structure of tRNAs.
    File S12: rRNA, snRNA, snoRNA, and miRNA sequences.

    Adapterama III: Quadruple-indexed, double/triple-enzyme RADseq libraries (2RAD/3RAD)

    Molecular ecologists frequently use genome reduction strategies that rely upon restriction enzyme digestion of genomic DNA to sample consistent portions of the genome from many individuals (e.g., RADseq, GBS). However, researchers often find the existing methods expensive to initiate and/or difficult to implement consistently, especially because it is difficult to multiplex sufficient numbers of samples to fill entire sequencing lanes. Here, we introduce a low-cost and highly robust approach for the construction of dual-digest RADseq libraries that build on adapters and primers designed in Adapterama I. Major features of our method include: (1) minimizing the number of processing steps; (2) focusing on a single strand of sample DNA for library construction, allowing the use of a non-phosphorylated adapter on one end; (3) ligating adapters in the presence of active restriction enzymes, thereby reducing chimeras; (4) including an optional third restriction enzyme to cut apart adapter-dimers formed by the phosphorylated adapter, thus increasing the efficiency of adapter ligation to sample DNA, which is particularly effective when only low quantity/quality DNA samples are available; (5) interchangeable adapter designs; (6) incorporating variable-length internal indexes within the adapters to increase the scope of sample indexing, facilitate pooling, and increase sequence diversity; (7) maintaining compatibility with universal dual-indexed primers and thus, Illumina sequencing reagents and libraries; and, (8) easy modification for the identification of PCR duplicates. We present eight adapter designs that work with 72 restriction enzyme combinations. We demonstrate the efficiency of our approach by comparing it with existing methods, and we validate its utility through the discovery of many variable loci in a variety of non-model organisms. 
Our 2RAD/3RAD method is easy to perform, has low startup costs, has increased utility with low-concentration input DNA, and produces libraries that can be highly-multiplexed and pooled with other Illumina libraries.

    Adapterama II: universal amplicon sequencing on Illumina platforms (TaggiMatrix)

    Next-generation sequencing (NGS) of amplicons is used in a wide variety of contexts. In many cases, NGS amplicon sequencing remains overly expensive and inflexible, with library preparation strategies relying upon the fusion of locus-specific primers to full-length adapter sequences with a single identifying sequence or ligating adapters onto PCR products. In Adapterama I, we presented universal stubs and primers to produce thousands of unique index combinations and a modifiable system for incorporating them into Illumina libraries. Here, we describe multiple ways to use the Adapterama system and other approaches for amplicon sequencing on Illumina instruments. In the variant we use most frequently for large-scale projects, we fuse partial adapter sequences (TruSeq or Nextera) onto the 5′ end of locus-specific PCR primers with variable-length tag sequences between the adapter and locus-specific sequences. These fusion primers can be used combinatorially to amplify samples within a 96-well plate (8 forward primers + 12 reverse primers yield 8 × 12 = 96 combinations), and the resulting amplicons can be pooled. The initial PCR products then serve as template for a second round of PCR with dual-indexed iTru or iNext primers (also used combinatorially) to make full-length libraries. The resulting quadruple-indexed amplicons have diversity at most base positions and can be pooled with any standard Illumina library for sequencing. The number of sequencing reads from the amplicon pools can be adjusted, facilitating deep sequencing when required or reducing sequencing costs per sample to an economically trivial amount when deep coverage is not needed. We demonstrate the utility and versatility of our approaches with results from six projects using different implementations of our protocols. Thus, we show that these methods facilitate amplicon library construction for Illumina instruments at reduced cost with increased flexibility. 
A simple web page to design fusion primers compatible with iTru primers is available at: http://baddna.uga.edu/tools-taggi.html. A fast and easy to use program to demultiplex amplicon pools with internal indexes is available at: https://github.com/lefeverde/Mr_Demuxy