entropy_variants0.6
Input file for analysis with the "entropy" model. This file is a filtered version of the VCF file, including genotype likelihoods for loci with MAF>1%. Additionally, we randomly selected 1 SNP per contig for contigs with multiple variant sites, to ensure independence of SNPs
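The filtering described above (keep loci above a minor allele frequency threshold, then randomly keep one SNP per contig) can be sketched as follows. This is only an illustration, not the authors' actual pipeline: the function name, the `(contig, position, maf)` tuple layout, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def pick_one_snp_per_contig(snps, min_maf=0.01, seed=0):
    """Return {contig: position}, one randomly chosen SNP per contig.

    `snps` is an iterable of (contig, position, maf) tuples; this layout is
    hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_contig = defaultdict(list)
    for contig, position, maf in snps:
        # Keep only loci whose minor allele frequency exceeds the threshold.
        if maf > min_maf:
            by_contig[contig].append(position)
    # Draw a single SNP at random from each contig that still has candidates.
    return {contig: rng.choice(positions) for contig, positions in by_contig.items()}


example = [
    ("contig_1", 12, 0.20),
    ("contig_1", 57, 0.08),
    ("contig_2", 8, 0.30),
    ("contig_3", 40, 0.005),  # below the MAF threshold, so it is dropped
]
chosen = pick_one_snp_per_contig(example)
```

Keeping a single locus per contig trades data volume for approximate independence among SNPs, which is the motivation given in the description.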
split_scaffold_burbot_ref_long_94
Because there are no existing genomic resources for burbot, we constructed an artificial reference genome using smng (SeqMan NGen, DNAstar), using 25 million reads. This artificial reference genome contains 53,789 contigs. We used this assembly as a template for a reference-based assembly with bwa
VCF file
This is a .vcf file produced by calling variant genetic sites using samtools and bcftools. For each variant site identified, we required that >60% of all individuals had at least one read at that locus
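The coverage requirement above can be illustrated with a small check. This is a sketch under stated assumptions, not the samtools/bcftools invocation actually used: the function name and the list-of-read-depths input are hypothetical.

```python
def passes_coverage_filter(depths, min_fraction=0.6):
    """Return True if more than `min_fraction` of individuals have >= 1 read.

    `depths` holds one read depth per individual at a single variant site
    (a hypothetical input layout chosen for illustration).
    """
    if not depths:
        return False  # no individuals, so nothing to retain
    covered = sum(1 for depth in depths if depth >= 1)
    return covered / len(depths) > min_fraction


# Three of four individuals have reads at the first site, one of four at the second.
site_kept = passes_coverage_filter([1, 4, 2, 0])
site_dropped = passes_coverage_filter([3, 0, 0, 0])
```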
lib2_sucker_barcode_key
Barcode-ID pairs for individual fish for lane 2 of sequencing. (Please note that some individuals included in this barcode file belong to a different project and are not discussed in the related paper.)
Locations
Approximate latitude and longitude of sampled populations. NOTE: these are estimated from maps, and do not represent exact sampling locations, as GPS coordinates were not taken in the field
Assemblies
This is a tarred (compressed) file that contains assemblies to an artificial reference genome. When untarred, this file will contain >800 files in .bam format, one for each individual fish. These files were produced using bwa (Burrows-Wheeler Aligner, Li and Durbin 2009)
artificial reference genome
This artificial reference genome was constructed via a de novo assembly (using DNAstar's smng) of a subset of 25 million short sequence reads, distributed across all taxa. We then assembled short reads to thi
names_common_variants_0.5
This file is the formatted input file for entropy, the hierarchical Bayesian model used in this study. The file contains data for a subset of SNPs, in a simplified genotype likelihood format. SNPs in this file have a minor allele frequency >0.05, and a maximum of 1 alternate allele. This file also contains only one locus per contig
lib1_sucker_barcode_key
Barcode-ID pairs for individual fish for lane 1 of sequencing. (Please note that some individuals included in this barcode file belong to a different project and are not discussed in the related paper.)
VCF file
This is the raw VCF file containing all 32,978 variant genetic sites initially identified from assembled reads