11 research outputs found

    entropy_variants0.6

    Input file for analysis with the "entropy" model. This file is a filtered version of the VCF file, including genotype likelihoods for loci with MAF>1%. For contigs with multiple variant sites, we additionally selected 1 SNP per contig at random to ensure independence of SNPs.
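The one-SNP-per-contig step described above can be sketched as follows. This is a minimal illustration, not the authors' script: the `(contig, position)` tuple layout, the function name `one_snp_per_contig`, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def one_snp_per_contig(sites, seed=0):
    """Pick one variant site at random from each contig.

    sites: iterable of (contig, position) tuples (hypothetical layout).
    Returns a dict mapping each contig to the single position kept.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_contig = defaultdict(list)
    for contig, pos in sites:
        by_contig[contig].append(pos)
    # Contigs with a single site keep it; others get one random draw.
    return {contig: rng.choice(positions) for contig, positions in by_contig.items()}


# Toy input: two contigs with multiple sites, one with a single site.
sites = [("contig1", 10), ("contig1", 57), ("contig2", 5),
         ("contig3", 8), ("contig3", 91)]
picked = one_snp_per_contig(sites)
```

Keeping a single site per contig gives up some data in exchange for loci that are less likely to be physically linked.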

    split_scaffold_burbot_ref_long_94

    Because there are no existing genomic resources for burbot, we constructed an artificial reference genome using smng (SeqMan NGen, DNAstar), using 25 million reads. This artificial reference genome contains 53,789 contigs. We used this assembly as a template for a reference-based assembly with bwa.

    VCF file

    This is a .vcf file produced by calling variant genetic sites using samtools and bcftools. For each variant site identified, we required that >60% of all individuals had at least one read at that locus.
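The coverage requirement above amounts to a per-site check. The sketch below is only an illustration under stated assumptions: it takes a plain list of per-individual read depths, and the function name `passes_coverage` is hypothetical. It does not reproduce the samtools or bcftools commands actually used.

```python
def passes_coverage(depths, min_fraction=0.6):
    """Return True if more than min_fraction of individuals have >= 1 read.

    depths: list of per-individual read depths at one site
            (hypothetical layout; one integer per individual).
    """
    if not depths:
        return False  # no individuals, so nothing to retain
    covered = sum(1 for d in depths if d >= 1)
    return covered / len(depths) > min_fraction


# Toy sites: 4 of 5 individuals covered, then 2 of 5 covered.
well_covered = passes_coverage([1, 1, 3, 2, 0])
poorly_covered = passes_coverage([1, 0, 0, 2, 0])
```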

    lib2_sucker_barcode_key

    Barcode-ID pairs for individual fish for lane 2 of sequencing. (Please note that this barcode file includes individuals that belong to a different project and are not discussed in the related paper.)

    Locations

    Approximate latitude and longitude of sampled populations. NOTE: these are estimated from maps and do not represent exact sampling locations, as GPS coordinates were not taken in the field.

    Assemblies

    This is a tarred (compressed) file that contains assemblies to an artificial reference genome. When untarred, it yields >800 files in .bam format, one for each individual fish. These files were produced using bwa (Burrows-Wheeler Aligner, Li and Durbin 2009).

    artificial reference genome

    This artificial reference genome was constructed via a de novo assembly (using DNAstar's smng) of a subset of 25 million short sequence reads, distributed across all taxa. We then assembled short reads to thi

    names_common_variants_0.5

    This file is the formatted input file for entropy, the hierarchical Bayesian model used in this study. The file contains data for a subset of SNPs, in a simplified genotype likelihood format. SNPs in this file have a minor allele frequency >0.05, and a maximum of 1 alternate allele. This file also contains only one locus per contig
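The allele-frequency and allele-count criteria above can be expressed as a simple per-site predicate. This is a hedged sketch: the inputs (an alternate-allele frequency and a count of alternate alleles) and the name `keep_site` are assumptions for illustration, and the code is not the pipeline that produced this file.

```python
def keep_site(alt_freq, n_alt_alleles, min_maf=0.05):
    """Return True for sites with at most one alternate allele and MAF > min_maf.

    alt_freq: frequency of the alternate allele, between 0 and 1.
    n_alt_alleles: number of alternate alleles observed at the site.
    """
    # The minor allele is whichever of the two alleles is rarer.
    maf = min(alt_freq, 1.0 - alt_freq)
    return n_alt_alleles <= 1 and maf > min_maf


common_biallelic = keep_site(0.2, 1)   # MAF 0.2, one alternate allele
rare_variant = keep_site(0.97, 1)      # MAF 0.03, below the threshold
multiallelic = keep_site(0.3, 2)       # two alternate alleles
```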

    lib1_sucker_barcode_key

    Barcode-ID pairs for individual fish for lane 1 of sequencing. (Please note that this barcode file includes individuals that belong to a different project and are not discussed in the related paper.)

    VCF file

    This is the raw VCF file containing all 32,978 variant genetic sites initially identified from assembled reads