Comparison of variant calling methods for whole genome sequencing data in dairy cattle

Abstract

Accurate identification of SNPs from next-generation sequencing data is crucial for high-quality downstream analysis. Whole genome sequence data of 65 key ancestors of genotyped Swiss dairy populations were available for investigation (24 billion reads, 96.8% mapped to UMD3.1, 12x coverage). Four publicly available variant calling programmes were assessed, and different levels of pre-calling data handling were tested and compared for each method. SNP concordance was examined against Illumina’s BovineHD Genotyping BeadChip®. Depending on the variant calling software used, between 16,894,054 and 22,048,382 SNPs were identified (multi-sample calling). A total of 14,644,310 SNPs were identified by all four variant callers (multi-sample calling). InDel counts ranged from 1,997,791 to 2,857,754; 1,708,649 InDels were identified by all four variant callers. Minimal pre-calling data handling resulted in the highest non-reference sensitivity and the lowest non-reference discrepancy rates.
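
The abstract does not define the two concordance measures it reports. As a minimal sketch, assuming the definitions commonly used when comparing sequence-derived genotypes with array genotypes, non-reference sensitivity (NRS) is the share of sites that are non-reference on the array and are also called non-reference from sequence, and non-reference discrepancy (NRD) is the share of discordant genotypes among all compared genotypes other than those where both sources agree on homozygous reference. The function name and the 0/1/2 genotype coding are illustrative assumptions, not taken from the study.

```python
from typing import Iterable, Tuple


def non_reference_metrics(pairs: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (NRS, NRD) for (array_genotype, sequence_genotype) pairs.

    Genotypes are coded as alternate-allele counts:
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
    Missing genotypes are assumed to have been removed beforehand.
    """
    pairs = list(pairs)

    # Sites where the array reports a non-reference genotype.
    array_nonref = sum(1 for a, _ in pairs if a != 0)
    # Of those, sites where the sequence call is also non-reference.
    both_nonref = sum(1 for a, s in pairs if a != 0 and s != 0)

    # Genotype pairs that disagree in any way.
    discordant = sum(1 for a, s in pairs if a != s)
    # All pairs except concordant homozygous-reference ones.
    informative = sum(1 for a, s in pairs if not (a == 0 and s == 0))

    nrs = both_nonref / array_nonref if array_nonref else float("nan")
    nrd = discordant / informative if informative else float("nan")
    return nrs, nrd


# Toy data for illustration only (not from the study).
toy_pairs = [(0, 0), (1, 1), (2, 2), (1, 0), (2, 1), (0, 1)]
nrs, nrd = non_reference_metrics(toy_pairs)
print(f"NRS = {nrs:.2f}, NRD = {nrd:.2f}")
```

Under these definitions a higher NRS and a lower NRD both indicate better agreement with the array, which is the direction in which the abstract reports results for minimal pre-calling handling.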
