11 research outputs found

    Detection of identity by descent using next-generation whole genome sequencing data

    BACKGROUND: Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the software programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data are becoming increasingly available, enabling the comprehensive analysis of genomes, including identifying rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were previously undetectable with sparser genetic data. RESULTS: Here, we investigate how different levels of variant coverage in sequencing and microarray genotype data influence the resolution at which IBD can be detected. This includes microarray genotype data from the WTCCC study, denser genotype data from the HapMap Project, low-coverage sequencing data from the 1000 Genomes Project, and deep-coverage complete genome data from our own projects. With high power (78%), we can detect segments of length 0.4 cM or larger using fastIBD and GERMLINE in sequencing data. This compares to similar power to detect segments of length 1.0 cM or larger with microarray genotype data. We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments using sequencing data, but also has a much higher false positive rate. CONCLUSION: We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next-generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.

    Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data

    Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects
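
The LOD score mentioned in this abstract can be illustrated with a minimal sketch. It is not the IBDseq implementation: the function name `lod_score`, the per-marker probabilities, and the assumption that markers can be treated as independent (so that per-marker ratios multiply) are all hypothetical choices made here for illustration only.

```python
import math


def lod_score(p_ibd, p_non_ibd):
    """Return log10 of the likelihood ratio (IBD vs. non-IBD).

    p_ibd and p_non_ibd hold hypothetical per-marker probabilities of the
    observed genotype pair under each model. Markers are assumed to be
    independent, so the log-ratios are simply summed.
    """
    if len(p_ibd) != len(p_non_ibd):
        raise ValueError("both models need one probability per marker")
    return sum(math.log10(a / b) for a, b in zip(p_ibd, p_non_ibd))


# Made-up probabilities for three markers (illustrative values only).
p_ibd = [0.9, 0.8, 0.5]
p_non_ibd = [0.3, 0.4, 0.5]
score = lod_score(p_ibd, p_non_ibd)
print(f"LOD = {score:.3f}")
```

A positive score favors the IBD model and a negative score favors the non-IBD model; a marker whose genotypes are equally probable under both models contributes zero.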

    Beyond broad strokes: sociocultural insights from the study of ancient genomes

    The amount of sequence data obtained from ancient samples has dramatically expanded in the last decade, and so have the types of questions that can now be addressed using ancient DNA. In the field of human history, while ancient DNA has provided answers to long-standing debates about major movements of people, it has also recently begun to inform on other important facets of the human experience. The field is now moving from mostly fixating on large-scale supra-regional studies to also taking a more local perspective, shedding light on socioeconomic processes, inheritance rules, marriage practices and technological diffusion. In this review, we summarize recent studies showcasing these types of insights, focusing on methods used to infer sociocultural aspects of human behaviour. This often involves working across disciplines that have, until recently, evolved in separation. We argue that multidisciplinary dialogue is crucial for a more integrated and richer reconstruction of human history, as it can yield extraordinary insights about past societies, reproductive behaviours and even lifestyle habits that would not have been possible to obtain otherwise.

    SNP discovery and selection in Cape buffalo for bTB association study, using an African buffalo genome reference

    Thesis (MSc)--Stellenbosch University, 2021. ENGLISH ABSTRACT: The African buffalo (Syncerus caffer) is an important herd-based bovid in Africa, which is ubiquitous across almost the entire continent. These animals also act as a maintenance host for the ever-present threat that is bovine tuberculosis (bTB). The animal facilitates the spread and continued existence of bTB amongst wildlife and domestic cattle populations throughout Africa, causing problems in terms of conservation and economic loss. The disease is endemic to the southern part of Africa, especially South Africa, where two major national parks, the Kruger National Park (KNP) and Hluhluwe-iMfolozi Park (HiP), are host to it. There are also spill-over events of the disease from animals to humans, which is especially problematic in South Africa, where tuberculosis (TB) in humans is already a major health concern. This study aimed to use 40 high-quality low-coverage African buffalo whole genome sequences in conjunction with a species-specific reference genome to create a panel of single nucleotide polymorphisms (SNPs) for use in further research on genetic association with bTB susceptibility in buffalo. The sequences were from 40 Cape buffalo from 4 South African national parks, namely KNP, HiP and two bTB-unexposed regions, the Mokala National Park (MNP) and Addo Elephant National Park (AENP). From this we produced a panel of 3698 high-quality SNPs across 26 immune-related genes in the African buffalo genome. One hundred and forty-three of these SNPs in three genes from the panel were used in a preliminary targeted association test with bTB exposure, which produced 10 SNPs associated with TB exposure. This may aid in future research and subsequent association studies. AFRIKAANS SUMMARY: No summary available.