9 research outputs found

    Efficient genome ancestry inference in complex pedigrees with inbreeding

    Get PDF
    Motivation: High-density SNP data of model animal resources provides opportunities for fine-resolution genetic variation studies. These genetic resources are generated through a variety of breeding schemes that involve multiple generations of matings derived from a set of founder animals. In this article, we investigate the problem of inferring the most probable ancestry of resulting genotypes, given a set of founder genotypes. Due to computational difficulty, existing methods either handle only small pedigree data or disregard the pedigree structure. However, large pedigrees of model animal resources often contain repetitive substructures that can be utilized in accelerating computation

    Isomorphism and Similarity for 2-Generation Pedigrees

    Get PDF
    We consider the emerging problem of comparing the similarity between (unlabeled) pedigrees. More specifically, we focus on the simplest pedigrees, namely, the 2-generation pedigrees. We show that the isomorphism testing for two 2-generation pedigrees is GI-hard. If the 2-generation pedigrees are monogamous (i.e., each individual at level-1 can mate with exactly one partner) then the isomorphism testing problem can be solved in polynomial time. We then consider the problem by relaxing it into an NP-complete decomposition problem which can be formulated as the Minimum Common Integer Pair Partition (MCIPP) problem, which we show to be FPT by exploiting a property of the optimal solution. While there is still some difficulty to overcome, this lays down a solid foundation for this research

    Haplotypes versus genotypes on pedigrees

    Get PDF
    Abstract. Genome sequencing will soon produce haplotype data for individuals. For pedigrees of related individuals, sequencing appears to be an attractive alternative to genotyping. However, methods for pedigree analysis with haplotype data have not yet been developed, and the computational complexity of such problems has been an open question. Furthermore, it is not clear in which scenarios haplotype data would provide better estimates than genotype data for quantities such as recombination rates. To answer these questions, a reduction is given from genotype problem instances to haplotype problem instances, and it is shown that solving the haplotype problem yields the solution to the genotype problem, up to constant factors or coefficients. The pedigree analysis problems we will consider are the likelihood, maximum probability haplotype, and minimum recombination haplotype problems. Two algorithms are introduced: an exponential-time hidden Markov model (HMM) for haplotype data where some individuals are untyped, and a linear-time algorithm for pedigrees having haplotype data for all individuals. Recombination estimates from the general haplotype HMM algorithm are compared to recombination estimates produced by a genotype HMM. Having haplotype data on all individuals produces better estimates. However, having several untyped individuals can drastically reduce the utility of haplotype data. Pedigree analysis, both linkage and association studies, has a long history of important contributions to genetics, including disease-gene finding and some of the first genetic maps for humans. Recent contributions include fine-scale recombination maps in humans [4], regions linked to Schizophrenia that might be missed by genome-wide association studies [11], and insights into the relationship between cystic fibrosis and fertility [13]. Algorithms for pedigree problems are of great interest to the computer science community, in part because of connections to machine learning algorithms, optimization methods, and combinatorics [7, 16

    On Counting the Number of Consistent Genotype Assignments for Pedigrees

    Get PDF
    Consistency checking of genotype information in pedigrees plays an important role in genetic analysis and for complex pedigrees the computational complexity is critical. We present here a detailed complexity analysis for the problem of counting the number of complete consistent genotype assignments. Our main result is a polynomial time algorithm for counting the number of complete consistent assignments for non-looping pedigrees. We further classify pedigrees according to a number of natural parameters like the number of generations, the number of children per individual and the cardinality of the set of alleles. We show that even if we assume all these parameters as bounded by reasonably small constants, the counting problem becomes computationally hard (#P-complete) for looping pedigrees. The border line for counting problems computable in polynomial time (i.e. belonging to the class FP) and #P-hard problems is completed by showing that even for general pedigrees with unlimited number of generations and alleles but with at most one child per individual and for pedigrees with at most two generations and two children per individual the counting problem is in FP

    Fast and Accurate Haplotype Inference with Hidden Markov Model

    Get PDF
    The genome of human and other diploid organisms consists of paired chromosomes. The haplotype information (DNA constellation on one single chromosome), which is crucial for disease association analysis and population genetic inference among many others, is however hidden in the data generated for diploid organisms (including human) by modern high-throughput technologies which cannot distinguish information from two homologous chromosomes. Here, I consider the haplotype inference problem in two common scenarios of genetic studies: 1. Model organisms (such as laboratory mice): Individuals are bred through prescribed pedigree design. 2. Out-bred organisms (such as human): Individuals (mostly unrelated) are drawn from one or more populations or continental groups. In the two scenarios, one individual may share short blocks of chromosomes with other individual(s) or with founder(s) if available. I have developed and implemented methods, by identifying the shared blocks statistically, to accurately and more rapidly reconstruct the haplotypes for individuals under study and to solve important related problems including genotype imputation and ancestry inference. My methods, based on hidden Markov model, can scale up to tens of thousands of individuals. Analysis based on my method leads to a new genetic map in mouse population which reveals important biological properties of the recombination process. I have also explored the study design and empirical quality control for imputation tasks with large scale datasets from admixed population.Doctor of Philosoph
    corecore