
    An FPT haplotyping algorithm on pedigrees with a small number of sites

    Background: Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies, but extracting single haplotypes directly by biochemical methods is expensive, so a computational method to infer haplotypes from genotype data is important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with a small number of sites for all members.

    Results: We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem with additional parity constraints. We solve this problem with an exact fixed-parameter algorithm (the running-time bound appears only as a formula image in the original abstract), where n is the number of members, m is the number of sites, and k is the number of recombination events.

    Conclusions: This algorithm infers haplotypes for a small number of sites, which can be useful for genetic disease studies to track how changes in haplotypes, such as recombinations, relate to genetic disease.

    Haplotype inference in general pedigrees with two sites

    Background: Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies, but extracting single haplotypes directly by biochemical methods is expensive, so a computational method to infer haplotypes from genotype data is important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with two sites for all members.

    Results: We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem and can therefore be solved by an O(2^k · n^2) exact algorithm, where n is the number of members and k is the number of recombination events.

    Conclusions: Our work can therefore be useful for genetic disease studies to track how changes in haplotypes, such as recombinations, relate to genetic disease.
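The Bipartization by Edge Removal step can be illustrated with a deliberately naive sketch: try every set of at most k edges and test whether removing them leaves a bipartite graph, checked by BFS 2-coloring. This only illustrates the problem being solved — it runs in roughly O(m^k) time rather than the paper's O(2^k · n^2) bound, and the construction of the graph from the pedigree is omitted.

```python
from itertools import combinations
from collections import deque

def is_bipartite(n, edges):
    """2-color the graph with BFS; return True iff it has no odd cycle."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n
    for s in range(n):
        if color[s] != -1:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle found
    return True

def min_edge_bipartization(n, edges, k_max):
    """Smallest set of at most k_max edges whose removal makes the
    graph bipartite, or None if no such set exists."""
    for k in range(k_max + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if is_bipartite(n, kept):
                return [edges[i] for i in removed]
    return None
```

For example, a triangle (an odd cycle) needs exactly one edge removed, while an even cycle is already bipartite and needs none. The fixed-parameter algorithms cited in the abstract avoid the exhaustive enumeration via iterative compression, but the input/output contract is the same.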

    Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers

    The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances and refinements that led to some efficient algorithms, its applicability to real datasets has been limited by the absence from its formulation of important characteristics of these data, such as mutations, genotyping errors, and missing data. In this work, we propose the Haplotype Configuration with Recombinations and Errors problem (HCRE), which generalizes the original MRHC formulation by incorporating the two most common characteristics of real data: errors and missing genotypes (including untyped individuals). Although HCRE is computationally hard, we propose an exact algorithm for the problem based on a reduction to the well-known Satisfiability problem. Our reduction exploits recent progress in the constraint programming literature and, combined with the use of state-of-the-art SAT solvers, provides a practical solution for the HCRE problem. The biological soundness of the phasing model and the effectiveness (in both accuracy and performance) of the algorithm are experimentally demonstrated under several simulated scenarios and on a real dairy cattle population.

    Comment: 14 pages, 1 figure, 4 tables; the associated software reHCstar is available at http://www.algolab.eu/reHCsta
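As a rough illustration of the kind of reduction involved (not reHCstar's actual encoding, which is considerably more elaborate): phase relationships along a pedigree behave like parity constraints over Boolean variables, and a parity constraint can be compiled directly into CNF clauses for a SAT solver by forbidding every wrong-parity assignment. A minimal sketch, using positive/negative integers for literals as in the DIMACS convention:

```python
from itertools import product

def parity_to_cnf(variables, rhs):
    """CNF clauses forcing variables[0] XOR ... XOR variables[n-1] == rhs.

    Emits one clause per wrong-parity assignment (2^(n-1) clauses), each
    clause being the negation of the forbidden assignment. Literals are
    DIMACS-style: v means "variable v is true", -v means "false".
    """
    clauses = []
    for bits in product([0, 1], repeat=len(variables)):
        if sum(bits) % 2 != rhs:
            # Forbid this assignment: the clause is false exactly here.
            clauses.append([v if b == 0 else -v
                            for v, b in zip(variables, bits)])
    return clauses
```

For two variables, `parity_to_cnf([1, 2], 1)` produces the clauses `[1, 2]` and `[-1, -2]`, i.e. "x1 or x2" and "not x1 or not x2", which together force x1 ≠ x2. Real encodings keep clause growth in check with auxiliary (Tseitin) variables; this exhaustive form is only for illustration.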

    Design and Association Methods for Next-generation Sequencing Studies for Quantitative Traits.

    Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits using sequencing-based GWAS. However, the cost of sequencing remains high, the optimal study design for sequencing-based association studies is an open question, and powerful association methods and software to detect trait-associated rare and low-frequency variants are greatly needed. In addition, although chromosome X contains 5% of the information in the human genome, its analysis has been largely neglected in routine GWAS. In this dissertation, I focus on three topics.

    First, I describe a computationally efficient approach to reconstruct gene-level association test statistics from single-variant summary statistics and their covariance matrices, for single studies and for meta-analyses. Using simulations and real data examples, I evaluate our methods under the null, investigate scenarios in which family samples have greater power than population samples, compare the power of different types of gene-level tests under various trait-generating models, and demonstrate the use of our methods and the C++ software, RAREMETAL, by meta-analyzing SardiNIA and HUNT data on lipid levels.

    Second, I describe a variance component approach and a series of gene-level tests for X-linked rare variant analysis. Using simulations, I demonstrate that our methods are well controlled under the null. I evaluate the power to detect an autosomal versus an X-linked gene of the same effect size, and investigate how the sex ratio of a sample affects the power to detect an X-linked gene. Finally, I demonstrate the use of our method and the C++ software by analyzing various quantitative traits measured in the SardiNIA study, and report the X-linked variants and genes detected.

    Third, I describe a novel likelihood-based approach and the C++ software, RAREFY, to prioritize, within a limited budget, the samples most likely to carry trait-associated variants. I first describe the statistical method for small pedigrees, then describe an MCMC approach that makes the method computationally feasible for large pedigrees. Using simulations and real data analysis, I compare our approach with other methods in both trait-associated allele discovery power and association power, and demonstrate its use on pedigrees from the SardiNIA study.

    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/113521/1/sfengsph_1.pd
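The first topic — reconstructing a gene-level statistic from single-variant summary statistics — can be sketched for the simplest case, a weighted burden test: given the per-variant score statistics U and their covariance matrix V, the statistic (w'U)² / (w'Vw) follows a chi-square distribution with 1 degree of freedom under the null. The sketch below is a generic illustration of that idea, not RAREMETAL's implementation (which supports additional gene-level test types):

```python
def burden_test(scores, cov, weights):
    """Gene-level burden statistic from single-variant summary statistics.

    scores[i]  : score statistic U_i for variant i
    cov[i][j]  : covariance of U_i and U_j (shared across studies in a
                 meta-analysis after summing per-study contributions)
    weights[i] : analyst-chosen per-variant weight w_i

    Returns (w'U)^2 / (w'Vw), chi-square with 1 df under the null.
    """
    m = len(scores)
    numerator = sum(weights[i] * scores[i] for i in range(m)) ** 2
    denominator = sum(weights[i] * cov[i][j] * weights[j]
                      for i in range(m) for j in range(m))
    return numerator / denominator
```

The key point of the summary-statistic approach is visible here: only U and V are needed, never the individual-level genotypes, so per-study score vectors and covariance matrices can be summed across cohorts before forming the test.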

    KELVIN: A Software Package for Rigorous Measurement of Statistical Evidence in Human Genetics

    This paper describes the software package KELVIN, which supports the PPL (posterior probability of linkage) framework for the measurement of statistical evidence in human (or, more generally, diploid) genetic studies. In terms of scope, KELVIN supports: two-point (trait-marker or marker-marker) and multipoint linkage analysis, based on either sex-averaged or sex-specific genetic maps, with an option to allow for imprinting; trait-marker linkage disequilibrium (LD), or association, analysis in case-control data, trio data, and/or multiplex family data, with options for joint linkage and trait-marker LD or for conditional LD given linkage; dichotomous-trait, quantitative-trait, and quantitative-trait-threshold models; and certain types of gene-gene interactions and covariate effects. Features and data (pedigree) structures can be freely mixed and matched within analyses. The statistical framework is specifically tailored to accumulate evidence in a mathematically rigorous way across multiple data sets or data subsets while allowing for multiple sources of heterogeneity, and KELVIN itself utilizes sophisticated software engineering to provide a powerful and robust platform for studying the genetics of complex disorders.

    Conflation of short identity-by-descent segments bias their inferred length distribution

    Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to contain an IBD segment if they share a segment inherited from a recent common ancestor without intervening recombination. Long IBD segments (> 1 cM) can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample. However, these approaches detect IBD based on contiguous segments of identity-by-state, and such segments may exist due to the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred segments 1-2 cM long are the result of conflating two or more shorter segments, under demographic scenarios typical for modern humans. This biases the inferred IBD segment length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2 cM to be much more reliable (less than 5% conflation rate). As an example of how this can negatively affect downstream analyses, we present and analyze a novel estimator of the de novo mutation rate using IBD segments, and demonstrate that the biased length distribution of the IBD segments due to conflation can lead to inflated estimates if the conflation is not modeled. Understanding the conflation effect in detail will make its correction in future methods more tractable.
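The conflation effect described above is easy to mimic with a toy model: if two true IBD segments are separated by a gap too short for the detector to resolve, they are reported as one long segment, inflating the apparent length. The `gap_tolerance` parameter below is a hypothetical stand-in for a detector's resolution, not a quantity from the paper, and real detection operates on identity-by-state over SNP haplotypes rather than on segment coordinates directly:

```python
def infer_segments(true_segments, gap_tolerance):
    """Toy model of conflation: merge true IBD segments (start, end
    positions in cM) whose separating gap is at most gap_tolerance,
    mimicking a detector that cannot resolve short non-IBD gaps."""
    segments = sorted(true_segments)
    merged = [list(segments[0])]
    for start, end in segments[1:]:
        if start - merged[-1][1] <= gap_tolerance:
            merged[-1][1] = max(merged[-1][1], end)  # conflate into one
        else:
            merged.append([start, end])
    return [tuple(s) for s in merged]
```

With two true segments of 0.8 cM and 0.7 cM separated by a 0.2 cM gap, a coarse detector (`gap_tolerance=0.3`) reports a single 1.7 cM segment — exactly the regime (1-2 cM) where the paper finds roughly 40% of inferred segments to be conflations — while a finer one (`gap_tolerance=0.1`) recovers both short segments.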