323 research outputs found

    Haplotype inference in general pedigrees with two sites

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies but extracting single haplotypes directly by biochemical methods is expensive. A computational method to infer haplotypes from genotype data is therefore important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with two sites for all members.</p> <p>Results</p> <p>We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem and therefore can be solved by an <it>O</it>(2<it><sup>k</sup></it> · <it>n</it><sup>2</sup>) exact algorithm, where <it>n</it> is the number of members and <it>k</it> is the number of recombination events.</p> <p>Conclusions</p> <p>Our work can therefore be useful for genetic disease studies to track down how changes in haplotypes such as recombinations relate to genetic disease.</p

    An FPT haplotyping algorithm on pedigrees with a small number of sites

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies but extracting single haplotypes directly by biochemical methods is expensive. A computational method to infer haplotypes from genotype data is therefore important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with a small number of sites for all members.</p> <p>Results</p> <p>We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem with additional parity constraints. We solve this problem with an exact algorithm that runs in <inline-formula><graphic file="1748-7188-6-8-i1.gif"/></inline-formula> time, where <it>n </it>is the number of members, <it>m </it>is the number of sites, and <it>k </it>is the number of recombination events.</p> <p>Conclusions</p> <p>This algorithm infers haplotypes for a small number of sites, which can be useful for genetic disease studies to track down how changes in haplotypes such as recombinations relate to genetic disease.</p

    Parsimony-based genetic algorithm for haplotype resolution and block partitioning

    Get PDF
    This dissertation proposes a new algorithm for performing simultaneous haplotype resolution and block partitioning. The algorithm is based on genetic algorithm approach and the parsimonious principle. The multiloculs LD measure (Normalized Entropy Difference) is used as a block identification criterion. The proposed algorithm incorporates missing data is a part of the model and allows blocks of arbitrary length. In addition, the algorithm provides scores for the block boundaries which represent measures of strength of the boundaries at specific positions. The performance of the proposed algorithm was validated by running it on several publicly available data sets including the HapMap data and comparing results to those of the existing state-of-the-art algorithms. The results show that the proposed genetic algorithm provides the accuracy of haplotype decomposition within the range of the same indicators shown by the other algorithms. The block structure output by our algorithm in general agrees with the block structure for the same data provided by the other algorithms. Thus, the proposed algorithm can be successfully used for block partitioning and haplotype phasing while providing some new valuable features like scores for block boundaries and fully incorporated treatment of missing data. In addition, the proposed algorithm for haplotyping and block partitioning is used in development of the new clustering algorithm for two-population mixed genotype samples. The proposed clustering algorithm extracts from the given genotype sample two clusters with substantially different block structures and finds haplotype resolution and block partitioning for each cluster

    Modelling dependencies in genetic-marker data and its application to haplotype analysis

    Get PDF
    The objective of this thesis is to develop new methods to reconstruct haplotypes from phaseunknown genotypes. The need for new methodologies is motivated by the increasing avail¬ ability of high-resolution marker data for many species. Such markers typically exhibit correlations, a phenomenon known as Linkage Disequilibrium (LD). It is believed that re¬ constructed haplotypes for markers in high LD can be valuable for a variety of application areas in population genetics, including reconstructing population history and identifying genetic disease variantsTraditionally, haplotype reconstruction methods can be categorized according to whether they operate on a single pedigree or a collection of unrelated individuals. The thesis begins with a critical assessment of the limitations of existing methods, and then presents a uni¬ fied statistical framework that can accommodate pedigree data, unrelated individuals and tightly linked markers. The framework makes use of graphical models, where inference entails representing the relevant joint probability distribution as a graph and then using associated algorithms to facilitate computation. The graphical model formalism provides invaluable tools to facilitate model specification, visualization, and inference.Once the unified framework is developed, a broad range of simulation studies are conducted using previously published haplotype data. Important contributions include demonstrating the different ways in which the haplotype frequency distribution can impact the accuracy of both the phase assignments and haplotype frequency estimates; evaluating the effectiveness of using family data to improve accuracy for different frequency profiles; and, assessing the dangers of treating related individuals as unrelated in an association study

    Statistical physics methods in computational biology

    Get PDF
    The interest of statistical physics for combinatorial optimization is not new, it suffices to think of a famous tool as simulated annealing. Recently, it has also resorted to statistical inference to address some "hard" optimization problems, developing a new class of message passing algorithms. Three applications to computational biology are presented in this thesis, namely: 1) Boolean networks, a model for gene regulatory networks; 2) haplotype inference, to study the genetic information present in a population; 3) clustering, a general machine learning tool

    Haplotype estimation in polyploids using DNA sequence data

    Get PDF
    Polyploid organisms possess more than two copies of their core genome and therefore contain k>2 haplotypes for each set of ordered genomic variants. Polyploidy occurs often within the plant kingdom, among others in important corps such as potato (k=4) and wheat (k=6). Current sequencing technologies enable us to read the DNA and detect genomic variants, but cannot distinguish between the copies of the genome, each inherited from one of the parents. To detect inheritance patterns in populations, it is necessary to know the haplotypes, as alleles that are in linkage over the same chromosome tend to be inherited together. In this work, we develop mathematical optimisation algorithms to indirectly estimate haplotypes by looking into overlaps between the sequence reads of an individual, as well as into the expected inheritance of the alleles in a population. These algorithm deal with sequencing errors and random variations in the counts of reads observed from each haplotype. These methods are therefore of high importance for studying the genetics of polyploid crops. </p
    • …
    corecore