
    Speeding up HMM algorithms for genetic linkage analysis via chain reductions of the state space

    We develop a hidden Markov model (HMM)-based algorithm for computing exact parametric and non-parametric linkage scores in larger pedigrees than was possible before. The algorithm is applicable whenever there are chains of persons in the pedigree with no genetic measurements and with unknown affection status. The algorithm is based on shrinking the state space of the HMM considerably using such chains. In a two g-degree cousins pedigree the reduction drops the state space from being exponential in g to being linear in g. For a Finnish family in which two affected children suffer from a rare cold-inducing sweating syndrome, we were able to reduce the state space by more than five orders of magnitude, from 2^50 to 2^32. In another pedigree with a state-space size of 2^27, used for a study of pituitary adenoma, the state space was reduced by a factor of 8.5 and consequently exact linkage scores can now be computed, rather than approximated.
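
    The size of the reduction quoted above can be checked with a few lines of arithmetic. The sketch below takes the quoted state-space sizes to be the powers of two 2^50, 2^32 and 2^27; the function name is hypothetical and nothing here reproduces the authors' algorithm.

```python
import math

def orders_of_magnitude(bits_before: int, bits_after: int) -> float:
    """Return log10 of the ratio 2**bits_before / 2**bits_after."""
    return (bits_before - bits_after) * math.log10(2)

# Sweating-syndrome pedigree: 2^50 states reduced to 2^32 states.
ratio = 2 ** (50 - 32)
reduction = orders_of_magnitude(50, 32)
print(f"reduction factor: {ratio}, about {reduction:.2f} orders of magnitude")

# Pituitary adenoma pedigree: 2^27 states reduced by a factor of 8.5.
remaining_states = 2 ** 27 / 8.5
print(f"states left after an 8.5-fold reduction: {remaining_states:.3e}")
```

    A ratio of 2^18 is a little over 10^5, which is consistent with the "more than five orders of magnitude" stated in the abstract.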

    The Cluster Variation Method for Efficient Linkage Analysis on Extended Pedigrees

    BACKGROUND: Computing exact multipoint LOD scores for extended pedigrees rapidly becomes infeasible as the numbers of markers and untyped individuals increase. When markers are excluded from the computation, significant power may be lost. Therefore, accurate approximate methods that take all markers into account are desirable. METHODS: We present a novel method for efficient estimation of LOD scores on extended pedigrees. Our approach is based on the Cluster Variation Method (CVM), which deterministically estimates likelihoods by performing exact computations on tractable subsets of variables (clusters) of a Bayesian network. First, a distribution over inheritances on the marker loci is approximated with the Cluster Variation Method. Then this distribution is used to estimate the LOD score for each location of the trait locus. RESULTS: First, we demonstrate that significant power may be lost if markers are ignored in the multipoint analysis. On a set of pedigrees where exact computation is possible, we compare the estimates of the LOD scores obtained with our method to the exact LOD scores. Second, we compare our method to a state-of-the-art MCMC sampler. When both methods are given equal computation time, our method is more efficient. Finally, we show that CVM scales to large problem instances. CONCLUSION: We conclude that the Cluster Variation Method is as accurate as MCMC and generally is more efficient. Our method is a promising alternative to approaches based on MCMC sampling.

    Computing Individual Risks based on Family History in Genetic Disease in the Presence of Competing Risks

    When considering a genetic disease with variable age at onset (e.g., diabetes, familial amyloid neuropathy, cancers), computing the individual risk of the disease based on family history (FH) is of critical interest both for clinicians and patients. Such a risk is very challenging to compute because: 1) the genotype X of the individual of interest is in general unknown; 2) the posterior distribution P(X|FH, T > t) changes with t (T is the age at disease onset for the targeted individual); 3) the competing risk of death is not negligible. In this work, we present a model of this problem using a Bayesian network mixed with (right-censored) survival outcomes where hazard rates depend only on the genotype of each individual. We explain how belief propagation can be used to obtain the posterior distribution of genotypes given the FH, and how to obtain a time-dependent posterior hazard rate for any individual in the pedigree. Finally, we use this posterior hazard rate to compute individual risk, with or without the competing risk of death. Our method is illustrated using the Claus-Easton model for breast cancer (BC). This model assumes an autosomal dominant genetic risk factor such that non-carriers (genotype 00) have a BC hazard rate λ_0(t) while carriers (genotypes 01, 10 and 11) have a (much greater) hazard rate λ_1(t). Both hazard rates are assumed to be piecewise constant with known values (cuts at 20, 30, ..., 80 years). The competing risk of death is derived from the national French registry.
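
    To make point 2) concrete, here is a minimal sketch of how the carrier probability changes with disease-free age t under piecewise-constant hazards, and how that feeds an individual risk. It assumes a single individual whose prior carrier probability given FH is already known (in the abstract this comes from belief propagation, which is not shown), it ignores the competing risk of death, and the hazard values LAM0 and LAM1 are purely illustrative, not the Claus-Easton values. All function names are hypothetical.

```python
import math

# Interval starts for the piecewise-constant hazards; the last interval is open-ended.
CUTS = [0, 20, 30, 40, 50, 60, 70, 80]

# Illustrative (made-up) yearly hazards per interval, NOT the Claus-Easton values.
LAM0 = [0.0, 0.0001, 0.0005, 0.001, 0.002, 0.003, 0.003, 0.003]  # non-carriers
LAM1 = [0.0, 0.002, 0.01, 0.02, 0.03, 0.03, 0.03, 0.03]          # carriers

def cum_hazard(rates, t):
    """Integrate a piecewise-constant hazard from age 0 to age t."""
    total = 0.0
    for i, rate in enumerate(rates):
        start = CUTS[i]
        end = CUTS[i + 1] if i + 1 < len(CUTS) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)
    return total

def survival(rates, t):
    """Probability of being disease-free at age t."""
    return math.exp(-cum_hazard(rates, t))

def posterior_carrier(prior, t):
    """P(carrier | FH, T > t) by Bayes' rule, with prior = P(carrier | FH)."""
    carrier = prior * survival(LAM1, t)
    non_carrier = (1.0 - prior) * survival(LAM0, t)
    return carrier / (carrier + non_carrier)

def risk(prior, t, horizon):
    """P(t < T <= t + horizon | FH, T > t), without the competing risk of death."""
    p = posterior_carrier(prior, t)
    risk1 = 1.0 - survival(LAM1, t + horizon) / survival(LAM1, t)
    risk0 = 1.0 - survival(LAM0, t + horizon) / survival(LAM0, t)
    return p * risk1 + (1.0 - p) * risk0

for age in (20, 40, 60):
    print(age, round(posterior_carrier(0.25, age), 4), round(risk(0.25, age, 10), 4))
```

    Because carriers have the higher hazard in this toy setting, staying disease-free to a later age lowers the posterior carrier probability, which is why the risk cannot be computed from the prior alone.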

    Parallel computations on pedigree data through mapping to configurable computing devices

    Pedigree data structures have a number of applications in genetics, including the estimation of allelic or haplotype probabilities in humans and agricultural species, and the estimation of breeding values in agricultural species. Sequential algorithms for general-purpose CPU-based computers are commonly used, but are inadequate for some tasks on large data sets. We show that pedigree data can be directly represented on Field Programmable Gate Arrays (FPGAs), allowing highly efficient, massively parallel simulation of the flow of genes. Operating on the whole pedigree in parallel, the transmission of genes can occur for all individuals in a single clock cycle. By using FPGAs, the algorithms to estimate inbreeding coefficients and allelic probabilities are shown to operate hundreds to thousands of times faster than the corresponding sequential algorithms. Where problems can be largely represented in an integer form, FPGAs provide an efficient platform for computations on pedigree data.
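
    For orientation, the sketch below shows one conventional sequential way to obtain inbreeding coefficients on a CPU, using the standard kinship recursion. It is not the authors' FPGA design, and the pedigree, the identifiers and the function names are hypothetical. It assumes individuals are numbered so that parents always precede their offspring.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (sire, dam); None marks an unknown founder parent.
# Ids are assumed to be ordered so that parents come before their offspring.
PEDIGREE = {
    1: (None, None),
    2: (None, None),
    3: (1, 2),
    4: (1, 2),
    5: (3, 4),  # offspring of two full sibs
}

@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient between individuals i and j."""
    if i is None or j is None:
        return 0.0  # unknown parents are treated as unrelated founders
    if i == j:
        sire, dam = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(sire, dam))
    if i < j:
        i, j = j, i  # always expand the younger individual
    sire, dam = PEDIGREE[i]
    return 0.5 * (kinship(sire, j) + kinship(dam, j))

def inbreeding(i):
    """Inbreeding coefficient of i, i.e. the kinship of its two parents."""
    sire, dam = PEDIGREE[i]
    return kinship(sire, dam)

for ident in PEDIGREE:
    print(ident, inbreeding(ident))
```

    Each coefficient depends on values computed for earlier individuals, which is the sequential dependency that a whole-pedigree parallel representation aims to avoid.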

    An efficient algorithm to compute marginal posterior genotype probabilities for every member of a pedigree with loops

    BACKGROUND: Marginal posterior genotype probabilities need to be computed for genetic analyses such as genetic counseling in humans and selective breeding in animal and plant species. METHODS: In this paper, we describe a peeling-based, deterministic, exact algorithm to efficiently compute genotype probabilities for every member of a pedigree with loops without recourse to junction-tree methods from graph theory. The efficiency in computing the likelihood by peeling comes from storing intermediate results in multidimensional tables called cutsets. Computing marginal genotype probabilities for individual i requires recomputing the likelihood for each of the possible genotypes of individual i. This can be done efficiently by storing intermediate results in two types of cutsets, called anterior and posterior cutsets, and reusing these intermediate results to compute the likelihood. EXAMPLES: A small example is used to illustrate the theoretical concepts discussed in this paper, and marginal genotype probabilities are computed at a monogenic disease locus for every member in a real cattle pedigree.

    Most parsimonious haplotype allele sharing determination

    BACKGROUND: The "common disease – common variant" hypothesis and genome-wide association studies have achieved numerous successes in the last three years, particularly in genetic mapping in human diseases. Nevertheless, the power of association study methods is still low, in particular on quantitative traits, and the description of the full allelic spectrum is deemed still far from reach. Given the increasing density of available single nucleotide polymorphisms (SNPs), and as suggested by the block-like structure of the human genome, a popular and successful strategy is to use haplotypes to try to capture the correlation structure of SNPs in regions of little recombination. The key to the success of this strategy is thus the ability to unambiguously determine the haplotype allele sharing status among the members. Association studies based on haplotype sharing status would have significantly reduced degrees of freedom and be able to capture the combined effects of tightly linked causal variants. RESULTS: For pedigree genotype datasets of medium SNP density, we present two methods for determining haplotype allele sharing status among the pedigree members. An extensive simulation study showed that both methods performed nearly perfectly on breakpoint discovery, mutation haplotype allele discovery, and shared chromosomal region discovery. CONCLUSION: For pedigree genotype datasets, the haplotype allele sharing status among the members can be deterministically, efficiently, and accurately determined, even for very small pedigrees. Given their excellent performance, the presented haplotype allele sharing status determination programs can be useful in many downstream applications, including haplotype-based association studies.

    The EM Algorithm in Genetics, Genomics and Public Health

    The popularity of the EM algorithm owes much to the 1977 paper by Dempster, Laird and Rubin. That paper gave the algorithm its name, identified the general form and some key properties of the algorithm and established its broad applicability in scientific research. This review gives a nontechnical introduction to the algorithm for a general scientific audience, and presents a few examples characteristic of its application. Comment: Published at http://dx.doi.org/10.1214/08-STS270 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
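
    As a small illustration of the algorithm's two steps, the sketch below runs EM on the genetic-linkage multinomial example commonly associated with the 1977 paper: observed counts (125, 18, 20, 34) with cell probabilities (1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4). Whether the review itself uses this example is not stated in the abstract, so treat the choice as an assumption; the function name and defaults are hypothetical.

```python
def em_linkage(counts, theta=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of theta for the four-cell linkage multinomial."""
    y1, y2, y3, y4 = counts
    for _ in range(max_iter):
        # E-step: the first cell is a mix of two hidden cells with
        # probabilities 1/2 and theta/4; take the expected count of the latter.
        x2 = y1 * (theta / 4) / (0.5 + theta / 4)
        # M-step: complete-data maximum likelihood estimate of theta.
        new_theta = (x2 + y4) / (x2 + y2 + y3 + y4)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

theta_hat = em_linkage((125, 18, 20, 34))
print(f"EM estimate of theta: {theta_hat:.6f}")
```

    Setting the derivative of the observed-data log-likelihood to zero gives the quadratic 197θ² − 15θ − 68 = 0, so the EM iterates can be checked against its positive root.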

    Efficient Identification of Equivalences in Dynamic Graphs and Pedigree Structures

    We propose a new framework for designing test and query functions for complex structures that vary across a given parameter such as genetic marker position. The operations we are interested in include equality testing, set operations, isolating unique states, duplication counting, and finding equivalence classes under identifiability constraints. A motivating application is locating equivalence classes in identity-by-descent (IBD) graphs, graph structures in pedigree analysis that change over genetic marker location. The nodes of these graphs are unlabeled and identified only by their connecting edges, a constraint easily handled by our approach. The general framework introduced is powerful enough to build a range of testing functions for IBD graphs, dynamic populations, and other structures using a minimal set of operations. The theoretical and algorithmic properties of our approach are analyzed and proved. Computational results on several simulations demonstrate the effectiveness of our approach. Comment: Code for paper available at http://www.stat.washington.edu/~hoytak/code/hashreduc