507 research outputs found

    gap: Genetic Analysis Package

    A preliminary attempt at collecting tools and utilities for genetic data as an R package called gap is described. Genomewide association is then described as a specific example, linking the work of Risch and Merikangas (1996) and Long and Langley (1997) for family-based and population-based studies, and the counterpart for case-cohort design established by Cai and Zeng (2004). Analysis of staged design as outlined by Skol et al. (2006) and associated methods are discussed. The package is flexible, customizable, and should prove useful to researchers, especially in its application to genomewide association studies.

    Modelling dependencies in genetic-marker data and its application to haplotype analysis

    The objective of this thesis is to develop new methods to reconstruct haplotypes from phase-unknown genotypes. The need for new methodologies is motivated by the increasing availability of high-resolution marker data for many species. Such markers typically exhibit correlations, a phenomenon known as Linkage Disequilibrium (LD). It is believed that reconstructed haplotypes for markers in high LD can be valuable for a variety of application areas in population genetics, including reconstructing population history and identifying genetic disease variants. Traditionally, haplotype reconstruction methods can be categorized according to whether they operate on a single pedigree or a collection of unrelated individuals. The thesis begins with a critical assessment of the limitations of existing methods, and then presents a unified statistical framework that can accommodate pedigree data, unrelated individuals and tightly linked markers. The framework makes use of graphical models, where inference entails representing the relevant joint probability distribution as a graph and then using associated algorithms to facilitate computation. The graphical model formalism provides invaluable tools to facilitate model specification, visualization, and inference. Once the unified framework is developed, a broad range of simulation studies are conducted using previously published haplotype data. Important contributions include demonstrating the different ways in which the haplotype frequency distribution can impact the accuracy of both the phase assignments and haplotype frequency estimates; evaluating the effectiveness of using family data to improve accuracy for different frequency profiles; and assessing the dangers of treating related individuals as unrelated in an association study.
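
To make "phase-unknown genotypes" concrete, the following minimal sketch enumerates every haplotype pair compatible with one unphased biallelic genotype. It is an illustration only and is not the method developed in the thesis; the function name `compatible_haplotype_pairs` and the 0/1/2 genotype coding (copies of allele 1 per marker) are assumptions introduced here.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0, 1 or 2 (hypothetical coding: copies of allele 1).
    Only heterozygous markers (code 1) leave the phase ambiguous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        # Homozygous markers are fixed: 0 -> allele 0, 2 -> allele 1.
        hap1 = [g // 2 for g in genotype]
        hap2 = list(hap1)
        # Assign complementary alleles at each heterozygous marker.
        for site, allele in zip(het_sites, choice):
            hap1[site] = allele
            hap2[site] = 1 - allele
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return pairs


pairs = compatible_haplotype_pairs([1, 1, 2])
for pair in sorted(pairs):
    print(pair)
```

A genotype with k heterozygous markers admits 2^(k-1) distinct pairs for k >= 1, which is why reconstruction methods need additional information such as pedigree structure or LD to choose among them.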

    A genetic algorithm based method for stringent haplotyping of family data

    Background: The linkage phase, or haplotype, is an extra level of information that, in addition to genotype and pedigree, can be useful for reconstructing the inheritance pattern of the alleles in a pedigree and for computing, for example, Identity By Descent (IBD) probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to use only haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases. Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring parental origin of the alleles. This is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors. Conclusion: By including a stringency criterion in our haplotyping method, the user is able to maintain the error rate at a suitable level for the particular study; one can select anything from haplotyped data with a very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. Giving this choice makes the method more flexible and useful in a wide range of applications, as it is able to fulfil different requirements regarding the tolerance for haplotype errors or uncertain marker phases.

    Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies

    Many practical studies rely on hypothesis testing procedures applied to data sets with missing information. An important part of the analysis is to determine the impact of the missing data on the performance of the test, and this can be done by properly quantifying the amount of available information relative to complete data. The problem is directly motivated by applications to studies, such as linkage analyses and haplotype-based association projects, designed to identify genetic contributions to complex diseases. In the genetic studies the relative information measures are needed for the experimental design, technology comparison, interpretation of the data, and for understanding the behavior of some of the inference tools. The central difficulties in constructing such information measures arise from the multiple, and sometimes conflicting, aims in practice. For large samples, we show that a satisfactory, likelihood-based general solution exists by using appropriate forms of the relative Kullback-Leibler information, and that the proposed measures are computationally inexpensive given the maximized likelihoods with the observed data. Two measures are introduced, under the null and alternative hypothesis respectively. We exemplify the measures on data coming from mapping studies on inflammatory bowel disease and diabetes. For small-sample problems, which appear rather frequently in practice and sometimes in disguised forms (e.g., measuring individual contributions to a large study), the robust Bayesian approach holds great promise, though the choice of a general-purpose "default prior" is a very challenging problem. Comment: Published at http://dx.doi.org/10.1214/07-STS244 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Statistical perspectives on dependencies between genomic markers

    To study the genetic impact on a quantitative trait, molecular markers are used as predictor variables in a statistical model. This habilitation thesis elucidates the challenges that accompany such investigations. First, the usefulness of including different kinds of genetic effects, which can be additive or non-additive, was verified. Second, dependencies between markers caused by their proximity on the genome were studied in populations with family stratification. The resulting covariance matrix deserved special attention because it serves multiple functions in several fields of genomic evaluation.

    A similarity matrix and its application in genomic selection for hedging haplotype diversity

    Mendelian sampling variance (MSV) has many breeding applications. However, its computationally intensive nature limits its widespread use. Recently proposed selection indices for long-term genetic gain combine genomic estimated breeding value and MSV. However, under high selection intensity these indices tend to select similar parents with high MSV potential, resulting in the loss of favorable haplotypes. Therefore, this thesis aimed to develop a faster approach for computing MSV and to derive a similarity matrix for hedging haplotype diversity. The thesis first develops an efficient approach for computing MSV using marker effects, a genetic map, and phased genotypes. Then, using the same information as MSV, it derives a similarity matrix. The off-diagonal elements of this matrix represent the similarities between parental haplotypes, and the diagonal elements represent the similarity of a parent to itself, which equals its MSV. A high similarity indicates that the parents share many heterozygous markers with large effects on a trait in the same linkage phase. Similar to how covariance matrices of asset prices are used in finance to create diversified portfolios, the similarity matrix can help avoid repeated matings of similar parents and achieve expected genetic gain while hedging haplotype diversity in the next generation. The thesis then develops the Python package PyMSQ for computing MSV and the similarity matrix to facilitate their use in breeding programs. Compared to gamevar (a recently published Fortran program), PyMSQ was up to 240 times faster at computing MSV in the analyzed data sets. Finally, similarity matrices for milk production and longevity traits were calculated using PyMSQ for a large German Holstein population to assess their applicability, relevance, and influencing factors. The similarity matrix presented in this thesis introduces new criteria for genomic selection, allowing for increased genetic gain while hedging haplotype diversity in breeding programs.
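
The relationship between MSV and the similarity matrix described above can be sketched for a single chromosome as follows. This is a hedged illustration and not PyMSQ's API: all function names are hypothetical, and the sketch assumes 0/1 allele coding, additive marker effects, Haldane's map function, and the standard gametic covariance (1 - 2r)/4 between two heterozygous markers with recombination rate r.

```python
import math


def gamete_covariance(positions_morgan):
    """Covariance between gametic allele indicators of heterozygous markers.

    Assumes Haldane's map function, r = (1 - exp(-2d)) / 2, so that
    (1 - 2r) / 4 = exp(-2d) / 4 for map distance d in Morgans.
    """
    n = len(positions_morgan)
    return [
        [math.exp(-2.0 * abs(positions_morgan[k] - positions_morgan[l])) / 4.0
         for l in range(n)]
        for k in range(n)
    ]


def phase_signed_effects(hap1, hap2, effects):
    """Marker effect signed by linkage phase; zero at homozygous markers."""
    return [(a1 - a2) * e for a1, a2, e in zip(hap1, hap2, effects)]


def similarity(parent_a, parent_b, effects, cov):
    """Quadratic form m_a' R m_b; with parent_a == parent_b this is the MSV."""
    ma = phase_signed_effects(parent_a[0], parent_a[1], effects)
    mb = phase_signed_effects(parent_b[0], parent_b[1], effects)
    n = len(effects)
    return sum(ma[k] * cov[k][l] * mb[l] for k in range(n) for l in range(n))


def similarity_matrix(parents, effects, positions_morgan):
    """Parents are (hap1, hap2) tuples of 0/1 alleles on one chromosome."""
    cov = gamete_covariance(positions_morgan)
    return [[similarity(a, b, effects, cov) for b in parents] for a in parents]


# Toy data (made up for illustration): two markers 0.1 Morgan apart.
effects = [1.0, 1.0]
positions = [0.0, 0.1]
coupling = ([1, 1], [0, 0])   # favorable alleles on the same haplotype
repulsion = ([1, 0], [0, 1])  # favorable alleles on opposite haplotypes
S = similarity_matrix([coupling, repulsion], effects, positions)
print(S)
```

In this toy case the coupling-phase parent has the larger diagonal element (MSV), because its two effects are inherited together more often than apart. The sign of an off-diagonal element depends on which haplotype is listed first, which is an arbitrary convention in this sketch.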