The quest for a donor: probability based methods offer help
When a patient in need of a stem cell transplant has no compatible donor within his or her closest family, and no matched unrelated donor can be found, a remaining option is to search within the patient’s extended family. This situation often arises when the patient is of an ethnic minority, originating from a country that lacks a well-developed stem cell donor program, and has HLA haplotypes that are rare in his or her country of residence. Searching within the extended family may be time-consuming and expensive, and tools to calculate the probability of a match within groups of untested relatives would facilitate the search. We present a general approach to calculating the probability of a match in a given relative, or group of relatives, based on the pedigree, and on knowledge of the genotypes of some of the individuals. The method extends previous approaches by allowing the pedigrees to be consanguineous and arbitrarily complex, with deviations from Hardy-Weinberg equilibrium. We show how this extension has a considerable effect on results, in particular for rare haplotypes. The methods are exemplified using freeware programs to solve a case of practical importance.
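The simplest instance of such a match-probability calculation can be illustrated in miniature: with fully typed parents and an untested full sibling, the probability follows from Mendelian segregation alone. Below is a minimal sketch under that assumption (the function name and haplotype labels are hypothetical; the paper's method additionally handles consanguineous and arbitrarily complex pedigrees and Hardy-Weinberg deviations, which this toy ignores):

```python
from itertools import product

def sibling_match_probability(father, mother, patient):
    """Probability that an untested full sibling carries the same
    unordered HLA haplotype pair as the patient, assuming Mendelian
    segregation and fully typed parents."""
    target = frozenset(patient)
    # Each sibling inherits one haplotype from each parent, uniformly.
    outcomes = [frozenset((p, m)) for p, m in product(father, mother)]
    return sum(o == target for o in outcomes) / len(outcomes)

# Parents with four distinct haplotypes: the classic 1-in-4 sibling match.
p = sibling_match_probability(("A", "B"), ("C", "D"), ("A", "C"))  # → 0.25
```

For untyped relatives further out in the pedigree, the same idea generalizes by summing over the possible transmissions along each path, which is what makes a principled computational tool valuable.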
Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies
Many practical studies rely on hypothesis testing procedures applied to data
sets with missing information. An important part of the analysis is to
determine the impact of the missing data on the performance of the test, and
this can be done by properly quantifying the relative (to complete data) amount
of available information. The problem is directly motivated by applications to
studies, such as linkage analyses and haplotype-based association projects,
designed to identify genetic contributions to complex diseases. In the genetic
studies the relative information measures are needed for the experimental
design, technology comparison, interpretation of the data, and for
understanding the behavior of some of the inference tools. The central
difficulties in constructing such information measures arise from the multiple,
and sometimes conflicting, aims in practice. For large samples, we show that a
satisfactory, likelihood-based general solution exists by using appropriate
forms of the relative Kullback--Leibler information, and that the proposed
measures are computationally inexpensive given the maximized likelihoods with
the observed data. Two measures are introduced, under the null and alternative
hypothesis respectively. We exemplify the measures on data coming from mapping
studies on the inflammatory bowel disease and diabetes. For small-sample
problems, which appear rather frequently in practice and sometimes in disguised
forms (e.g., measuring individual contributions to a large study), the robust
Bayesian approach holds great promise, though the choice of a general-purpose
"default prior" is a very challenging problem.Comment: Published in at http://dx.doi.org/10.1214/07-STS244 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
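The flavor of a relative-information measure can be shown with a toy computation: the Kullback–Leibler separation between null and alternative achievable with the observed (incomplete) data, as a fraction of the separation achievable with complete data. This is a simplified stand-in for the paper's likelihood-based measures; the distributions and function names are illustrative:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_information(alt_complete, null_complete, alt_observed, null_observed):
    """Fraction of the complete-data KL separation between the hypotheses
    that survives in the observed-data distributions (toy measure)."""
    return kl(alt_observed, null_observed) / kl(alt_complete, null_complete)

# Missing data blurs the alternative toward the null, shrinking the ratio.
r = relative_information((0.7, 0.3), (0.5, 0.5), (0.6, 0.4), (0.5, 0.5))
```

A ratio near 1 indicates little information lost to missingness; a ratio near 0 indicates the observed data can barely distinguish the hypotheses.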
Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers
The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been
highly successful in providing a sound combinatorial formulation for the
important problem of genotype phasing on pedigrees. Despite several algorithmic
advances and refinements that led to some efficient algorithms, its
applicability to real datasets has been limited by the absence of some
important characteristics of these data in its formulation, such as mutations,
genotyping errors, and missing data.
In this work, we propose the Haplotype Configuration with Recombinations and
Errors problem (HCRE), which generalizes the original MRHC formulation by
incorporating the two most common characteristics of real data: errors and
missing genotypes (including untyped individuals). Although HCRE is
computationally hard, we propose an exact algorithm for the problem based on a
reduction to the well-known Satisfiability problem. Our reduction exploits
recent progresses in the constraint programming literature and, combined with
the use of state-of-the-art SAT solvers, provides a practical solution for the
HCRE problem. Biological soundness of the phasing model and effectiveness (on
both accuracy and performance) of the algorithm are experimentally demonstrated
under several simulated scenarios and on a real dairy cattle population.

Comment: 14 pages, 1 figure, 4 tables; the associated software reHCstar is available at http://www.algolab.eu/reHCsta
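The flavor of such a reduction can be sketched at a single biallelic marker: the Mendelian constraint becomes a clause over boolean transmission variables, with an extra error variable that can satisfy the clause at a cost. The encoding below is a deliberately tiny, brute-force stand-in for reHCstar's actual CNF reduction and SAT-solver back end (all names are illustrative):

```python
from itertools import product

def consistent_assignments(father, mother, child):
    """Enumerate transmission choices, a toy analogue of a SAT model:
    booleans ft/mt pick which parental allele is transmitted, and et
    flags a genotyping error that waives the Mendelian constraint."""
    sols = []
    for ft, mt, et in product((0, 1), repeat=3):
        mendelian = sorted((father[ft], mother[mt])) == sorted(child)
        if mendelian or et:          # clause: mendelian OR error
            sols.append((ft, mt, et))
    return sols

# Prefer the assignment with fewest error flags, as an exact solver would
# when error variables carry a cost.
best = min(consistent_assignments((0, 1), (0, 0), (1, 0)),
           key=lambda s: s[2])
```

In the real encoding, one such clause set is generated per individual and marker, recombination and error variables are shared across loci, and a state-of-the-art SAT solver searches the exponential assignment space that this sketch enumerates by brute force.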
Modelling dependencies in genetic-marker data and its application to haplotype analysis
The objective of this thesis is to develop new methods to reconstruct haplotypes from phase-unknown genotypes. The need for new methodologies is motivated by the increasing availability of high-resolution marker data for many species. Such markers typically exhibit correlations, a phenomenon known as Linkage Disequilibrium (LD). It is believed that reconstructed haplotypes for markers in high LD can be valuable for a variety of application areas in population genetics, including reconstructing population history and identifying genetic disease variants.

Traditionally, haplotype reconstruction methods can be categorized according to whether
they operate on a single pedigree or a collection of unrelated individuals. The thesis begins
with a critical assessment of the limitations of existing methods, and then presents a unified statistical framework that can accommodate pedigree data, unrelated individuals and tightly linked markers. The framework makes use of graphical models, where inference entails representing the relevant joint probability distribution as a graph and then using associated algorithms to facilitate computation. The graphical model formalism provides invaluable tools to facilitate model specification, visualization, and inference.

Once the unified framework is developed, a broad range of simulation studies are conducted
using previously published haplotype data. Important contributions include demonstrating
the different ways in which the haplotype frequency distribution can impact the accuracy of
both the phase assignments and haplotype frequency estimates; evaluating the effectiveness
of using family data to improve accuracy for different frequency profiles; and, assessing the
dangers of treating related individuals as unrelated in an association study.
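The combinatorial object underlying phase reconstruction can be made concrete: an unphased multilocus genotype is compatible with a set of haplotype pairs, and statistical phasing methods place a probability distribution over exactly this set. A minimal enumeration sketch (the representation and names are illustrative, not the thesis's graphical-model machinery):

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased
    multilocus genotype, where each site is the set of its one
    (homozygous) or two (heterozygous) alleles."""
    pairs = set()
    for hap1 in product(*genotype):
        # The second haplotype is forced: the complementary allele at
        # each heterozygous site, the same allele at homozygous sites.
        hap2 = tuple(next(iter(site - {a})) if len(site) > 1 else a
                     for site, a in zip(genotype, hap1))
        pairs.add(frozenset((hap1, hap2)))
    return pairs
```

With k heterozygous sites there are 2^(k-1) compatible unordered pairs, which is why exhaustive enumeration breaks down for dense markers and graph-based inference becomes necessary.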
Methods and Algorithms for Inference Problems in Population Genetics
Inference of population history is a central problem of population genetics. The advent of large genetic datasets brings not only opportunities to develop more accurate methods for inference problems, but also computational challenges. Thus, we aim to develop accurate methods and fast algorithms for problems in population genetics.
Inference of admixture proportions is a classical statistical problem. We particularly focus on the problem of ancestry inference for ancestors. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. We develop a Hidden Markov Model (HMM) framework for estimating the admixture proportions of the immediate ancestors of an individual, i.e., an apportionment of an individual's admixture proportions into further subsets of ancestral proportions in the ancestors. Based on a genealogical model for admixture tracts, we develop an efficient algorithm for computing the sampling probability of the genome of a single individual as a function of the admixture proportions of that individual's ancestors. We show that the distribution and lengths of admixture tracts in a genome contain information about the admixture proportions of an individual's ancestors. This allows us to perform probabilistic inference of the admixture proportions of ancestors using only the genome of an extant individual.
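The computational core of such an HMM framework is the forward algorithm, which yields the likelihood of an observed sequence as a function of the model parameters. The sketch below runs on a generic discrete HMM; the state space, transitions, and emissions are illustrative placeholders for the paper's admixture-tract model:

```python
import math

def forward_loglik(obs, init, trans, emit):
    """Scaled forward algorithm: log P(obs) under a discrete HMM whose
    hidden states play the role of local ancestry (population 0 vs 1)."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    c = sum(alpha)
    loglik, alpha = math.log(c), [a / c for a in alpha]
    for o in obs[1:]:
        # Propagate through transitions, then weight by emissions.
        alpha = [sum(alpha[t] * trans[t][s] for t in range(n)) * emit[s][o]
                 for s in range(n)]
        c = sum(alpha)
        loglik, alpha = loglik + math.log(c), [a / c for a in alpha]
    return loglik
```

Maximizing this log-likelihood over the parameters that encode ancestral admixture proportions is what turns tract lengths and positions into estimates for the ancestors.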
To better understand populations, we further study the species delimitation problem: the problem of determining the boundary between populations and species. We propose a classification-based method to assign a set of populations to a number of species. Our new method uses summary statistics generated from genetic data to classify pairs of populations as either 'same species' or 'different species'. We show that machine learning can be used for species delimitation and scaled to large genomic data. It can also outperform Bayesian approaches, especially when gene flow is involved in the evolutionary process.
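As a heavily simplified stand-in for the classification step, even a nearest-centroid rule over pairwise summary statistics captures the idea of mapping population pairs to 'same species' vs 'different species'. The abstract does not specify the classifier or features, so everything below, including the two toy statistics, is illustrative:

```python
def fit_centroids(features, labels):
    """Mean summary-statistic vector per class label."""
    by_label = {}
    for vec, lab in zip(features, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for lab, vecs in by_label.items()}

def predict(centroids, x):
    """Assign a population pair to the class with the nearest centroid."""
    sq = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lab: sq(centroids[lab], x))

# Toy features per population pair: (shared-allele fraction, divergence).
cents = fit_centroids([(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)],
                      ["same species", "same species",
                       "different species", "different species"])
```

The appeal of this family of methods is exactly what the abstract claims: once training data exist, classifying a new pair costs only a summary-statistic computation, which scales far better than a full Bayesian model comparison.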
Rapid haplotype inference for nuclear families
Hapi is a new dynamic programming algorithm that ignores uninformative states and state transitions in order to efficiently compute minimum-recombinant and maximum likelihood haplotypes. When applied to a dataset containing 103 families, Hapi performs 3.8 and 320 times faster than state-of-the-art algorithms. Because Hapi infers both minimum-recombinant and maximum likelihood haplotypes and applies to related individuals, the haplotypes it infers are highly accurate over extended genomic distances.

Funding: National Institutes of Health (U.S.) (NIH grant 5-T90-DK070069); National Institutes of Health (U.S.) (Grant 5-P01-NS055923); National Science Foundation (U.S.) (Graduate Research Fellowship)
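The dynamic-programming idea behind minimum-recombinant haplotyping can be shown in miniature: track, marker by marker, the cheapest number of phase switches consistent with the data. This sketch is not Hapi's algorithm (which additionally prunes uninformative states and computes likelihoods); the names and the origin-set representation are illustrative:

```python
def min_recombinations(origins):
    """Minimum number of phase switches explaining per-marker origin
    constraints, where each entry is the set of parental-haplotype
    origins (0 or 1) compatible with that marker's genotype data."""
    INF = float("inf")
    # cost[s]: fewest switches so far ending in origin state s.
    cost = {s: (0 if s in origins[0] else INF) for s in (0, 1)}
    for allowed in origins[1:]:
        # Stay in the same state for free, or switch at cost 1.
        cost = {s: (min(cost[s], cost[1 - s] + 1) if s in allowed else INF)
                for s in (0, 1)}
    return min(cost.values())

# A forced switch between the first and last informative markers.
k = min_recombinations([{0}, {0, 1}, {1}])  # → 1
```

Markers whose origin set is {0, 1} are uninformative; Hapi's speedup comes partly from collapsing runs of such states rather than iterating over them as this toy does.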