
Efficient genome ancestry inference in complex pedigrees with inbreeding

Author: Abecasis
Browning
Chia
Donnelly
E. Y. Liu
F. P.-M. de Villena
Gudbjartsson
Idury
Jensen
Kruglyak
L. McMillan
Lander
Li
Piccolboni
Q. Zhang
Qian
Sobel
Tang
Valdar
W. Wang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: High-density SNP data of model animal resources provides opportunities for fine-resolution genetic variation studies. These genetic resources are generated through a variety of breeding schemes that involve multiple generations of matings derived from a set of founder animals. In this article, we investigate the problem of inferring the most probable ancestry of resulting genotypes, given a set of founder genotypes. Due to computational difficulty, existing methods either handle only small pedigree data or disregard the pedigree structure. However, large pedigrees of model animal resources often contain repetitive substructures that can be used to accelerate computation.
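As a rough illustration of what "most probable ancestry given founder genotypes" means computationally, the following Python sketch runs Viterbi decoding for a single haploid chromosome modelled as a mosaic of known founder haplotypes. It ignores the pedigree entirely (the very structure the paper exploits), and the function name `viterbi_founder_mosaic`, the uniform switch model and the `switch_prob`/`error_prob` parameters are assumptions made for this example, not the authors' method.

```python
import math
from typing import List, Sequence


def viterbi_founder_mosaic(
    target: Sequence[int],
    founders: Sequence[Sequence[int]],
    switch_prob: float = 0.01,
    error_prob: float = 0.01,
) -> List[int]:
    """Most probable founder index at each SNP of one haploid target (hypothetical model).

    The target copies one founder at a time; between adjacent SNPs it switches to
    some other founder with total probability switch_prob, and each copied allele
    is miscalled with probability error_prob. Requires a non-empty target and
    0 < switch_prob, error_prob < 1.
    """
    k = len(founders)
    n = len(target)
    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (k - 1)) if k > 1 else float("-inf")

    def log_emit(f: int, i: int) -> float:
        # Log-probability of the observed target allele if founder f is copied at SNP i.
        match = founders[f][i] == target[i]
        return math.log(1.0 - error_prob) if match else math.log(error_prob)

    def log_trans(g: int, f: int) -> float:
        return log_stay if g == f else log_move

    # score[f]: best log-probability of any path ending in founder f at the current SNP.
    score = [math.log(1.0 / k) + log_emit(f, 0) for f in range(k)]
    back: List[List[int]] = []  # back[i-1][f]: best predecessor of founder f at SNP i
    for i in range(1, n):
        new_score: List[float] = []
        ptr: List[int] = []
        for f in range(k):
            best_g = max(range(k), key=lambda g: score[g] + log_trans(g, f))
            new_score.append(score[best_g] + log_trans(best_g, f) + log_emit(f, i))
            ptr.append(best_g)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(range(k), key=lambda f: score[f])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

For instance, `viterbi_founder_mosaic([0, 0, 0, 1, 1, 1], [[0] * 6, [1] * 6])` returns `[0, 0, 0, 1, 1, 1]`: one switch between founders is far cheaper than three allele mismatches. The cost is O(n·k²) for n SNPs and k founders.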

Carolina Digital Repository

Rapid haplotype inference for nuclear families

Author: A Kong
AL Williams
AM Andrés
Amy L Williams
BL Browning
BN Howie
David E Housman
David K Gifford
DF Gudbjartsson
ES Lander
G Coop
G Gao
GR Abecasis
J Gayán
J Li
J Marchini
JE Wigginton
JR O'Connell
K Doi
K Markianos
L Kruglyak
M Fishelson
M Fujita
M Stephens
Martin C Rinard
P Scheet
PC Sabeti
S Lin
SR Browning
T Niu
Y Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Hapi is a new dynamic programming algorithm that ignores uninformative states and state transitions in order to efficiently compute minimum-recombinant and maximum likelihood haplotypes. When applied to a dataset containing 103 families, Hapi runs 3.8 and 320 times faster than state-of-the-art algorithms. Because Hapi infers both minimum-recombinant and maximum likelihood haplotypes and applies to related individuals, the haplotypes it infers are highly accurate over extended genomic distances.
Funding: National Institutes of Health (U.S.) (NIH grant 5-T90-DK070069); National Institutes of Health (U.S.) (Grant 5-P01-NS055923); National Science Foundation (U.S.) (Graduate Research Fellowship)
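The idea of skipping uninformative states can be illustrated on the smallest possible case: one parent with known phase transmitting to one child. The sketch below is a hypothetical minimum-recombinant dynamic program, not Hapi itself. It assumes the parent's two haplotypes are already phased and that the allele the child received from that parent is known (or `None`), and it skips markers where the parent is homozygous or the transmitted allele is unknown, since they cannot reveal a crossover.

```python
from typing import List, Optional, Sequence

INF = float("inf")


def min_recombinants(
    hap0: Sequence[int],
    hap1: Sequence[int],
    transmitted: Sequence[Optional[int]],
) -> Optional[int]:
    """Fewest crossovers explaining one parent-to-child transmission (hypothetical setup).

    hap0, hap1: the parent's two phased haplotypes (assumed known here).
    transmitted: allele the child received from this parent at each marker, or None.
    Returns None if no transmission pattern is consistent with the data.
    """
    # cost[s]: fewest crossovers so far if the latest informative marker came from hap s.
    cost: List[float] = [0, 0]
    for a0, a1, allele in zip(hap0, hap1, transmitted):
        # Uninformative marker: a homozygous parent or an unknown allele says
        # nothing about which haplotype was passed on, so the state is not updated.
        if allele is None or a0 == a1:
            continue
        compatible = [a0 == allele, a1 == allele]
        new_cost: List[float] = []
        for s in (0, 1):
            if not compatible[s]:
                new_cost.append(INF)
            else:
                # Either stay on haplotype s, or switch from the other one (+1 crossover).
                new_cost.append(min(cost[s], cost[1 - s] + 1))
        cost = new_cost
    best = min(cost)
    return None if best == INF else int(best)
```

For example, `min_recombinants([0, 0, 0, 0], [1, 1, 1, 1], [0, None, 1, 0])` returns `2`: the missing allele at the second marker is skipped, and the child must switch parental haplotype twice.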

CiteSeerX

DSpace@MIT

Haplotype inference in general pedigrees with two sites

Author: B Reed
BMY Chan
D Gusfield
D Qian
DD Doan
Duong D Doan
J Guo
J Li
J Xiao
K Zhang
L Liu
Patricia A Evans
R Niedermeier
RG Downey
RM Karp
S Xu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies, but extracting single haplotypes directly by biochemical methods is expensive. A computational method to infer haplotypes from genotype data is therefore important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with two sites for all members.
Results: We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem and therefore can be solved by an $O(2^k \cdot n^2)$ exact algorithm, where $n$ is the number of members and $k$ is the number of recombination events.
Conclusions: Our work can therefore be useful for genetic disease studies to track down how changes in haplotypes, such as recombinations, relate to genetic disease.
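To make the target problem of the reduction concrete: Bipartization by Edge Removal asks for at most k edges whose deletion leaves a graph bipartite. The Python sketch below solves it by brute force, trying every edge subset of size 0 to k and testing bipartiteness with a breadth-first 2-colouring. It is only an illustration: it does not implement the reduction from pedigrees or the paper's exact parameterized algorithm, and its cost grows roughly as m^k for m edges. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


def is_bipartite(n: int, edges: Sequence[Edge]) -> bool:
    """2-colour vertices 0..n-1 by BFS; fail if an edge joins two same-coloured vertices."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = [-1] * n
    for start in range(n):
        if colour[start] != -1:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colour[v] == -1:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False  # odd cycle found
    return True


def min_bipartization_edges(n: int, edges: Sequence[Edge], k: int) -> Optional[List[Edge]]:
    """Smallest set of at most k edges whose removal makes the graph bipartite.

    Brute force over edge subsets in order of increasing size; returns None if
    more than k removals would be needed.
    """
    for size in range(k + 1):
        for removed in combinations(range(len(edges)), size):
            skip = set(removed)
            kept = [e for i, e in enumerate(edges) if i not in skip]
            if is_bipartite(n, kept):
                return [edges[i] for i in removed]
    return None
```

For a triangle, `min_bipartization_edges(3, [(0, 1), (1, 2), (2, 0)], 1)` returns `[(0, 1)]`, while the same call with `k=0` returns `None`, because an odd cycle needs at least one edge removed.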

Directory of Open Access Journals

A novel approach for haplotype-based association analysis using family data

Author: Chen Yixuan
Li Jing
Li Xin
Publication venue: BioMed Central
Publication date: 01/01/2010

Haplotypes versus genotypes on pedigrees

Author: A Piccolboni
B Kirkpatrick
BD Thatte
Bonnie B Kirkpatrick
D Geiger
E Lander
E Sobel
G Coop
I Romero
J Barrett
J Burdick
J Eid
J Li
J Xiao
N MY
R Elston
WM Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Genome sequencing will soon produce haplotype data for individuals. For pedigrees of related individuals, sequencing appears to be an attractive alternative to genotyping. However, methods for pedigree analysis with haplotype data have not yet been developed, and the computational complexity of such problems has been an open question. Furthermore, it is not clear in which scenarios haplotype data would provide better estimates than genotype data for quantities such as recombination rates. To answer these questions, a reduction is given from genotype problem instances to haplotype problem instances, and it is shown that solving the haplotype problem yields the solution to the genotype problem, up to constant factors or coefficients. The pedigree analysis problems we will consider are the likelihood, maximum probability haplotype, and minimum recombination haplotype problems. Two algorithms are introduced: an exponential-time hidden Markov model (HMM) for haplotype data where some individuals are untyped, and a linear-time algorithm for pedigrees having haplotype data for all individuals. Recombination estimates from the general haplotype HMM algorithm are compared to recombination estimates produced by a genotype HMM. Having haplotype data on all individuals produces better estimates. However, having several untyped individuals can drastically reduce the utility of haplotype data. Pedigree analysis, both linkage and association studies, has a long history of important contributions to genetics, including disease-gene finding and some of the first genetic maps for humans. Recent contributions include fine-scale recombination maps in humans [4], regions linked to Schizophrenia that might be missed by genome-wide association studies [11], and insights into the relationship between cystic fibrosis and fertility [13]. 
Algorithms for pedigree problems are of great interest to the computer science community, in part because of connections to machine learning algorithms, optimization methods, and combinatorics [7, 16
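The hidden Markov models mentioned in this abstract track, at each marker, which parental haplotype a child inherited. A minimal sketch of that idea for a single fully haplotyped parent-child transmission is the forward algorithm over two hidden states below, which runs in time linear in the number of markers. The function name, the known per-interval recombination fractions `thetas`, and the optional miscall rate `error` are assumptions of this example; it is not the authors' HMM, which handles whole pedigrees and untyped individuals.

```python
from typing import Sequence


def transmission_likelihood(
    child_hap: Sequence[int],
    parent_hap0: Sequence[int],
    parent_hap1: Sequence[int],
    thetas: Sequence[float],
    error: float = 0.0,
) -> float:
    """P(child haplotype | parent's two haplotypes) under a two-state HMM (hypothetical).

    Hidden state at each marker: which parental haplotype (0 or 1) the child copied.
    thetas[i] is the recombination fraction between markers i and i + 1 (assumed known).
    error is a per-marker probability that the copied allele is miscalled.
    """
    n = len(child_hap)
    if len(thetas) != n - 1:
        raise ValueError("need one recombination fraction per adjacent marker pair")

    def emit(hap: Sequence[int], i: int) -> float:
        return 1.0 - error if hap[i] == child_hap[i] else error

    # Forward variables: alpha[s] = P(alleles at markers 0..i, state s at marker i).
    alpha = [0.5 * emit(parent_hap0, 0), 0.5 * emit(parent_hap1, 0)]
    for i in range(1, n):
        t = thetas[i - 1]
        alpha = [
            ((1 - t) * alpha[0] + t * alpha[1]) * emit(parent_hap0, i),
            (t * alpha[0] + (1 - t) * alpha[1]) * emit(parent_hap1, i),
        ]
    return alpha[0] + alpha[1]
```

With `child_hap=[0, 0, 0]`, `parent_hap0=[0, 0, 0]`, `parent_hap1=[1, 1, 1]` and `thetas=[0.1, 0.1]`, the likelihood is 0.5 · 0.9 · 0.9 = 0.405: the child must have copied the first haplotype throughout without recombining.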

CiteSeerX

Ezid

Directory of Open Access Journals

eScholarship - University of California

A genetic algorithm based method for stringent haplotyping of family data

Author: C Lamina
D Levine
D Qian
E Sobel
Francois Besnier
GR Abecasis
J Akey
J Hernández-Sánchez
JBS Haldane
JJ Windig
L Crooks
L Excoffier
L Grapes
M Lynch
M Stephens
P Tapadar
PJ Boettcher
T Becker
THE Meuwissen
Örjan Carlborg
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The linkage phase, or haplotype, is an extra level of information that, in addition to genotype and pedigree, can be useful for reconstructing the inheritance pattern of the alleles in a pedigree and for computing, for example, identity-by-descent (IBD) probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to use only haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases.
Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring the parental origin of the alleles, which is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors.
Conclusion: By including a stringency criterion in our haplotyping method, the user is able to keep the error rate at a level suitable for the particular study; one can select anything from haplotyped data with a very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. This choice makes the method more flexible and useful in a wide range of applications, as it can fulfil different requirements regarding the tolerance for haplotype errors or uncertain marker phases.
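The stringency trade-off can be illustrated independently of the genetic algorithm. The hypothetical helper below assumes that, for each marker, a support score is available for each of the two possible phases, and it calls a phase only when the better score is at least `stringency` times the worse one; otherwise the marker is left unphased. The score inputs and the ratio rule are assumptions for illustration, not the criterion used in the paper.

```python
from typing import List, Optional, Sequence, Tuple


def call_phases(
    support: Sequence[Tuple[float, float]],
    stringency: float,
) -> List[Optional[int]]:
    """Assign phase 0 or 1 per marker only when the data support it strongly enough.

    support[i] = (score for phase 0, score for phase 1), e.g. likelihoods or vote
    counts (hypothetical inputs). Markers that fail the stringency test get None.
    """
    if stringency < 1.0:
        raise ValueError("stringency must be >= 1")
    calls: List[Optional[int]] = []
    for s0, s1 in support:
        hi, lo, best = (s0, s1, 0) if s0 >= s1 else (s1, s0, 1)
        # Require a positive, strictly better score that beats the other by the ratio.
        if hi > 0 and hi != lo and hi >= stringency * lo:
            calls.append(best)
        else:
            calls.append(None)
    return calls
```

With `support = [(9, 1), (6, 4), (0, 5)]`, a stringency of 3 gives `[0, None, 1]`, whereas a stringency of 1 phases every marker as `[0, 0, 1]`, mirroring the trade-off between fewer errors and more complete phasing described in the abstract.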

Directory of Open Access Journals

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Bayesian Inference for Retrospective Population Genetics Models Using Markov Chain Monte Carlo Methods

Author: Pirinen Matti
Publication venue: University of Helsinki Libraries
Publication date: 08/06/2009

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of the biological sciences such as evolution and cell functioning. The field of genetics is currently developing rapidly because of recent advances in technologies by which molecular data can be obtained from living organisms. To extract as much information as possible from such data, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on modeling the unobserved recent ancestry of the sampled individuals (say, over tens of generations), which is carried out using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo (MCMC) algorithm. The example analyses consider estimation of the population structure, the relatedness structure (both at the level of whole genomes and at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where a quantitative phenotype has also been measured from the contemporary individuals.
In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together and the allele frequencies of each pool are determined in a single genotyping.
Genetics studies the structure, function and variation of hereditary material, as well as other factors that affect variation between individuals in the living world. Current laboratory methods make it possible to collect increasingly precise and extensive molecular-level data from organisms. Processing such data requires statistical models that make the most accurate possible use of the available knowledge about the biological processes that produced the collected data. This dissertation develops Bayesian statistical models for certain genetic processes and applies the models to example data sets. The main focus is on modeling the shared recent history of individuals. In the simplest case, the starting point is a set of present-day individuals whose hereditary material is assumed to be known at certain marker loci on the basis of laboratory genotype measurements. The statistical model is used to estimate probabilities for the different recent histories connecting the individuals, which are described by pedigree structures and the inheritance paths of the marker genes. The time spans considered are at most tens of generations.
The dissertation also uses the recent-history model in a gene-mapping application, whose goal is to locate positions in the genome that affect a particular trait measured or observed in the individuals. Other applications are the estimation of population structure and the estimation of degrees of relatedness between individuals.

Helsingin yliopiston digitaalinen arkisto

An Efficient Algorithm for Haplotype Inference on Pedigrees with a Small Number of Recombinants

Author: Jing Xiao
Tao Jiang
Tiancheng Lou
Publication venue: Springer Nature
Publication date: 01/01/2011