3,239 research outputs found

Haplotype Inference through Sequential Monte Carlo

Author: Iliadis Alexandros
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Technological advances in the last decade have given rise to large genome-wide studies, which have helped researchers gain better insight into the genetic basis of many common diseases. As the number of samples and genome coverage have increased dramatically, it is now typical for individuals to be genotyped on high-throughput platforms at more than 500,000 single nucleotide polymorphisms (SNPs). At the same time, theoretical and empirical arguments have been made for the use of haplotypes, i.e. combinations of alleles at multiple loci on individual chromosomes, as opposed to genotypes, so the problem of haplotype inference is particularly relevant. Existing haplotyping methods include population-based methods, methods for pooled DNA samples, and methods for family and pedigree data. Furthermore, the vast amount of available data poses new challenges for haplotyping algorithms. Candidate methods should scale well to the size of the datasets, as the number of loci and the number of individuals run well into the thousands. In addition, as genotyping can be performed routinely, researchers encounter a number of specific new scenarios, which can be seen as hybrids between the population and pedigree inference scenarios and require special care to incorporate the maximum amount of information. In this thesis we present a Sequential Monte Carlo framework (TDS) and tailor it to address instances of haplotype inference and frequency estimation problems. Specifically, we first adjust our framework to perform haplotype inference in trio families, resulting in a methodology that demonstrates an excellent tradeoff between speed and accuracy. Subsequently, we extend our method to handle general nuclear families and demonstrate the gain from using our approach as opposed to alternative scenarios. We further address the problem of haplotype inference in pooled data, where we show that our method achieves improved performance over existing approaches in datasets with a large number of markers. We finally present a framework to handle the haplotype inference problem in regions of CNV/SNP data. Using our approach we can phase datasets where the ploidy of an individual can vary along the region and each individual can have different breakpoints.
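
To make the haplotype inference problem in this abstract concrete, the sketch below enumerates every haplotype pair consistent with one unphased genotype. It is not the thesis's Sequential Monte Carlo method; the function name and the 0/1/2 genotype coding (copies of the alternate allele at a biallelic SNP) are assumptions made for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2 (assumed coding: number of copies of the
    alternate allele at each biallelic SNP).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are identical on both haplotypes (0 -> 0, 2 -> 1).
    base = [g // 2 for g in genotype]
    # Fix the phase of the first heterozygous site so each pair is listed once.
    free_sites = het_sites[1:]
    pairs = []
    for choice in product((0, 1), repeat=len(free_sites)):
        h1, h2 = list(base), list(base)
        if het_sites:
            h1[het_sites[0]], h2[het_sites[0]] = 0, 1
        for site, allele in zip(free_sites, choice):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs
```

A genotype with k heterozygous sites has 2^(k-1) candidate resolutions (one when k is 0), so exhaustive enumeration is only feasible for short regions; this growth is one reason scalable statistical methods are needed.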

Columbia University Academic Commons

Parsimony-based genetic algorithm for haplotype resolution and block partitioning

Author: Sazonova Nadezhda A.
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2007

This dissertation proposes a new algorithm for performing simultaneous haplotype resolution and block partitioning. The algorithm is based on a genetic algorithm approach and the parsimony principle. The multilocus LD measure (Normalized Entropy Difference) is used as a block identification criterion. The proposed algorithm incorporates missing data as a part of the model and allows blocks of arbitrary length. In addition, the algorithm provides scores for the block boundaries, which represent measures of the strength of the boundaries at specific positions. The performance of the proposed algorithm was validated by running it on several publicly available data sets, including the HapMap data, and comparing the results to those of existing state-of-the-art algorithms. The results show that the proposed genetic algorithm provides haplotype decomposition accuracy within the range of the same indicators shown by the other algorithms. The block structure output by our algorithm in general agrees with the block structure provided by the other algorithms for the same data. Thus, the proposed algorithm can be successfully used for block partitioning and haplotype phasing while providing some new valuable features, such as scores for block boundaries and fully incorporated treatment of missing data. In addition, the proposed algorithm for haplotyping and block partitioning is used in the development of a new clustering algorithm for two-population mixed genotype samples. The proposed clustering algorithm extracts from the given genotype sample two clusters with substantially different block structures and finds the haplotype resolution and block partitioning for each cluster.
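
The abstract names Normalized Entropy Difference as its multilocus LD measure but does not give a formula. The sketch below assumes one formulation found in the literature, (S_E - S_B) / S_E, where S_B is the entropy of the observed haplotype distribution and S_E is the entropy expected if the loci were independent; the dissertation's exact definition may differ, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(counts):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def normalized_entropy_difference(haplotypes):
    """Assumed definition: (S_E - S_B) / S_E for a list of equal-length haplotypes."""
    # S_B: entropy of the observed haplotype frequencies in the candidate block.
    block_entropy = entropy(Counter(haplotypes).values())
    # S_E: sum of single-locus entropies, i.e. the entropy under independence.
    n_loci = len(haplotypes[0])
    equilibrium_entropy = sum(
        entropy(Counter(h[j] for h in haplotypes).values()) for j in range(n_loci)
    )
    if equilibrium_entropy == 0:
        return 0.0  # every locus is monomorphic, so there is no LD to measure
    return (equilibrium_entropy - block_entropy) / equilibrium_entropy
```

Under this definition a value of 0 means the loci look independent, and larger values mean the haplotype distribution is more concentrated than independence would predict, which is what a block-identification criterion would reward.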

The Research Repository @ WVU (West Virginia University)

SNP haplotype tagging from DNA pools of two individuals

Author: Hoh Josephine
Lathrop Mark G
Markovic Daniela
Matsuda Fumihiko
Ott Jurg
Peng Xu
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: DNA pooling is a technique to reduce genotyping effort while incurring only minor losses in accuracy of allele frequency estimates for single nucleotide polymorphism (SNP) markers. RESULTS: We present an algorithm for reconstructing haplotypes (alleles for multiple SNPs on the same chromosome) from pools of two individual DNAs, in which Hardy-Weinberg equilibrium conditions or other assumptions are not required. The program outputs, in addition to inferred haplotypes, a minimal number of haplotype-tagging SNPs that are identified after an exhaustive search procedure. CONCLUSION: Our method and algorithms lead to a significant reduction in genotyping effort, for example in case-control disease association studies, while maintaining the possibility of reconstructing haplotypes under very general conditions.
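
The exhaustive search for haplotype-tagging SNPs mentioned in this abstract can be sketched as follows. This is an illustration, not the authors' program: it assumes that a tagging set is any subset of SNPs whose alleles distinguish all distinct haplotypes in the input, and the function name is hypothetical.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Return a smallest tuple of SNP indices that distinguishes all haplotypes.

    haplotypes: list of equal-length strings or tuples of alleles.
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            # Project every haplotype onto the candidate SNP subset.
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return subset
    # Not reached: the full SNP set always distinguishes distinct haplotypes.
```

The search visits up to 2^n subsets for n SNPs, so it is practical only for small regions or after candidate SNPs have been narrowed down.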

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An efficient parallel algorithm for haplotype inference based on rule based approach and consensus methods.

Author: Saeed Qamar
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2007

Scholarship at UWindsor

Using DNA pools for genotyping trios

Author: Andreas Braun
Bansal
Barratt
Cargill
Choudhry
Conrad
Downes
Eran Halperin
Ewens
Halperin
Hinds
Kenneth B. Beckman
Kenneth J. Abel
Kruglyak
Marchini
Nelson
Pe'er
Sham
Spielman
The International HapMap Consortium
Wang
Yang
Publication venue: Oxford University Press
Publication date: 01/01/2006

The genotyping of mother–father–child trios is a very useful tool in disease association studies, as trios eliminate population stratification effects and increase the accuracy of haplotype inference. Unfortunately, the use of trios for association studies may reduce power, since it requires the genotyping of three individuals where only four independent haplotypes are involved. We describe here a method for genotyping a trio using two DNA pools, thus reducing the cost of genotyping trios to that of genotyping two individuals. Furthermore, we present extensions to the method that exploit the linkage disequilibrium structure to compensate for missing data and genotyping errors. We evaluated our method on trios from CEPH pedigree 66 of the Coriell Institute. We demonstrate that the error rates in the genotype calls of the proposed protocol are comparable to those of standard genotyping techniques, although the cost is reduced considerably. The approach described is generic and can be applied to any genotyping platform that achieves reasonable precision in allele frequency estimates from pools of two individuals. Using this approach, future trio-based association studies may be able to increase the sample size by 50% for the same cost and thereby increase the power to detect associations.
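
The 50% figure at the end of this abstract follows from simple cost arithmetic, sketched below. The sketch assumes, as the abstract states, that genotyping one pool costs the same as genotyping one individual; the budget value and the function name are hypothetical.

```python
def trios_affordable(budget_in_assays, assays_per_trio):
    """Number of complete trios that fit within a fixed genotyping budget."""
    return budget_in_assays // assays_per_trio


budget = 600  # hypothetical budget, counted in single-sample genotyping assays

standard_trios = trios_affordable(budget, 3)  # one assay per family member
pooled_trios = trios_affordable(budget, 2)    # two DNA pools per trio

# Relative gain in sample size at equal cost: (1/2) / (1/3) - 1 = 0.5
relative_increase = pooled_trios / standard_trios - 1
```

The ratio does not depend on the budget chosen, as long as it is divisible by both 2 and 3.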

CiteSeerX

Crossref

PubMed Central

A 32 kb Critical Region Excluding Y402H in CFH Mediates Risk for Age-Related Macular Degeneration

Author: A DeWan
A Rivera
A Swaroop
AE Hughes
Albert O. Edwards
Alice K. Henning
Anand Swaroop
Andy Itsara
AO Edwards
Arto Urtti
B Gold
Barbara E. K. Klein
CL Thompson
Deborah A. Nickerson
Dmitry V. Leontiev
Dwight E. Stambolian
EE Eichler
Emily Y. Chew
Evan E. Eichler
Feiyou Qiu
Goncalo R. Abecasis
Govindasamy Kumaramanickavel
GS Hageman
Gyungah Jun
J Hellwage
J Maller
J Marchini
JA Bailey
JA Fagerness
JC Barrett
Jeffrey M. Kidd
JH Schick
JL Haines
JM Kidd
JM Korn
JM Seddon
JR Yates
K Tamura
K Wang
KL Spencer
Kristine E. Lee
L Huang
Laura J. Kopplin
LG Fritsche
Lingam Vijaya
Liping Tian
LJ Kopplin
M Jozsi
M Li
M Stephens
Manmath Kumar Das
Manoharan Aarthi
Mark Seielstad
Michael L. Klein
N Patel
Neal S. Peachey
Parveen Sen
Paul Mitchell
Peter J. Francis
PJ Francis
R Klein
Rajiv Raman
RJ Klein
Robert P. Igo
Ronald Klein
Ronnie George
S Hakobyan
S Heinen
S Purcell
S Raychaudhuri
SA McCarroll
SK Iyengar
Stephanie A. Hagstrom
Sudha K. Iyengar
T Marques-Bonet
Theru A. Sivakumaran
Thomas LaFramboise
Tien Y. Wong
TS Jokiranta
Vedam L. Ramprasad
W Chen
Wan-Ting Tay
Wei Chen
Yang Wang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Complement factor H shows very strong association with Age-related Macular Degeneration (AMD), and recent data suggest that multiple causal variants are associated with disease. To refine the location of the disease associated variants, we characterized in detail the structural variation at CFH and its paralogs, including two copy number polymorphisms (CNP), CNP147 and CNP148, and several rare deletions and duplications. Examination of 34 AMD-enriched extended families (N = 293) and AMD cases (White N = 4210; Indian = 134; Malay = 140) and controls (White N = 3229; Indian = 117; Malay = 2390) demonstrated that deletion CNP148 was protective against AMD, independent of SNPs at CFH. Regression analysis of seven common haplotypes showed three haplotypes, H1, H6 and H7, as conferring risk for AMD development. Being the most common haplotype, H1 confers the greatest risk by increasing the odds of AMD by 2.75-fold (95% CI = [2.51, 3.01]; p = 8.31×10⁻¹⁰⁹); Caucasian (H6) and Indian-specific (H7) recombinant haplotypes increase the odds of AMD by 1.85-fold (p = 3.52×10⁻⁹) and by 15.57-fold (p = 0.007), respectively. We identified a 32-kb region downstream of Y402H (rs1061170), shared by all three risk haplotypes, suggesting that this region may be critical for AMD development. Further analysis showed that two SNPs within the 32 kb block, rs1329428 and rs203687, optimally explain disease association. rs1329428 resides in a 20 kb unique sequence block, but rs203687 resides in a 12 kb block that is 89% similar to a noncoding region contained in ΔCNP148. We conclude that causal variation in this region potentially encompasses both regulatory effects at single markers and copy number.
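
The odds ratio and confidence interval reported for H1 can be related to each other with a short calculation. The abstract does not say how the interval or p-value was computed; the sketch below assumes a Wald interval on the log-odds scale, and the function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided p-value.

    Assumes the 95% interval was built as exp(log(OR) +/- z_crit * SE),
    which is an assumption and not something the abstract states.
    """
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(odds_ratio) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return se, z, p_two_sided


# Values for haplotype H1 as reported in the abstract.
se, z, p = wald_summary(2.75, 2.51, 3.01)
# Under the assumption, the geometric mean of the bounds should be close to the OR.
ci_midpoint = sqrt(2.51 * 3.01)
```

Because the published bounds are rounded to two decimals, the recomputed p-value can only be expected to agree with the reported one in rough order of magnitude, not digit for digit.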

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

University of Melbourne Institutional Repository

ScholarBank@NUS

The maintenance of standing genetic variation: Gene flow vs. selective neutrality in Atlantic stickleback fish

Author: Berner Daniel
Guerard Laurent
Haenel Quiterie
MacColl Andrew D. C.
Publication venue: 'Wiley'
Publication date: 01/01/2021

Adaptation to derived habitats often occurs from standing genetic variation. The maintenance within ancestral populations of genetic variants favourable in derived habitats is commonly ascribed to long-term antagonism between purifying selection and gene flow resulting from hybridization across habitats. A largely unexplored alternative idea based on quantitative genetic models of polygenic adaptation is that variants favoured in derived habitats are neutral in ancestral populations when their frequency is relatively low. To explore the latter, we first identify genetic variants important to the adaptation of threespine stickleback fish (Gasterosteus aculeatus) to a rare derived habitat (nutrient-depleted acidic lakes) based on whole-genome sequence data. Sequencing marine stickleback from six locations across the Atlantic Ocean then allows us to infer that the frequency of these derived variants in the ancestral habitat is unrelated to the likely opportunity for gene flow of these variants from acidic-adapted populations. This result is consistent with the selective neutrality of derived variants within the ancestor. Our study thus supports an underappreciated explanation for the maintenance of standing genetic variation, and calls for a better understanding of the fitness consequences of adaptive variation across habitats and genomic backgrounds.

Repository@Nottingham

edoc

PubMed Central