Search CORE

2,372 research outputs found

Characterisation of the genomic architecture of human chromosome 17q and evaluation of different methods for haplotype block definition

Author: Barton Anne
Eyre Stephen
John Sally
Ollier William
Ward Daniel
Worthington Jane
Zeggini Eleftheria
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: The selection of markers in association studies can be informed through the use of haplotype blocks. Recent reports have determined the genomic architecture of chromosomal segments through different haplotype block definitions based on linkage disequilibrium (LD) measures or haplotype diversity criteria. The relative applicability of distinct block definitions to association studies, however, remains unclear. We compared different block definitions in 6.1 Mb of chromosome 17q in 189 unrelated healthy individuals. Using 137 single nucleotide polymorphisms (SNPs), at a median spacing of 15.5 kb, we constructed haplotype block maps using published methods and additional methods we have developed. Haplotype tagging SNPs (htSNPs) were identified for each map. RESULTS: Blocks were found to be shorter and coverage of the region limited with methods based on LD measures, compared to the method based on haplotype diversity. Although the distribution of blocks was highly variable, the number of SNPs that needed to be typed in order to capture the maximum number of haplotypes was consistent. CONCLUSION: For the marker spacing used in this study, choice of block definition is not important when used as an initial screen of the region to identify htSNPs. However, choice of block definition has consequences for the downstream interpretation of association study results

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

The University of Manchester - Institutional Repository

Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa

Author: Slavov Gancho Trifonu
Difazio Stephen P.
Martin Joel
Schackwitz Wendy
Muchero Wellington
Rodgers-Melnick Eli
Lipphardt Mindie F.
Pennacchio Christa P.
Hellsten Uffe
Pennacchio Len A.
Gunter Lee E.
Ranjan Priya
Vining Kelly
Pomraning Kyle R.
Wilhelm Larry J.
Pellegrini Matteo
Mockler Todd C.
Freitag Michael
Geraldes Armando
El-Kassaby Yousry A.
Mansfield Shawn D.
Cronk Quentin C. B.
Douglas Carl J.
Strauss Steven H.
Rokhsar Dan
Tuskan Gerald A.
Publication venue: John Wiley & Sons, Inc.
Publication date: 01/01/2009
Field of study

This is the publisher’s final pdf. The article is copyrighted by the New Phytologist Trust and published by John Wiley & Sons, Inc. It can be found at: http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291469-8137. To the best of our knowledge, one or more authors of this paper were federal employees when contributing to this work.•Plant population genomics informs evolutionary biology, breeding, conservation and bioenergy feedstock development. For example, the detection of reliable phenotype–genotype associations and molecular signatures of selection requires a detailed knowledge about genome-wide patterns of allele frequency variation, linkage disequilibrium and recombination.\ud •We resequenced 16 genomes of the model tree Populus trichocarpa and genotyped 120 trees from 10 subpopulations using 29 213 single-nucleotide polymorphisms.\ud •Significant geographic differentiation was present at multiple spatial scales, and range-wide latitudinal allele frequency gradients were strikingly common across the genome. The decay of linkage disequilibrium with physical distance was slower than expected from previous studies in Populus, with r² dropping below 0.2 within 3–6 kb. Consistent with this, estimates of recent effective population size from linkage disequilibrium (N[subscript e] ≈ 4000–6000) were remarkably low relative to the large census sizes of P. trichocarpa stands. Fine-scale rates of recombination varied widely across the genome, but were largely predictable on the basis of DNA sequence and methylation features.\ud •Our results suggest that genetic drift has played a significant role in the recent evolutionary history of P. trichocarpa. Most importantly, the extensive linkage disequilibrium detected suggests that genome-wide association studies and genomic selection in undomesticated populations may be more feasible in Populus than previously assumed

CiteSeerX

Crossref

Aberystwyth Research Portal

ScholarsArchive@OSU

RMIT Research Repository

Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

Author: Kelemen Arpad
Liang Yulan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 28/03/2008
Field of study

Recent advances of information technology in biomedical sciences and other applied areas have created numerous large diverse data sets with a high dimensional feature space, which provide us a tremendous amount of information and new opportunities for improving the quality of human life. Meanwhile, great challenges are also created driven by the continuous arrival of new data that requires researchers to convert these raw data into scientific knowledge in order to benefit from it. Association studies of complex diseases using SNP data have become more and more popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high dimensional correlated data and more specific approaches for SNPs data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNPs selection using statistical testing/scoring, statistical modeling and machine learning methods with an emphasis on how to identify interacting loci.Comment: Published in at http://dx.doi.org/10.1214/07-SS026 the Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Haplotype-tagged SNPs improve genomic prediction accuracy for Fusarium head blight resistance and yield-related traits in wheat

Author: Abebe Admas Alemu
Batista Lorena
Ceplitis Alf
Chawade Aakash
Singh Pawan K.
Publication venue
Publication date: 01/01/2023
Field of study

Genomic prediction is a powerful tool to enhance genetic gain in plant breeding. However, the method is accompanied by various complications leading to low prediction accuracy. One of the major challenges arises from the complex dimensionality of marker data. To overcome this issue, we applied two pre-selection methods for SNP markers viz. LD-based haplotype-tagging and GWAS-based trait-linked marker identification. Six different models were tested with preselected SNPs to predict the genomic estimated breeding values (GEBVs) of four traits measured in 419 winter wheat genotypes. Ten different sets of haplotype-tagged SNPs were selected by adjusting the level of LD thresholds. In addition, various sets of trait-linked SNPs were identified with different scenarios from the training-test combined and only from the training populations. The BRR and RR-BLUP models developed from haplotype-tagged SNPs had a higher prediction accuracy for FHB and SPW by 0.07 and 0.092, respectively, compared to the corresponding models developed without marker pre-selection. The highest prediction accuracy for SPW and FHB was achieved with tagged SNPs pruned at weak LD thresholds (r

Epsilon Open Archive

Using Population Mixtures to Optimize the Utility of Genomic Databases: Linkage Disequilibrium and Association Study Design in India

Author: Conrad D. F.
Coop G.
Jakobsson Mattias
Patel P. I.
Pemberton Trevor J.
Pritchard Jonathan K.
Rosenberg Noah A.
Wall J. D.
Publication venue: 'Wiley'
Publication date: 01/07/2008
Field of study

When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis – such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/65949/1/j.1469-1809.2008.00457.x.pd

Crossref

PubMed Central

Deep Blue Documents at the University of Michigan

Evolutionary Signatures of Common Human Cis-Regulatory Haplotypes

Author: Krontiris Theodore G.
Ouyang Ching
Smith David D.
Publication venue: Public Library of Science
Publication date: 01/01/2008
Field of study

Variation in gene expression may give rise to a significant fraction of inter-individual phenotypic variation. Studies searching for the underlying genetic controls for such variation have been conducted in model organisms and humans in recent years. In our previous effort of assessing conserved underlying haplotype patterns across ethnic populations, we constructed common haplotypes using SNPs having conserved linkage disequilibrium (LD) across ethnic populations. These common haplotypes cluster into a simple evolutionary structure based on their frequencies, defining only up to three conserved clusters termed ‘haplotype frameworks’. One intriguing preliminary finding was that a significant portion of reported variants strongly associated with cis-regulation tags these globally conserved haplotype frameworks. Here we expand the investigation by collecting genes showing stringently determined cis-association between genotypes and expression phenotypes from major studies. We conducted phylogenetic analysis of current major haplotypes along with the corresponding haplotypes derived from chimpanzee reference sequences. Our analysis reveals that, for the vast majority of such cis-regulatory genes, the tagging SNPs showing the strongest association also tag the haplotype lineages directly separated from ancestry, inferred from either chimpanzee reference sequences or the allele frequency-derived haplotype frameworks, suggesting that the differentially expressed phenotypes were evolved relatively early in human history. Such evolutionary signatures provide keys for a more effective identification of globally-conserved candidate regulatory haplotypes across human genes in future epidemiologic and pharmacogenetic studies

CiteSeerX

PubMed Central

Assessing the power of tag SNPs in the mapping of quantitative trait loci (QTL) with extremal and random samples

Author: Sun Fengzhu
Zhang Kui
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: Recent studies have indicated that the human genome could be divided into regions with low haplotype diversity interspersed with regions of high haplotype diversity. In regions of low haplotype diversity, a small fraction of SNPs (tag SNPs) are sufficient to account for most of the haplotype diversity of the human genome. These tag SNPs can be extremely useful for testing the association of a marker locus with a qualitative or quantitative trait locus in that it may not be necessary to genotype all the SNPs. When tag SNPs are used to reduce the genotyping effort in association studies, it is important to know how much power is lost. It is also important to know how much power is gained when tag SNPs instead of the same number of randomly chosen SNPs are used. RESULTS: We design a simulation study to tackle these problems for a variety of quantitative association tests using either case-parent samples or unrelated population samples. First, the samples are generated based on the quantitative trait model with the assumption of either an extremal sampling scheme or a random sampling scheme. Second, a small number of samples are selected to determine the haplotype blocks and the tag SNPs. Third, the statistical power of the tests is evaluated using four kinds of data: (1) all the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, (3) the same number of evenly spaced SNPs with minor allele frequency greater than a threshold and the corresponding haplotypes, (4) the same number of randomly chosen SNPs and their corresponding haplotypes. CONCLUSION: Our results suggest that in most situations genotyping efforts can be significantly reduced by using tag SNPs for mapping the QTL in association studies without much loss of power, which is consistent with previous studies on association mapping of qualitative traits. For all situations considered, two-locus haplotype analysis using tag SNPs are more powerful than those using the same number of randomly selected SNPs, but the degree of such power differences depends upon the sampling scheme and the population history

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An evaluation of the performance of HapMap SNP data in a Shanghai Chinese population: Analyses of allele frequency, linkage disequilibrium pattern and tagging SNPs transferability on chromosome 1q21-q25

Author: Hu Cheng
Jia Weiping
Ma Xiaojing
Wang Congrong
Wang Jie
Xiang Kunsan
Zhang Rong
Zhang Weihua
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background The HapMap project aimed to catalog millions of common single nucleotide polymorphisms (SNPs) in the human genome in four major populations, in order to facilitate association studies of complex diseases. To examine the transferability of Han Chinese in Beijing HapMap data to the Southern Han Chinese in Shanghai, we performed comparative analyses between genotypes from over 4,500 SNPs in a 21 Mb region on chromosome 1q21-q25 in 80 unrelated Shanghai Chinese and 45 HapMap Chinese data. Results Three thousand and forty-two SNPs were analyzed after removal of SNPs that failed quality control and those not in the HapMap panel. We compared the allele frequency distributions, linkage disequilibrium patterns, haplotype frequency distributions and tagging SNP sets transferability between the HapMap population and Shanghai Chinese population. Among the four HapMap populations, Beijing Chinese showed the best correlation with Shanghai population on allele frequencies, linkage disequilibrium and haplotype frequencies. Tagging SNP sets selected from four HapMap populations at different thresholds were evaluated in the Shanghai sample. Under the threshold of r2 equal to 0.8 or 0.5, both HapMap Chinese and Japanese data showed better coverage and tagging efficiency than Caucasian and African data. Conclusion Our study supported the applicability of HapMap Beijing Chinese SNP data to the study of complex diseases among southern Chinese population.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Global haplotype partitioning for maximal associated SNP pairs

Author: A Indap
A Smith
Ali Katanforoush
B Huang
BL Browning
C Bardel
C Carlson
C Coulonges
C Durrant
C Li
C Pattaro
C Zapata
D Hinds
E Feingold
EC Anderson
Elahe Elahi
F Yates
G Hellenthal
G McVean
Hamid Pezeshk
J He
J Hwang
J Li
J Wall
JC Barrett
K Ding
K Ding
K Zhang
M Nothnagel
Mehdi Sadeghi
MJ Daly
N Patil
N Wang
N Zhou
P Fearnhead
RC Lewontin
RR Hudson
S Climer
S Gabriel
S Gu
S Lydersen
S Myers
The International HapMap Consortium
V Hasselblad
Y Zhao
Z Ding
Z Qin
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on LD between SNP pairs. We use the Fisher's exact test to assess the statistical significance of the LD estimator. By this test, each SNP pair is characterized as associated, independent, or not-statistically-significant. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with maximal proportion of associated SNP pairs. Essentially, this model is reduced to a constrained optimization problem, the solution of which is obtained by iterating a dynamic programming algorithm. Results In comparison with other methods, our algorithm reports blocks of larger average size. Nevertheless, the haplotype diversity within the blocks is captured by a small number of tagSNPs. Resampling HapMap haplotypes under a block-based model of recombination showed that our algorithm is robust in reproducing the same partitioning for recombinant samples. Our algorithm performed better than previously reported models in a case-control association study aimed at mapping a single locus trait, based on simulation results that were evaluated by a block-based statistical test. Compared to methods of haplotype block partitioning, we performed best on detection of recombination hotspots. Conclusion Our proposed method divides chromosomes into the regions within which allelic associations of SNP pairs are maximized. This approach presents a native design for dimension reduction in genome-wide association studies. Our results show that the pairwise allelic association of SNPs can describe various features of genomic variation, in particular recombination hotspots.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Allele Frequency Matching Between SNPs Reveals an Excess of Linkage Disequilibrium in Genic Regions of the Human Genome

Author: Chimpanzee Sequencing and Analysis Consortium
Deborah A Nickerson
Leonid Kruglyak
Mark J Rieder
Michael A Eberle
Wayne N Frankel
Publication venue: Public Library of Science
Publication date: 01/09/2006
Field of study

Significant interest has emerged in mapping genetic susceptibility for complex traits through whole-genome association studies. These studies rely on the extent of association, i.e., linkage disequilibrium (LD), between single nucleotide polymorphisms (SNPs) across the human genome. LD describes the nonrandom association between SNP pairs and can be used as a metric when designing maximally informative panels of SNPs for association studies in human populations. Using data from the 1.58 million SNPs genotyped by Perlegen, we explored the allele frequency dependence of the LD statistic r (2) both empirically and theoretically. We show that average r (2) values between SNPs unmatched for allele frequency are always limited to much less than 1 (theoretical [Image: see text] approximately 0.46 to 0.57 for this dataset). Frequency matching of SNP pairs provides a more sensitive measure for assessing the average decay of LD and generates average r (2) values across nearly the entire informative range (from 0 to 0.89 through 0.95). Additionally, we analyzed the extent of perfect LD (r (2) = 1.0) using frequency-matched SNPs and found significant differences in the extent of LD in genic regions versus intergenic regions. The SNP pairs exhibiting perfect LD showed a significant bias for derived, nonancestral alleles, providing evidence for positive natural selection in the human genome

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California