16,705 research outputs found

A model-based approach to selection of tag SNPs

Author: A Barron
A Thomas
AP Dempster
B Halldórsson
BV Halldórsson
CE Shannon
CS Carlson
D Botstein
DC Crawford
EC Anderson
Fengzhu Sun
G Schwarz
GA McVean
H Akaike
H Mannila
J Besag
JD Wall
JFC Kingman
JN Hirschhorn
K Zhang
L Breiman
L Excoffier
L Li
LE Baum
Lei M Li
LR Rabiner
M Koivisto
M Nothnagel
M Stephens
MJ Daly
N Li
N Patil
Pierre Nicolas
S Lin
SB Gabriel
SE Ptak
T Niu
TG Schulze
The International HapMap Consortium
TM Cover
W Zhai
X Ke
X Sun
Z Liu
Z Meng
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs, and thus for assessing the performance of tag sets through their ability to predict larger SNP sets. RESULTS: Here, we compute the description code-lengths of SNP data for an array of models, and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. In addition, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. Software that implements our approach is available.
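
The entropy-maximization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it replaces the Li and Stephens hidden Markov model with the plain empirical distribution of observed haplotypes, and it uses a greedy search rather than an exact optimum. The function names and the toy haplotypes are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def joint_entropy(haplotypes, snps):
    """Empirical Shannon entropy (bits) of the haplotypes restricted to the given SNP indices."""
    counts = Counter(tuple(h[i] for i in snps) for h in haplotypes)
    n = len(haplotypes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_tag_snps(haplotypes, k):
    """Greedily pick up to k SNP indices, each time adding the SNP that yields the highest joint entropy."""
    n_snps = len(haplotypes[0])
    chosen = []
    for _ in range(min(k, n_snps)):
        best = max(
            (i for i in range(n_snps) if i not in chosen),
            key=lambda i: joint_entropy(haplotypes, chosen + [i]),
        )
        chosen.append(best)
    return chosen


# Toy data (made up): SNP 1 duplicates SNP 0, and SNP 3 duplicates SNP 2.
toy = ["0000", "0011", "1100", "1111"]
tags = greedy_tag_snps(toy, 2)
print(tags, joint_entropy(toy, tags))
```

On this toy input, two tags carry as much information as all four SNPs, because the remaining SNPs are exact copies of the selected ones. A model-based variant would replace the empirical counts in `joint_entropy` with probabilities from a fitted haplotype model.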

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL Descartes

Hal-Diderot

Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

Author: Teslovich Tanya M.
Zöllner Sebastian
Publication venue: Institute of Mathematical Statistics
Publication date: 25/10/2010

Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms. Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. As such methods are only powerful when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally, we present some recommendations for identifying CNVs contributing to common complex disorders. Comment: Published at http://dx.doi.org/10.1214/09-STS304 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Accurate Liability Estimation Improves Power in Ascertained Case Control Studies

Author: AL Price
C Lippert
C Widmer
Christoph Lippert
D Golan
D Welter
Dan Geiger
David Heckerman
DJ Balding
ER Dempster
J Listgarten
J Yang
LA Hindorff
LC Tsoi
M Fakiola
N Fusi
N Patterson
N Zaitlen
Omer Weissbrod
S Sawcer
S Wright
SH Lee
X Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2015

Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in non-randomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (Liability Estimator As a Phenotype, https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and demonstrate that this can lead to a substantial power increase.

arXiv.org e-Print Archive

Crossref

MDC Repository

Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants

Author: Andriole Gerald
Drake Bettina F.
Klim Aleksandra
Publication venue: Digital Commons@Becker
Publication date: 01/01/2018

Digital Commons@Becker

Prediction of HLA class II alleles using SNPs in an African population

Author: Adeyemo Adebowale
Aseffa Abraham
Davey Gail
Finan Chris
Hailu Elena
Newport Melanie J
Rotimi Charles N
Tekola Ayele Fasil
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

BACKGROUND: Despite the importance of the human leukocyte antigen (HLA) gene locus in research and clinical practice, direct HLA typing is laborious and expensive. Furthermore, the analysis requires specialized software and expertise which are unavailable in most developing country settings. Recently, in silico methods have been developed for predicting HLA alleles using single nucleotide polymorphisms (SNPs). However, the utility of these methods in African populations has not been systematically evaluated. METHODOLOGY/PRINCIPAL FINDINGS: In the present study, we investigate prediction of HLA class II (HLA-DRB1 and HLA-DQB1) alleles using SNPs in the Wolaita population, southern Ethiopia. The subjects comprised 297 Ethiopians with genome-wide SNP data, of whom 188 had also been HLA typed and were used for training and testing the model. The 109 subjects with SNP data alone were used for empirical prediction using the multi-allelic gene prediction method. We evaluated accuracy of the prediction, agreement between predicted and HLA typed alleles, and discriminative ability of the prediction probability supplied by the model. We found that the model predicted intermediate (two-digit) resolution for HLA-DRB1 and HLA-DQB1 alleles at accuracy levels of 96% and 87%, respectively. All measures of performance showed high accuracy and reliability for prediction. The distribution of the majority of HLA alleles in the study was similar to that previously reported for the Oromo and Amhara ethnic groups from Ethiopia. CONCLUSIONS/SIGNIFICANCE: We demonstrate that HLA class II alleles can be predicted from SNP genotype data with a high level of accuracy at intermediate (two-digit) resolution in an African population. This finding offers new opportunities for HLA studies of disease epidemiology and population genetics in developing countries.

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online

FigShare

Effect of natural genetic variation on enhancer selection and function.

Author: Allison KA
Benner C
Glass CK
Heinz S
Kaikkonen MU
Orozco LD
Romanoski CE
Publication venue: eScholarship, University of California
Publication date: 01/11/2013

The mechanisms by which genetic variation affects transcription regulation and phenotypes at the nucleotide level are incompletely understood. Here we use natural genetic variation as an in vivo mutagenesis screen to assess the genome-wide effects of sequence variation on lineage-determining and signal-specific transcription factor binding, epigenomics and transcriptional outcomes in primary macrophages from different mouse strains. We find substantial genetic evidence to support the concept that lineage-determining transcription factors define epigenetic and transcriptomic states by selecting enhancer-like regions in the genome in a collaborative fashion and facilitating binding of signal-dependent factors. This hierarchical model of transcription factor function suggests that limited sets of genomic data for lineage-determining transcription factors and informative histone modifications can be used for the prioritization of disease-associated regulatory variants.

eScholarship - University of California