
Accurate Liability Estimation Improves Power in Ascertained Case Control Studies

Author: AL Price
C Lippert
C Widmer
Christoph Lippert
D Golan
D Welter
Dan Geiger
David Heckerman
DJ Balding
ER Dempster
J Listgarten
J Yang
LA Hindorff
LC Tsoi
M Fakiola
N Fusi
N Patterson
N Zaitlen
Omer Weissbrod
S Sawcer
S Wright
SH Lee
X Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2015

Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in non-randomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (Liability Estimator As a Phenotype, https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and demonstrate that this can lead to a substantial power increase.

arXiv.org e-Print Archive

Crossref

MDC Repository

Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

Author: Teslovich Tanya M.
Zöllner Sebastian
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 25/10/2010

Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms. Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. As such methods are only powerful when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally, we present some recommendations for identifying CNVs contributing to common complex disorders.

Comment: Published at http://dx.doi.org/10.1214/09-STS304 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Prediction of HLA class II alleles using SNPs in an African population

Author: Adeyemo Adebowale
Aseffa Abraham
Davey Gail
Finan Chris
Hailu Elena
Newport Melanie J
Rotimi Charles N
Tekola Ayele Fasil
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

BACKGROUND: Despite the importance of the human leukocyte antigen (HLA) gene locus in research and clinical practice, direct HLA typing is laborious and expensive. Furthermore, the analysis requires specialized software and expertise which are unavailable in most developing country settings. Recently, in silico methods have been developed for predicting HLA alleles using single nucleotide polymorphisms (SNPs). However, the utility of these methods in African populations has not been systematically evaluated. METHODOLOGY/PRINCIPAL FINDINGS: In the present study, we investigate prediction of HLA class II (HLA-DRB1 and HLA-DQB1) alleles using SNPs in the Wolaita population, southern Ethiopia. The subjects comprised 297 Ethiopians with genome-wide SNP data, of whom 188 had also been HLA typed and were used for training and testing the model. The 109 subjects with SNP data alone were used for empirical prediction using the multi-allelic gene prediction method. We evaluated accuracy of the prediction, agreement between predicted and HLA typed alleles, and discriminative ability of the prediction probability supplied by the model. We found that the model predicted intermediate (two-digit) resolution for HLA-DRB1 and HLA-DQB1 alleles at accuracy levels of 96% and 87%, respectively. All measures of performance showed high accuracy and reliability for prediction. The distribution of the majority of HLA alleles in the study was similar to that previously reported for the Oromo and Amhara ethnic groups from Ethiopia. CONCLUSIONS/SIGNIFICANCE: We demonstrate that HLA class II alleles can be predicted from SNP genotype data with a high level of accuracy at intermediate (two-digit) resolution in an African population. This finding offers new opportunities for HLA studies of disease epidemiology and population genetics in developing countries.

Directory of Open Access Journals

PubMed Central

Sussex Research Online

FigShare

GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data

Author: Bartaula Radhika
Hale Iago L.
Tavares De Oliveira Melo Arthur
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 12/01/2016

Background: With its simple library preparation and robust approach to genome reduction, genotyping-by-sequencing (GBS) is a flexible and cost-effective strategy for SNP discovery and genotyping, provided an appropriate reference genome is available. For resource-limited curation, research, and breeding programs of underutilized plant genetic resources, however, even low-depth references may not be within reach, despite declining sequencing costs. Such programs would find value in an open-source bioinformatics pipeline that can maximize GBS data usage and perform high-density SNP genotyping in the absence of a reference. Results: The GBS SNP-Calling Reference Optional Pipeline (GBS-SNP-CROP) developed and presented here adopts a clustering strategy to build a population-tailored “Mock Reference” from the same GBS data used for downstream SNP calling and genotyping. Designed for libraries of paired-end (PE) reads, GBS-SNP-CROP maximizes data usage by eliminating unnecessary data culling due to imposed read-length uniformity requirements. Using 150 bp PE reads from a GBS library of 48 accessions of tetraploid kiwiberry (Actinidia arguta), GBS-SNP-CROP yielded on average three times as many SNPs as TASSEL-GBS analyses (32 and 64 bp tag lengths) and over 18 times as many as TASSEL-UNEAK, with fewer genotyping errors in all cases, as evidenced by comparing the genotypic characterizations of biological replicates. Using the published reference genome of a related diploid species (A. chinensis), the reference-based version of GBS-SNP-CROP behaved similarly to TASSEL-GBS in terms of the number of SNPs called but had an improved read depth distribution and fewer genotyping errors. Our results also indicate that the sets of SNPs detected by the different pipelines above are largely orthogonal to one another; thus GBS-SNP-CROP may be used to augment the results of alternative analyses, whether or not a reference is available. 
Conclusions: By achieving high-density SNP genotyping in populations for which no reference genome is available, GBS-SNP-CROP is worth consideration by curators, researchers, and breeders of under-researched plant genetic resources. In cases where a reference is available, especially if from a related species or when the target population is particularly diverse, GBS-SNP-CROP may complement other reference-based pipelines by extracting more information per sequencing dollar spent. The current version of GBS-SNP-CROP is available at https://github.com/halelab/GBS-SNP-CROP.gi

Springer - Publisher Connector

PubMed Central

UNH Scholars' Repository

Efficient inference for genetic association studies with multiple outcomes

Author: Davison Anthony C.
Hager Jörg
Irincheeva Irina
Ruffieux Hélène
Publication venue: 'Oxford University Press (OUP)'
Publication date: 20/03/2017

Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modelling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson et al. (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

The Metabochip, a Custom Genotyping Array for Genetic Studies of Metabolic, Cardiovascular, and Anthropometric Traits

Author: Andrew P. Morris
Anne U. Jackson
Antonella Mulas
Arne Pfeufer
Benjamin F. Voight
Cameron D. Palmer
Carlo Sidore
Cecilia M. Lindgren
Christian Fuchsberger
Christopher Newton-Cheh
David Altshuler
Francesco Cucca
Gonçalo R. Abecasis
Heribert Schunkert
Hyun Min Kang
Inga Prokopenko
Iris M. Heid
Jeanette Erdmann
Joel N. Hirschhorn
Joshua C. R
Jun Ding
Kathleen Stirrups
Mark I. McCarthy
Melissa Parkin
N. William Rayner
Neil Robertson
Nicole Soranzo
Elizabeth K. Speliotes
Nilesh J. Samani
Noël P. Burtt
Panos Deloukas
Patricia B. Munroe
Peter S. Chines
Ramaiah Nagaraja
Richa Saxena
Ruth J. F. Loos
Sekar Kathiresan
Serena Sanna
Simon Potter
Timothy M. Frayling
Toby Johnson
Tuomas O. Kilpeläinen
Wendy Winckler
Yanming Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

PMCID: PMC3410907. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

PuSH

Queen Mary Research Online

Public Library of Science (PLOS)

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central

Copenhagen University Research Information System

Oxford University Research Archive

Leicester Research Archive