
Inference of locus-specific ancestry in closely related populations

Author: B. Pasaniuc
E. Halperin
Falush
G. Kimmel
Hoggart
Kennedy
Li
Li
Marchini
Patterson
Pei
Price
Pritchard
Reich
S. Sankararaman
Tang
Tang
Tang
Zhu
Publication venue: Oxford University Press
Publication date: 01/06/2009

A characterization of the genetic variation of recently admixed populations may reveal historical population events, and is useful for the detection of single nucleotide polymorphisms (SNPs) associated with diseases through association studies and admixture mapping. Inference of locus-specific ancestry is key to our understanding of the genetic variation of such populations. While a number of methods for the inference of locus-specific ancestry are accurate when the ancestral populations are quite distant (e.g. African–Americans), current methods incur a large error rate when inferring the locus-specific ancestry in admixed populations where the ancestral populations are closely related (e.g. Americans of European descent).

MGMR: leveraging RNA-Seq population data to optimize expression estimation

Author: A Oshlack
B Li
B Li
B Pasaniuc
C Trapnell
Eran Halperin
JH Bullard
JK Pickrell
KA et al. Frazer
L Pachter
MD Robinson
Ron Shamir
Roye Rozov
SB Montgomery
TP Minka
Publication venue: BioMed Central
Publication date: 01/04/2012

Background: RNA-Seq is a technique that uses Next Generation Sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take into account such population information, and it is an open question whether this information can improve quantification of the individual samples. Results: To explore the contribution of population-level information in RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that expression levels of different individuals are sampled from a Dirichlet distribution with parameters specific to the population, and reads are sampled from the distribution of expression levels. We introduce an optimization procedure for the estimation of the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression levels of paralogous genes. Conclusions: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate that this approach can be beneficial, primarily for estimation at the gene level.
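
The hierarchical model described in the Results can be illustrated with a small simulation. This is a minimal sketch, not the MGMR implementation: the function names, the example `alpha` values, and the read counts are hypothetical, and the sketch ignores multiread ambiguity and gene length, both of which a real quantification method has to handle.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one expression vector from Dirichlet(alpha).

    Uses the standard construction: independent Gamma(alpha_i, 1) draws,
    normalized so they sum to 1.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_population(alpha, n_individuals, n_reads, seed=0):
    """Simulate per-individual expression levels and read counts.

    alpha          -- population-level Dirichlet parameters, one per gene
                      (hypothetical values, chosen by the caller)
    n_individuals  -- number of individuals in the population
    n_reads        -- reads sampled per individual
    """
    rng = random.Random(seed)
    genes = range(len(alpha))
    expression, counts = [], []
    for _ in range(n_individuals):
        # Each individual's expression levels come from the shared population prior.
        theta = sample_dirichlet(alpha, rng)
        # Each read is assigned to a gene with probability given by theta.
        reads = rng.choices(genes, weights=theta, k=n_reads)
        expression.append(theta)
        counts.append([reads.count(g) for g in genes])
    return expression, counts


expression, counts = simulate_population(
    alpha=[5.0, 3.0, 2.0], n_individuals=4, n_reads=1000
)
```

Larger values in `alpha` make individuals resemble each other more closely, which is the kind of population commonality the abstract proposes to exploit.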

Directory of Open Access Journals

Rapid genotype imputation from sequence without reference panels

Author: A McKenna
AH Freedman
B Howie
B Pasaniuc
B Yalcin
BE Huang
BN Howie
D Welter
G Lunter
H Li
HD Daetwyler
Jonathan Flint
JP Didion
M Sargolzaei
MA DePristo
O Delaneau
P Scheet
PM VanRaden
R VanBuren
Richard Mott
Robert W Davies
Simon Myers
SR Browning
TM Keane
Y Li
Publication date: 01/01/2016

Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, array-based microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r² value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans.
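
Imputation accuracy of the kind quoted above (a mean r² of 0.98) is commonly computed per SNP as the squared Pearson correlation between imputed dosages and true genotypes. The abstract does not spell out its definition, so the sketch below assumes that convention; the function name and the sample values are illustrative only.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0, 1, 2)
    and imputed dosages (real values in [0, 2]) at a single SNP."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    if var_t == 0 or var_d == 0:
        # Correlation is undefined when either vector has no variation.
        raise ValueError("r2 is undefined at a monomorphic site")

    return (cov * cov) / (var_t * var_d)


# Hypothetical example: four samples at one SNP.
example_r2 = imputation_r2([0, 1, 2, 1], [0.1, 0.9, 1.8, 1.2])
```

A reported mean r² would then be the average of this quantity across all imputed SNPs.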

UCL Discovery

Oxford University Research Archive

Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative and Dichotomous Traits

Author: A Gusev
A Helgason
A Kong
AL Price
Alkes L. Price
B Maher
B Pasaniuc
B Towne
BJ Hayes
Bogdan Pasaniuc
DB Goldstein
EA Stahl
EE Eichler
G Gibson
G Pilia
Gaurav Bhatia
HC So
HM Kang
IJ Deary
IJ Deary
J McClellan
J Yang
J Yang
J Yang
JE Powell
JM Murabito
K Silventoinen
KS Kendler
LA Hindorff
MF Feitosa
Nick Patterson
Noah Zaitlen
NR Wray
O Zuk
Peter Kraft
Peter M. Visscher
PM Visscher
PM Visscher
PM Visscher
PM Visscher
PM Visscher
S Vattikuti
Samuela Pollack
SH Lee
SP Dickson
SR Browning
SR Browning
TA Manolio
WG Hill
Publication venue: Public Library of Science (PLoS)
Publication date: 01/09/2012

Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.

DSpace@MIT

Harvard University - DASH

Directory of Open Access Journals

Large-scale transcriptome-wide association study identifies new prostate cancer risk regions

Author: Eeles R
Freedman M
Gayther S
Gusev A
Haiman C
Kote-Jarai Z
Mancuso N
Pasaniuc B
Penney KL
Zheng W
Publication venue: Springer Science and Business Media LLC
Publication date: 28/10/2022

Although genome-wide association studies (GWAS) for prostate cancer (PrCa) have identified more than 100 risk regions, most of the risk genes at these regions remain largely unknown. Here we integrate the largest PrCa GWAS (N = 142,392) with gene expression measured in 45 tissues (N = 4458), including normal and tumor prostate, to perform a multi-tissue transcriptome-wide association study (TWAS) for PrCa. We identify 217 genes at 84 independent 1 Mb regions associated with PrCa risk, 9 of which are regions with no genome-wide significant SNP within 2 Mb. Twenty-three genes are significant in TWAS only for alternative splicing models in prostate tumor, thus supporting the hypothesis of splicing driving risk for continued oncogenesis. Finally, we use a Bayesian probabilistic approach to estimate credible sets of genes containing the causal gene at a pre-defined level; this reduced the list of 217 associations to 109 genes in the 90% credible set. Overall, our findings highlight the power of integrating expression with PrCa GWAS to identify novel risk loci and prioritize putative causal genes at known risk loci.

UTUPub

Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits

Author: Arboleda V.A.
Burch K.S.
Freund M.K.
Garske K.M.
Kichaev G.
Laakso M.
Mancuso N.
Miao Z.
Mohlke K.L.
Pajukanta P.
Pan D.Z.
Pasaniuc B.
Shi H.
Publication venue: Cell Press
Publication date: 01/01/2018

Although recent studies provide evidence for a common genetic basis between complex traits and Mendelian disorders, a thorough quantification of their overlap in a phenotype-specific manner remains elusive. Here, we have quantified the overlap of genes identified through large-scale genome-wide association studies (GWASs) for 62 complex traits and diseases with genes containing mutations known to cause 20 broad categories of Mendelian disorders. We identified a significant enrichment of genes linked to phenotypically matched Mendelian disorders in GWAS gene sets; of the total 1,240 comparisons, a higher proportion of phenotypically matched or related pairs (n = 50 of 92 [54%]) than phenotypically unmatched pairs (n = 27 of 1,148 [2%]) demonstrated significant overlap, confirming a phenotype-specific enrichment pattern. Further, we observed elevated GWAS effect sizes near genes linked to phenotypically matched Mendelian disorders. Finally, we report examples of GWAS variants localized at the transcription start site or physically interacting with the promoters of genes linked to phenotypically matched Mendelian disorders. Our results are consistent with the hypothesis that genes that are disrupted in Mendelian disorders are dysregulated by non-coding variants in complex traits and demonstrate how leveraging findings from related Mendelian disorders and functional genomic datasets can prioritize genes that are putatively dysregulated by local and distal non-coding GWAS variants.
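
The counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes figures stated in the abstract (62 traits, 20 disorder categories, 50 of 92 matched pairs, 27 of 1,148 unmatched pairs); the variable names are illustrative, and no statistical test from the paper is reproduced here.

```python
# Figures taken from the abstract.
n_traits = 62
n_disorder_categories = 20
matched_significant, matched_total = 50, 92
unmatched_significant, unmatched_total = 27, 1148

# Every trait is compared against every disorder category.
total_comparisons = n_traits * n_disorder_categories

# Share of pairs with significant overlap in each group, as percentages.
matched_pct = 100 * matched_significant / matched_total
unmatched_pct = 100 * unmatched_significant / unmatched_total

print(f"total comparisons: {total_comparisons}")
print(f"matched pairs with significant overlap: {matched_pct:.0f}%")
print(f"unmatched pairs with significant overlap: {unmatched_pct:.0f}%")
```

The matched and unmatched pair counts add up to the 1,240 total comparisons, and the two percentages round to the 54% and 2% reported.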

Carolina Digital Repository

Public Library of Science (PLOS)

Enhanced Statistical Tests for GWAS in Admixed Populations: Assessment using African Americans from CARe and a Breast Cancer Consortium

Author: A Adeyemo
AL Price
AL Price
AL Price
Alkes L. Price
Arti Tandon
B Devlin
B Newman
B Pasaniuc
B Pasaniuc
B Pasaniuc
BL Browning
BN Howie
Bogdan Pasaniuc
Brian E. Henderson
Cameron D. Palmer
CB Ambrosone
Christine B. Ambrosone
Christopher A. Haiman
CY Cheng
D Altshuler
D Reich
D Reich
David Reich
David S. Siscovick
DB Hancock
DJ Hunter
DM Altshuler
DW Jones
E Zeggini
EL Harris
Elisa V. Bandera
EM John
EM John
EM John
Emma Larkin
Ermeg L. Akylbekova
Esther M. John
G Lettre
Gary K. Chen
George J. Papanicolaou
Guillaume Lettre
Ingo Ruczinski
J Marchini
J Marchini
James G. Wilson
Jasmin Divers
Jennifer J. Hu
JK Pritchard
Joe Mychaleckyj
Joel N. Hirschhorn
Jorge L. Rodriguez-Gil
JR Palmer
K Wang
KA Frazer
KH Kjerulff
L Huang
L. Adrienne Cupples
Leslie A. Lange
Leslie Bernstein
LN Kolonel
Lynette Ekunwe
M Molokhia
M Slatkin
M Stephens
MA Nalls
MG Hayes
MI McCarthy
Michael F. Press
Mingyao Li
ML Freedman
MS Udler
MW Smith
MW Smith
Myriam Fornage
N Patterson
N Patterson
N Risch
N Zaitlen
N Zaitlen
NA Rosenberg
Nicholas J. Schork
Nick Patterson
Noah Zaitlen
PA Marchbanks
PC Prorok
Qiong Yang
R Cooper
RC Deo
Regina G. Ziegler
RF Gillum
Robert C. Millikan
Sandra L. Deming
Sarah Buxbaum
Sarah J. Nyante
Simon Myers
SJ Freedland
Solomon K. Musani
Stephen J. Chanock
Sue A. Ingles
TM Teslovich
TR Rebbeck
TR Smith
W Zheng
W. H. Linda Kao
Wei Zheng
WH Kao
X Zhu
Xiaofeng Zhu
Y Guan
Y Li
Publication venue: Public Library of Science
Publication date: 01/01/2011

While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.

Directory of Open Access Journals

University of Miami: Scholarship Miami

Oxford University Research Archive

Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma

Author: Ablain J
Barrett JH
Brossard M
Brown KM
Chanock SJ
Chari R
Choi J
Colli LM
Demenais F
Funderburk KM
Gräwe C
Hennessey RC
Hoggart CJ
Iles MM
Kovacs MA
Law MH
Makowski MM
Pasaniuc B
Rothschild H
Taylor J
Vermeulen M
Vu A
Xu M
Yin J
Yu K
Zhang T
Zon LI
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2020

Genome-wide association studies (GWAS) have identified ~20 melanoma susceptibility loci, most of which are not functionally characterized. Here we report an approach integrating massively-parallel reporter assays (MPRA) with cell-type-specific epigenome and expression quantitative trait loci (eQTL) to identify susceptibility genes/variants from multiple GWAS loci. From 832 high-LD variants, we identify 39 candidate functional variants from 14 loci displaying allelic transcriptional activity, a subset of which corroborates four colocalizing melanocyte cis-eQTL genes. Among these, we further characterize the locus encompassing the HIV-1 restriction gene, MX2 (Chr21q22.3), and validate a functional intronic variant, rs398206. rs398206 mediates the binding of the transcription factor, YY1, to increase MX2 levels, consistent with the cis-eQTL of MX2 in primary human melanocytes. Melanocyte-specific expression of human MX2 in a zebrafish model demonstrates accelerated melanoma formation in a BRAFV600E background. Our integrative approach streamlines GWAS follow-up studies and highlights a pleiotropic function of MX2 in melanoma susceptibility.

HAL-Inserm

White Rose Research Online

Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel

Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) >= 5% and low-frequency variants (0.5…

University of Liverpool Repository

DSpace@MIT