Imputation using external reference panels is a widely used approach for
increasing power in GWAS and meta-analysis. Existing HMM-based imputation
approaches require individual-level genotypes. Here, we develop a new method
for Gaussian imputation from summary association statistics, a type of data
that is becoming widely available. In simulations using 1000 Genomes (1000G)
data, this method recovers 84% of the effective sample size for common (>5%)
variants and 54% for low-frequency (1-5%) variants (increasing to 87% and 60%,
respectively, when summary LD information is available from target samples),
versus 89% and 67% for HMM-based imputation, which cannot be applied to
summary statistics. Our approach
accounts for the limited sample size of the reference panel, a crucial step to
eliminate false-positive associations, and is computationally very fast. As an
empirical demonstration, we apply our method to 7 case-control phenotypes from
the WTCCC data and a study of height in the British 1958 birth cohort (1958BC).
Gaussian imputation from summary statistics recovers 95% of the effective
sample size (as quantified by the ratio of χ² association statistics) relative
to HMM-based imputation from individual-level genotypes at the 227 published
SNPs in the WTCCC data, and 105% at the 176 published SNPs in the 1958BC
height data. In addition,
for publicly available summary statistics from large meta-analyses of 4 lipid
traits, we publicly release imputed summary statistics at 1000G SNPs, which
could not have been obtained using previously published methods, and
demonstrate their accuracy by masking subsets of the data. We show that 1000G
imputation using our approach increases the magnitude and statistical evidence
of enrichment at genic vs. non-genic loci for these traits, as compared to an
analysis without 1000G imputation. Thus, imputation of summary statistics will
be a valuable tool in future functional enrichment analyses.

Comment: 32 pages, 4 figures
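
To make the idea of Gaussian imputation from summary statistics concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the textbook multivariate-normal conditional mean: z-scores at untyped SNPs are predicted from z-scores at typed SNPs using LD (correlation) matrices estimated from a reference panel. The function name `impute_z`, the argument names, and the ridge term `lam` (one simple way to reflect the limited sample size of the reference panel) are illustrative assumptions and are not taken from the abstract.

```python
import numpy as np


def impute_z(z_typed, ld_tt, ld_ut, lam=0.1):
    """Impute z-scores at untyped SNPs from z-scores at typed SNPs.

    z_typed : (t,) observed association z-scores at typed SNPs
    ld_tt   : (t, t) reference-panel LD among typed SNPs
    ld_ut   : (u, t) reference-panel LD between untyped and typed SNPs
    lam     : hypothetical ridge term added to the diagonal of ld_tt
              (an assumption of this sketch, not a value from the paper)
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)

    # Regularize the typed-SNP LD matrix so it is well conditioned.
    ld_tt_reg = ld_tt + lam * np.eye(ld_tt.shape[0])

    # Weights W = ld_ut @ inv(ld_tt_reg); ld_tt_reg is symmetric,
    # so solving against ld_ut.T and transposing gives the same result.
    weights = np.linalg.solve(ld_tt_reg, ld_ut.T).T

    # Conditional mean of the untyped z-scores given the typed ones.
    z_imputed = weights @ z_typed

    # Variance of each imputed z-score under the null (row-wise W S W^T).
    # Values near 1 indicate well-predicted SNPs; values near 0 indicate
    # SNPs that the typed data say little about.
    var_imputed = np.einsum("ij,jk,ik->i", weights, ld_tt_reg, weights)
    return z_imputed, var_imputed


if __name__ == "__main__":
    # Toy example: one untyped SNP in perfect LD with the first typed SNP.
    ld_tt = np.array([[1.0, 0.5], [0.5, 1.0]])
    ld_ut = np.array([[1.0, 0.5]])
    z, v = impute_z([3.0, 1.0], ld_tt, ld_ut, lam=0.0)
    print(z, v)
```

With `lam=0.0` and an untyped SNP in perfect LD with a typed SNP, the sketch copies that SNP's z-score; a positive `lam` shrinks the imputed statistic, and dividing by the square root of the returned variance would be one way to restore unit variance under the null.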