Abstract

The benefits of large-scale genetic studies for healthcare of the populations studied are well documented, but these genetic studies have traditionally ignored people from some parts of the world, such as South Asia. Here we describe whole genome sequence (WGS) data from 4806 individuals recruited from the healthcare delivery systems of Pakistan, India and Bangladesh, combined with WGS from 927 individuals from isolated South Asian populations. We characterize population structure in South Asia and describe a genotyping array (SARGAM) and imputation reference panel that are optimized for South Asian genomes. We find evidence for high rates of reproductive isolation, endogamy and consanguinity that vary across the subcontinent and that lead to levels of rare homozygotes that reach 100 times those seen in outbred populations. Founder effects increase the power to associate functional variants with disease processes and make South Asia a uniquely powerful place for population-scale genetic studies.

Acknowledgements: We thank Abhijit Chowdhury, Anamitra Barik, Rajesh Kumar Rai, the Birbhum Health and Demographic Surveillance System, the Parkinson Research Alliance of India (PRAI), Syed Qasim Mehdi (deceased), and Partha Majumder for providing samples and sample metadata. J.D.W., J.R., and D.S. were supported in part by NIH grant R01 HG010689. A.V.K. was supported in part by NIH grants 1K08HG010155 and 1U01HG011719. Sequence data collection was supported by NIH grant 5UM1HG008895 to S.K. and by Genentech Research. We are grateful to all of our colleagues for their support and discussions throughout the course of this work and to all of the participants in this study.
