PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores

By Lawrence M. Chen, Nelson Yao, Elika Garg, Yuecai Zhu, Thao T. T. Nguyen, Irina Pokhvisneva, Shantala A. Hari Dass, Eva Unternaehrer, Hélène Gaudreau, Marie Forest, Lisa M. McEwen, Julia L. MacIsaac, Michael S. Kobor, Celia M. T. Greenwood, Patricia P. Silveira, Michael J. Meaney and Kieran J. O’Donnell


Abstract

Background: Polygenic risk scores (PRS) describe the genomic contribution to complex phenotypes and consistently account for a larger proportion of variance in outcome than single nucleotide polymorphisms (SNPs) alone. However, there is little consensus on the optimal data input for generating PRS, and existing approaches largely preclude the use of imputed posterior probabilities and strand-ambiguous SNPs (i.e., A/T or C/G polymorphisms). Our ability to predict complex traits that arise from the additive effects of a large number of SNPs would likely benefit from a more inclusive approach.

Results: We developed PRS-on-Spark (PRSoS), a software tool implemented in Apache Spark and Python that accommodates different data inputs and strand-ambiguous SNPs to calculate PRS. We compared the performance of PRSoS with that of an existing tool (PRSice v1.25) in generating PRS for major depressive disorder using a community cohort (N = 264). PRSoS ran faster than PRSice v1.25 when PRS were generated for a large number of SNPs (~17 million SNPs; t = 42.865, p = 5.43E-04). We also show that the use of imputed posterior probabilities and the inclusion of strand-ambiguous SNPs increase the proportion of variance explained by a PRS for major depressive disorder (from 4.3% to 4.8%).

Conclusions: PRSoS lets users generate PRS with an inclusive and efficient approach that considers a larger number of SNPs than conventional approaches. We show that a PRS for major depressive disorder that includes strand-ambiguous SNPs, calculated using PRSoS, accounts for the largest proportion of variance in symptoms of depression in a community cohort, demonstrating the utility of this approach. The availability of this software will help users develop more informative PRS for a variety of complex phenotypes.
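To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Python of how a PRS can be formed as a weighted sum of expected allele dosages derived from imputed posterior probabilities, with an option to keep or drop strand-ambiguous SNPs. It is not the PRSoS implementation, which runs on Apache Spark; the function names, the dictionary layout, and the choice to return an unnormalized sum are all assumptions made for illustration. The sketch only flags A/T and C/G SNPs and does not attempt to resolve their strand.

```python
from typing import Dict, Tuple

# Allele pairs that read the same on both strands (A/T and C/G).
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """Return True for A/T or C/G polymorphisms."""
    return frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS_PAIRS


def expected_dosage(probs: Tuple[float, float, float]) -> float:
    """Expected effect-allele count from imputed posterior probabilities.

    probs is assumed to be (P(0 copies), P(1 copy), P(2 copies)).
    """
    _, p_one, p_two = probs
    return p_one + 2.0 * p_two


def polygenic_score(
    genotype_probs: Dict[str, Tuple[float, float, float]],
    weights: Dict[str, float],
    alleles: Dict[str, Tuple[str, str]],
    keep_ambiguous: bool = True,
) -> Tuple[float, int]:
    """Sum weight * expected dosage over SNPs; return (score, SNPs used).

    Hypothetical layout: every argument is keyed by SNP identifier.
    SNPs without genotype data for this individual are skipped.
    """
    score = 0.0
    used = 0
    for snp, weight in weights.items():
        if snp not in genotype_probs:
            continue
        if not keep_ambiguous and is_strand_ambiguous(*alleles[snp]):
            continue
        score += weight * expected_dosage(genotype_probs[snp])
        used += 1
    return score, used
```

Calling `polygenic_score` once with `keep_ambiguous=True` and once with `keep_ambiguous=False` on the same individual shows how many SNPs the more inclusive setting adds to the score.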

Topics: PRS-on-spark, PRSoS, Polygenic risk score, Genetic profile score, Multi-core processing, Bioinformatics, Major depressive disorder, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Publisher: BMC
Year: 2018
DOI identifier: 10.1186/s12859-018-2289-9