Lightning talk: PyPop - a software pipeline for large-scale multilocus population genomics

Abstract

PyPop (Python for Population Genomics) is an open-source framework for performing large-scale population genetic analyses on multilocus genotype and allele frequency data. It computes tests and measures of Hardy-Weinberg equilibrium (at both the locus level and the individual genotype level), linkage disequilibrium, and selection, and it estimates multilocus haplotypes. Rather than reimplementing existing population genetic software from scratch, PyPop supplements and extends it by incorporating it as modules, modified to accommodate highly polymorphic data. It facilitates evolutionary analyses by integrating population genetic statistics within and across populations.
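As a rough illustration of the kind of locus-level Hardy-Weinberg test mentioned above, the following Python sketch computes the textbook chi-square goodness-of-fit statistic for one locus from a list of genotypes. It is not PyPop's own code or API: the function name `hwe_chisq` and the input layout (one `(allele1, allele2)` tuple per individual) are assumptions made for this example. For highly polymorphic loci many expected genotype counts are small, so in practice rare classes are usually pooled or an exact test is used instead; that step is omitted here.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chisq(genotypes):
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg proportions.

    Hypothetical helper for illustration, not part of PyPop.
    genotypes: list of (allele1, allele2) tuples, one per individual.
    Returns (chi_square, degrees_of_freedom).
    """
    n = len(genotypes)
    # Allele frequencies estimated from the 2N allele copies in the sample
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Observed genotype counts, treating A/B and B/A as the same genotype
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    alleles = sorted(freqs)
    chisq = 0.0
    # Every possible genotype class (a <= b), including unobserved ones
    for a, b in combinations_with_replacement(alleles, 2):
        if a == b:
            expected = n * freqs[a] ** 2                 # homozygote: N * p_i^2
        else:
            expected = 2 * n * freqs[a] * freqs[b]       # heterozygote: 2N * p_i * p_j
        chisq += (observed[(a, b)] - expected) ** 2 / expected

    # k(k+1)/2 genotype classes, minus 1, minus (k-1) estimated frequencies
    k = len(alleles)
    return chisq, k * (k - 1) // 2
```

For a biallelic locus with 30 AA, 40 AB and 30 BB individuals, both allele frequencies are 0.5, the expected counts are 25, 50 and 25, and the function returns `(4.0, 1)`; with one degree of freedom this exceeds the 5% critical value of about 3.84.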

Originally developed to analyze the highly polymorphic genetic data of the human leukocyte antigen (HLA) region of the human genome, PyPop is applicable to any kind of multilocus genetic data. It was the primary platform for evolutionary analysis of data collected for a major NIH-funded collaborative grant that included over 30 laboratories and 200 populations (Lancaster et al., 2007a,b). PyPop has also been used successfully in studies by our group and our collaborators, and by many independent research teams, in over 70 peer-reviewed papers.

PyPop uses a standard Extensible Markup Language (XML) output format and integrates the results of multiple analyses, performed on various populations at different times, into a common output format that can be read into a spreadsheet. The XML output format allows PyPop to be embedded in larger analysis pipelines. PyPop also provides an Application Programming Interface (API) that allows its functionality to be incorporated into other programs. This lightning talk will focus on recent features of PyPop, which include prefiltering of the input genotype data and the ability to translate arbitrary allele names into full amino acid or nucleotide sequences.
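To make the idea of merging per-population XML results into a spreadsheet concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` and `csv` modules. The XML layout (an `analysis` root with a `population` attribute, `locus` elements, and an `hwe` element carrying `chisq` and `df` attributes) is entirely hypothetical and does not reproduce PyPop's actual output schema; the point is only to show how a structured output format lets results from separate runs be flattened into one table.

```python
import csv
import io
import xml.etree.ElementTree as ET

# HYPOTHETICAL per-population output; PyPop's real XML schema is different.
SAMPLE = """<analysis population="PopA">
  <locus name="A"><hwe chisq="4.0" df="1"/></locus>
  <locus name="B"><hwe chisq="1.2" df="3"/></locus>
</analysis>"""

FIELDS = ["population", "locus", "chisq", "df"]


def rows_from_xml(xml_text):
    """Yield one flat row per locus from a single (hypothetical) XML result."""
    root = ET.fromstring(xml_text)
    population = root.get("population")
    for locus in root.iter("locus"):
        hwe = locus.find("hwe")
        yield {
            "population": population,
            "locus": locus.get("name"),
            "chisq": float(hwe.get("chisq")),
            "df": int(hwe.get("df")),
        }


def to_csv(xml_documents):
    """Merge several XML results (e.g. separate runs) into one CSV table."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for doc in xml_documents:
        writer.writerows(rows_from_xml(doc))
    return out.getvalue()


print(to_csv([SAMPLE, SAMPLE.replace("PopA", "PopB")]))
```

Because each run produces self-describing XML, adding another population is just another document in the list passed to `to_csv`, while the column layout of the resulting table stays fixed.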

All code is made available under the terms of the GNU General Public License (GNU GPL):

Homepage: http://www.pypop.org/

References:

Lancaster, A. K., M. P. Nelson, R. M. Single, D. Meyer, and G. Thomson, 2007a Software framework for the Biostatistics Core of the International Histocompatibility Working Group. In J. A. Hansen, editor, Immunobiology of the Human MHC: Proceedings of the 13th International Histocompatibility Workshop and Conference, volume I. Seattle, WA: IHWG Press, 510-517.

Lancaster, A. K., R. M. Single, O. D. Solberg, M. P. Nelson, and G. Thomson, 2007b PyPop update - a software pipeline for large-scale multilocus population genomics. Tissue Antigens 69 (Suppl 1): 192-197.
