We present nbodykit, an open-source, massively parallel Python toolkit for
analyzing large-scale structure (LSS) data. Using Python bindings of the
Message Passing Interface (MPI), we provide parallel implementations of many
commonly used algorithms in LSS. nbodykit is both an interactive and scalable
piece of scientific software, performing well in a supercomputing environment
while still taking advantage of the interactive tools provided by the Python
ecosystem. Existing functionality includes estimators of the power spectrum,
two- and three-point correlation functions, a Friends-of-Friends grouping algorithm,
mock catalog creation via the halo occupation distribution technique, and
approximate N-body simulations via the FastPM scheme. The package also provides
a set of distributed data containers, insulated from the algorithms themselves,
that enable nbodykit to provide a unified treatment of both simulation and
observational data sets. nbodykit can be easily deployed in a high-performance
computing environment, overcoming some of the traditional difficulties of using
Python on supercomputers. We provide performance benchmarks illustrating the
scalability of the software. The modular, component-based approach of nbodykit
allows researchers to easily build complex applications using its tools. The
package is extensively documented at http://nbodykit.readthedocs.io, which also
includes an interactive set of example recipes for new users to explore. Because
nbodykit is open-source software, we hope it will provide a common framework that
the community can use and develop to confront the analysis challenges of future
LSS surveys.

Comment: 18 pages, 7 figures. Feedback very welcome. Code available at
https://github.com/bccp/nbodykit; for documentation, see
http://nbodykit.readthedocs.i
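The abstract lists a power spectrum estimator among the algorithms nbodykit provides. The sketch below illustrates the basic idea of a mesh-based estimator in serial NumPy. It deposits particles onto a periodic mesh with cloud-in-cell (CIC) interpolation, applies a Fourier transform, and averages |δ(k)|² in spherical shells. It is not nbodykit's API or implementation, which is MPI-parallel and uses its own mesh and FFT machinery. The function names `cic_paint` and `power_spectrum`, the choice of shells one fundamental mode wide, and the omission of window deconvolution and shot-noise subtraction are illustrative assumptions.

```python
import numpy as np


def cic_paint(pos, box_size, n_mesh):
    """Deposit particles onto a periodic n_mesh^3 grid with cloud-in-cell (CIC) weights.

    Returns the density contrast delta = rho / mean(rho) - 1 on the grid.
    """
    mesh = np.zeros((n_mesh, n_mesh, n_mesh))
    cell = pos / box_size * n_mesh          # positions in units of the grid spacing
    i0 = np.floor(cell).astype(int)         # lower neighbouring grid point on each axis
    frac = cell - i0                        # offset from that point, in [0, 1)
    for ox in (0, 1):
        for oy in (0, 1):
            for oz in (0, 1):
                offset = np.array([ox, oy, oz])
                # Linear weight per axis: 1 - frac for the lower point, frac for the upper one.
                w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                idx = (i0 + offset) % n_mesh  # periodic boundary conditions
                np.add.at(mesh, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
    return mesh / mesh.mean() - 1.0


def power_spectrum(delta, box_size):
    """Shell-averaged P(k) of a periodic real field, in shells of width k_f = 2*pi/box_size.

    Returns (mean k, mean P, number of modes) per non-empty shell. No window
    deconvolution or shot-noise subtraction is applied.
    """
    n = delta.shape[0]
    cell_volume = (box_size / n) ** 3
    delta_k = np.fft.fftn(delta) * cell_volume       # discrete approximation to the continuous FT
    power = np.abs(delta_k) ** 2 / box_size ** 3     # P(k) = |delta(k)|^2 / V
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)
    kf = 2 * np.pi / box_size                        # fundamental mode of the box
    edges = np.arange(0.5, n // 2 + 1) * kf          # shells centred on kf, 2 kf, ..., (n/2) kf
    counts, _ = np.histogram(kmag, bins=edges)       # k = 0 lies below the first edge and is dropped
    p_sum, _ = np.histogram(kmag, bins=edges, weights=power)
    k_sum, _ = np.histogram(kmag, bins=edges, weights=kmag)
    valid = counts > 0
    return k_sum[valid] / counts[valid], p_sum[valid] / counts[valid], counts[valid]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    box_size, n_mesh, n_part = 100.0, 32, 20000
    pos = rng.uniform(0.0, box_size, size=(n_part, 3))   # unclustered (Poisson) particles
    k, pk, nmodes = power_spectrum(cic_paint(pos, box_size, n_mesh), box_size)
    print("shot noise V/N =", box_size ** 3 / n_part)
    for kval, pval, nval in zip(k[:4], pk[:4], nmodes[:4]):
        print(f"k = {kval:.3f}  P(k) = {pval:.1f}  modes = {nval}")
```

For an unclustered sample, the low-k values should lie close to, and somewhat below, the shot noise V/N, because the CIC window is not deconvolved here. A production estimator would correct for both the mass-assignment window and the shot noise, and would distribute the mesh and FFT across MPI ranks.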