Interactive (statistical) visualisation and exploration of a billion objects with Vaex

Breddels, Maarten A.



Authors: Maarten A. Breddels
Publication date: 1 October 2016
Publisher: Cambridge University Press (CUP)

Abstract

With new catalogues arriving such as Gaia DR1, containing more than a billion objects, new methods of handling and visualizing these data volumes are needed. In visualization, one problem is that the number of data points can become so large that a scatter plot becomes cluttered. Another problem is that with over a billion objects, only a few CPU cycles are available per object if one wants to process them within a second, which makes traditional methods that render one glyph per object unviable. Instead, we show that by calculating statistics on a regular (N-dimensional) grid, visualizations of a billion objects can be done within a second on a modern desktop computer. This is achieved using memory mapping of HDF5 files together with a simple binning algorithm, both of which are part of a Python library called vaex. This enables efficient interactive exploration of large datasets, making scientific exploration of large catalogues feasible. Vaex is a Python library that also integrates well with the Jupyter/NumPy/Astropy/matplotlib stack. Built on top of this is the vaex application, which allows for interactive exploration and visualization. The motivation for developing vaex is the catalogue of the Gaia satellite; however, vaex can also be used on SPH or N-body simulations, on other (future) catalogues such as SDSS, Pan-STARRS, LSST, WISE, 2MASS, etc., or on other tabular data. The homepage for vaex is http://vaex.astro.rug.nl.

Comment: 6 pages, 4 figures, conference proceeding for the IAU symposium 325 on Astroinformatics (accepted), webpage http://vaex.astro.rug.n
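
The idea of computing statistics on a regular grid instead of drawing one glyph per object can be illustrated with a minimal NumPy sketch. This is not the vaex implementation or its API: the function name `bin_statistics_2d`, its parameters, and the synthetic data are assumptions made for illustration, and the sketch works on in-memory arrays rather than memory-mapped HDF5 files.

```python
import numpy as np


def bin_statistics_2d(x, y, values, limits, shape=(128, 128)):
    """Count and mean of `values` on a regular 2D grid (hypothetical helper).

    `limits` is ((xmin, xmax), (ymin, ymax)); points outside are ignored.
    """
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    # Map each coordinate to an integer bin index along its axis.
    ix = np.floor((x - xmin) / (xmax - xmin) * nx).astype(np.int64)
    iy = np.floor((y - ymin) / (ymax - ymin) * ny).astype(np.int64)
    # Keep only the points that fall inside the grid.
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    flat = ix[inside] * ny + iy[inside]
    # One pass over the data gives per-cell counts and per-cell sums.
    counts = np.bincount(flat, minlength=nx * ny).reshape(nx, ny)
    sums = np.bincount(flat, weights=values[inside], minlength=nx * ny)
    sums = sums.reshape(nx, ny)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # NaN where a cell is empty
    return counts, means


# Synthetic data standing in for two catalogue columns and one quantity.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)
v = x + y
counts, means = bin_statistics_2d(x, y, v, ((-4.0, 4.0), (-4.0, 4.0)))
```

The resulting `counts` array can be shown as an image (for example with matplotlib's `imshow`), so the cost of drawing depends on the grid size rather than on the number of objects.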