Interactive (statistical) visualisation and exploration of a billion objects with Vaex
With new catalogues arriving such as the Gaia DR1, containing more than a
billion objects, new methods of handling and visualizing these data volumes are
needed. In visualization, one problem is that the number of datapoints can
become so large that a scatter plot becomes cluttered. Another problem is that
with over a billion objects, only a few CPU cycles are available per object if
one wants to process them within a second, making traditional methods by
rendering glyphs not viable. Instead, we show that by calculating statistics on
a regular (N-dimensional) grid, visualizations of a billion objects can be done
within a second on a modern desktop computer. This is achieved using memory
mapping of hdf5 files together with a simple binning algorithm, which are part
of a Python library called vaex. This enables efficient exploration of large
datasets interactively, making science exploration of large catalogues
feasible. Vaex is a Python library, which also integrates well in the
Jupyter/Numpy/Astropy/matplotlib stack. Built on top of this is the vaex
application, which allows for interactive exploration and visualization. The
motivation for developing vaex is the catalogue of the Gaia satellite, however,
vaex can also be used on SPH or N-body simulations, any other (future)
catalogues such as SDSS, Pan-STARRS, LSST, WISE, 2MASS, etc. or other tabular
data. The homepage for vaex is http://vaex.astro.rug.nl.
Comment: 6 pages, 4 figures, conference proceeding for the IAU symposium 325
on Astroinformatics (accepted), webpage http://vaex.astro.rug.n
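The two ingredients named in this abstract, memory mapping and binning on a regular grid, can be illustrated with a minimal standard-library sketch. This is not the vaex implementation: a flat file of float64 values stands in for an hdf5 column, the binning runs in pure Python rather than compiled code, and the names `mapped_column`, `bin_counts_2d` and `demo` are hypothetical.

```python
import array
import mmap
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def mapped_column(path):
    """Memory-map a flat file of float64 values and expose it as a sequence."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = memoryview(mm)
            view = raw.cast("d")  # reinterpret the bytes as doubles, no copy
            try:
                yield view
            finally:
                # Both views must be released before the mapping is closed.
                view.release()
                raw.release()


def bin_counts_2d(xs, ys, x_range, y_range, shape):
    """Count points per cell of a regular 2D grid; out-of-range points are skipped."""
    nx, ny = shape
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    x_scale = nx / (x_max - x_min)
    y_scale = ny / (y_max - y_min)
    counts = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        if x_min <= x < x_max and y_min <= y < y_max:
            # min() guards against rounding up to the grid size at the upper edge.
            i = min(int((x - x_min) * x_scale), nx - 1)
            j = min(int((y - y_min) * y_scale), ny - 1)
            counts[i][j] += 1
    return counts


def demo():
    """Write two small columns to disk, map them, and bin them on a 2x2 grid."""
    xs = [0.1, 0.2, 0.6, 0.9, 1.5]
    ys = [0.1, 0.7, 0.6, 0.9, 0.5]
    with tempfile.TemporaryDirectory() as tmp:
        x_path = os.path.join(tmp, "x.f64")
        y_path = os.path.join(tmp, "y.f64")
        for path, values in ((x_path, xs), (y_path, ys)):
            with open(path, "wb") as f:
                array.array("d", values).tofile(f)
        with mapped_column(x_path) as x, mapped_column(y_path) as y:
            return bin_counts_2d(x, y, (0.0, 1.0), (0.0, 1.0), (2, 2))


grid = demo()
print(grid)
```

The grid of counts, not the individual points, is what gets drawn, so the cost of rendering depends on the grid shape rather than on the number of rows.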
Vaex: Big Data exploration in the era of Gaia
We present a new Python library called vaex, to handle extremely large
tabular datasets, such as astronomical catalogues like the Gaia catalogue,
N-body simulations or any other regular datasets which can be structured in
rows and columns. Fast computations of statistics on regular N-dimensional
grids allow analysis and visualization in the order of a billion rows per
second. We use streaming algorithms, memory mapped files and a zero memory copy
policy to allow exploration of datasets larger than memory, i.e. out-of-core
algorithms. Vaex allows arbitrary (mathematical) transformations using normal
Python expressions and (a subset of) numpy functions which are lazily evaluated
and computed when needed in small chunks, which avoids wasting of RAM. Boolean
expressions (which are also lazily evaluated) can be used to explore subsets of
the data, which we call selections. Vaex uses a DataFrame API similar to that of
Pandas, a very popular library, which helps migration from Pandas.
Visualization is one of the key points of vaex, and is done using binned
statistics in 1d (e.g. histogram), in 2d (e.g. 2d histograms with colormapping)
and 3d (using volume rendering). Vaex is split into several packages:
vaex-core for the computational part, vaex-viz for visualization mostly based
on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based
on IPyWidgets, vaex-server for the (optional) client-server communication,
vaex-ui for the Qt based interface, vaex-hdf5 for hdf5 based memory mapped
storage, vaex-astro for astronomy related selections, transformations and
memory mapped (column based) FITS storage. Vaex is open source and available
under the MIT license on GitHub; documentation and other information can be
found on the main website: https://vaex.io, https://docs.vaex.io or
https://github.com/maartenbreddels/vaex
Comment: 14 pages, 8 figures, Submitted to A&A, interactive version of Fig 4:
https://vaex.io/paper/fig
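The combination of lazily evaluated expressions, boolean selections and chunked streaming described in this abstract can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the vaex API: the function `streaming_mean`, its parameters and the toy columns are hypothetical, and expressions are passed as ordinary Python callables.

```python
import math


def streaming_mean(columns, expression, selection=None, chunk_size=2):
    """Mean of expression(row) over the selected rows, computed chunk by chunk.

    columns    -- dict mapping column name to a sequence of values
    expression -- callable taking one value per column, in dict order
    selection  -- optional callable with the same arguments returning a bool
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    total, count = 0.0, 0
    for start in range(0, n_rows, chunk_size):
        # Only one chunk of each column is sliced at a time; the derived
        # quantity is never materialised as a full-length column.
        chunk = [columns[name][start:start + chunk_size] for name in names]
        for row in zip(*chunk):
            if selection is None or selection(*row):
                total += expression(*row)
                count += 1
    return total / count if count else math.nan


columns = {"x": [3.0, -3.0, 6.0, 0.0], "y": [4.0, 4.0, 8.0, 1.0]}


def radius(x, y):
    """The 'virtual column': evaluated only when a statistic is requested."""
    return math.sqrt(x * x + y * y)


def positive_x(x, y):
    """The 'selection': a boolean expression evaluated alongside the data."""
    return x > 0


mean_all = streaming_mean(columns, radius)
mean_selected = streaming_mean(columns, radius, selection=positive_x)
print(mean_all, mean_selected)
```

Because only running sums are kept between chunks, memory use is set by the chunk size rather than by the length of the dataset.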
Complexity on dwarf galaxies scale: A bimodal distribution function in Sculptor
In previous work we have presented Schwarzschild models of the Sculptor dSph,
demonstrating that this system could be embedded in dark matter halos that are
either cusped or cored. Here we show that the non-parametric distribution
function recovered through Schwarzschild's method is bimodal in energy and
angular momentum space for all best fitting mass models explored. We
demonstrate that this bimodality is directly related to the two components
known to be present in Sculptor through stellar populations analysis, although
our method is purely dynamical in nature and does not use this prior
information. It therefore constitutes independent confirmation of the existence
of two physically distinct dynamical components in Sculptor and suggests a
rather complex assembly history for this dwarf galaxy.
Comment: 4 pages, 4 figures, 1 table, accepted to ApJ Letter
What can Gaia proper motions tell us about Milky Way dwarf galaxies?
We present a proper-motion study on models of the dwarf spheroidal galaxy
Sculptor, based on the predicted proper-motion accuracy of Gaia measurements.
Gaia will measure proper motions of several hundreds of stars for a
Sculptor-like system. Even with an uncertainty on the proper motion of order
1.5 times the size of an individual proper-motion value of ~10 mas/century, we
find that it is possible to recover Sculptor's systemic proper motion at its
distance of 79 kpc.
Comment: 5 pages, 1 figure; to appear in the proceedings of the GREAT-ITN
conference on "The Milky Way Unravelled by Gaia", Barcelona (Dec 1-5 2014
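A rough back-of-the-envelope check shows why the systemic motion remains recoverable even when each star is individually noisy. The sketch below assumes independent errors, ignores the internal velocity dispersion of the galaxy, and uses a hypothetical sample of 300 stars to stand in for "several hundreds"; it is not the analysis carried out in the paper.

```python
import math

# Conversion constant: km/s per (arcsec/yr * pc).
K_KMS = 4.74047


def tangential_velocity_kms(pm_mas_per_century, distance_kpc):
    """Tangential velocity corresponding to a proper motion at a given distance."""
    pm_arcsec_per_yr = pm_mas_per_century * 1e-3 / 100.0
    return K_KMS * pm_arcsec_per_yr * distance_kpc * 1e3


def error_on_mean(per_star_error, n_stars):
    """Uncertainty of the mean of n_stars independent measurements."""
    return per_star_error / math.sqrt(n_stars)


pm_value = 10.0                   # mas/century, from the abstract
per_star_error = 1.5 * pm_value   # mas/century, "1.5 times" the value
n_stars = 300                     # hypothetical stand-in for "several hundreds"

v_tan = tangential_velocity_kms(pm_value, 79.0)
mean_error = error_on_mean(per_star_error, n_stars)
print(round(v_tan, 2), round(mean_error, 3))
```

Under these assumptions the uncertainty on the mean is well below the ~10 mas/century signal, even though each individual measurement has an error larger than the signal itself.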
Matching the dark matter profiles of dSph galaxies with those of simulated satellites: a two parameter comparison
We compare the dark matter halos' structural parameters derived for four
Milky Way dwarf spheroidal galaxies to those of subhalos found in cosmological
N-body simulations. We confirm that estimates of the mass at a single fixed
radius are fully consistent with the observations. However, when a second
structural parameter such as the logarithmic slope of the dark halo density
profile measured close to the half-light radius is included in the comparison,
we find little to no overlap between the satellites and the subhalos. Typically,
subhalos of the right mass have steeper profiles at these radii than
measurements of the dSph suggest. Using energy arguments we explore if it is
possible to
solve this discrepancy by invoking baryonic effects. Assuming that feedback
from supernovae can lead to a reshaping of the halos, we compute the required
efficiency and find entirely plausible values for a significant fraction of the
subhalos, in some cases as low as 0.1%. This implies that care must be taken not
to exaggerate the effect of supernova feedback as this could make the halos too
shallow. These results could be used to calibrate and possibly constrain
feedback recipes in hydrodynamical simulations.
Comment: 6 pages, 3 figures, submitted to ApJ
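The energy argument can be written compactly. The expression below is a generic sketch of how a feedback efficiency is usually defined, not a formula quoted from the paper; the symbols are assumptions introduced here: \Delta E is the energy needed to reshape a subhalo from its simulated profile to the shallower observed one, N_SN is the number of supernovae that have exploded in the galaxy, and E_SN is the energy released per supernova (commonly taken to be of order 10^51 erg).

```latex
\epsilon_{\rm SN} \;=\; \frac{\Delta E}{N_{\rm SN}\, E_{\rm SN}}
```

A required efficiency well below unity means that only a small fraction of the available supernova energy needs to couple to the dark matter, which is the sense in which values such as 0.1% are called plausible.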
A box full of chocolates: The rich structure of the nearby stellar halo revealed by Gaia and RAVE
The hierarchical structure formation model predicts that stellar halos should
form, at least partly, via mergers. If this was a predominant formation channel
for the Milky Way's halo, imprints of this merger history in the form of moving
groups or streams should exist also in the vicinity of the Sun. Here we study
the kinematics of halo stars in the Solar neighbourhood using the very recent
first data release from the Gaia mission, and in particular the TGAS dataset,
in combination with data from the RAVE survey. Our aim is to determine the
amount of substructure present in the phase-space distribution of halo stars
that could be linked to merger debris. To characterise kinematic substructure,
we measure the velocity correlation function in our sample of halo (low
metallicity) stars. We also study the distribution of these stars in the space
of energy and two components of the angular momentum, in what we call
"Integrals of Motion" space. The velocity correlation function reveals
substructure in the form of an excess of pairs of stars with similar
velocities, well above that expected for a smooth distribution. Comparison to
cosmological simulations of the formation of stellar halos indicates that the
levels found are consistent with the Galactic halo having been built fully via
accretion. Similarly, the distribution of stars in the space of "Integrals of
motion" is highly complex. A strikingly high fraction (between 58% and up to
73%) of the stars that are somewhat less bound than the Sun are on (highly)
retrograde orbits. A simple comparison to Milky Way-mass galaxies in
cosmological hydrodynamical simulations suggests that less than 1% have such
prominently retrograde outer halos. We also identify several other
statistically significant structures in "Integrals of Motion" space that could
potentially be related to merger events.
Comment: 19 pages, 16 figures. A&A in pres
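The idea behind a velocity correlation function, an excess of star pairs with similar velocities relative to a smooth distribution, can be sketched as follows. The estimator used here (normalised data pair counts divided by normalised reference pair counts, minus one) is one common choice and is an assumption, as are the function names and the toy velocities; the paper's exact estimator and its construction of the smooth reference sample may differ.

```python
import math
from itertools import combinations


def pair_counts(velocities, bin_edges):
    """Histogram of pairwise velocity differences |v_i - v_j| in km/s."""
    counts = [0] * (len(bin_edges) - 1)
    for v1, v2 in combinations(velocities, 2):
        dv = math.dist(v1, v2)
        for k in range(len(counts)):
            if bin_edges[k] <= dv < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts


def velocity_correlation(data, reference, bin_edges):
    """Excess of data pairs over reference pairs per bin (0 means no excess)."""
    n_data_pairs = len(data) * (len(data) - 1) / 2
    n_ref_pairs = len(reference) * (len(reference) - 1) / 2
    dd = pair_counts(data, bin_edges)
    rr = pair_counts(reference, bin_edges)
    return [
        (d / n_data_pairs) / (r / n_ref_pairs) - 1 if r else math.nan
        for d, r in zip(dd, rr)
    ]


# Toy 3D velocities: three stars moving together plus one outlier.
data = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (300, 0, 0)]
# Toy stand-in for a smooth (unclustered) comparison sample.
reference = [(0, 0, 0), (40, 0, 0), (200, 0, 0), (300, 0, 0)]

xi = velocity_correlation(data, reference, bin_edges=[0, 50, 400])
print(xi)
```

A positive value in the smallest-separation bin is the signature of clumping in velocity space; the pair loop is quadratic in the number of stars, which is acceptable for samples of a few thousand halo stars but not for a full catalogue.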
Leaves on trees: identifying halo stars with extreme gradient boosted trees
Extended stellar haloes are a natural by-product of the hierarchical
formation of massive galaxies. If merging is a non-negligible factor in the
growth of our Galaxy, evidence of such events should be encoded in its stellar
halo. Reliable identification of genuine halo stars is a challenging task,
however. The first Gaia data release contains the positions, parallaxes and
proper motions for over 2 million stars, mostly in the Solar neighbourhood.
Gaia DR2 will enlarge this sample to over 1.5 billion stars, the brightest ~5
million of which will have full phase-space information. Our aim is to
develop a machine learning model to reliably identify halo stars, even when
their full phase-space information is not available. We use the Gradient
Boosted Trees algorithm to build a supervised halo star classifier. The
classifier is trained on a sample extracted from the Gaia Universe Model
Snapshot, convolved with the errors of TGAS, as well as with the expected
uncertainties of the upcoming Gaia DR2. We also trained our classifier on the
cross-match between the TGAS and RAVE catalogues, where the halo stars are
labelled in an entirely model-independent way. We then use this model to
identify halo stars in TGAS. When full phase-space information is available
and for Gaia DR2-like uncertainties, our classifier is able to recover 90% of
the halo stars with at most 30% distance errors, in a completely unseen test
set, and with negligible levels of contamination. When line-of-sight velocity
is not available, we recover ~60% of such halo stars, with less than 10%
contamination. When applied to the TGAS data, our classifier detects 337 high
confidence RGB halo stars. Although small, this number is consistent with the
expectation from models given the data uncertainties. The large parallax errors
are the biggest limitation in identifying a larger number of halo stars in all
the cases studied.
Comment: Accepted for publication in Astronomy & Astrophysics. 13 pages, 9
figure, 2 table
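The two figures of merit quoted in this abstract, the fraction of halo stars recovered and the level of contamination, can be computed from true and predicted labels as in the sketch below. The definitions used here (recovered fraction = true positives over all true halo stars; contamination = false positives over all stars classified as halo) are the usual ones and are an assumption; the paper may define them slightly differently. The function name and the toy labels are hypothetical.

```python
import math


def recovery_and_contamination(true_labels, predicted_labels):
    """Return (recovered fraction, contamination) for a binary halo classifier.

    Labels are truthy for halo stars and falsy for all other stars.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    n_halo = true_pos + false_neg
    n_flagged = true_pos + false_pos
    recovered = true_pos / n_halo if n_halo else math.nan
    contamination = false_pos / n_flagged if n_flagged else math.nan
    return recovered, contamination


# Toy test set: 4 halo stars followed by 6 non-halo stars.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

recovered, contamination = recovery_and_contamination(true_labels, predicted_labels)
print(recovered, contamination)
```

Because halo stars are rare in a Solar-neighbourhood sample, contamination is the more sensitive of the two numbers: a small false-positive rate among the far more numerous disc stars can still dominate the stars flagged as halo.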