31 research outputs found

    Interactive (statistical) visualisation and exploration of a billion objects with Vaex

    With new catalogues arriving such as the Gaia DR1, containing more than a billion objects, new methods of handling and visualizing these data volumes are needed. In visualization, one problem is that the number of datapoints can become so large that a scatter plot becomes cluttered. Another problem is that with over a billion objects, only a few CPU cycles are available per object if one wants to process them within a second, making traditional methods based on rendering glyphs not viable. Instead, we show that by calculating statistics on a regular (N-dimensional) grid, visualizations of a billion objects can be done within a second on a modern desktop computer. This is achieved using memory mapping of hdf5 files together with a simple binning algorithm, which are part of a Python library called vaex. This enables efficient interactive exploration of large datasets, making science exploration of large catalogues feasible. Vaex is a Python library, which also integrates well in the Jupyter/Numpy/Astropy/matplotlib stack. Built on top of this is the vaex application, which allows for interactive exploration and visualization. The motivation for developing vaex is the catalogue of the Gaia satellite; however, vaex can also be used on SPH or N-body simulations, on any other (future) catalogues such as SDSS, Pan-STARRS, LSST, WISE, 2MASS, etc., or on other tabular data. The homepage for vaex is http://vaex.astro.rug.nl. Comment: 6 pages, 4 figures, conference proceeding for the IAU symposium 325 on Astroinformatics (accepted), webpage http://vaex.astro.rug.n
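
    The "statistics on a regular grid" idea in this abstract can be illustrated with a minimal sketch. This is not vaex's implementation (which the abstract says combines memory-mapped hdf5 files with a binning algorithm); it is a pure-Python stand-in, and the function name `bin_counts_2d` and its parameters are hypothetical. It only shows why the cost per object is a few arithmetic operations and one counter increment.

```python
import math
from array import array


def bin_counts_2d(xs, ys, x_range, y_range, shape):
    """Count points on a regular 2D grid; returns a flat, row-major array of counts."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    nx, ny = shape
    # Precompute the scale factors so each point needs only a subtract and a multiply.
    x_scale = nx / (x_max - x_min)
    y_scale = ny / (y_max - y_min)
    counts = array("q", [0]) * (nx * ny)
    for x, y in zip(xs, ys):
        i = math.floor((x - x_min) * x_scale)
        j = math.floor((y - y_min) * y_scale)
        # Points outside the grid are skipped (upper edges are exclusive).
        if 0 <= i < nx and 0 <= j < ny:
            counts[i * ny + j] += 1
    return counts


counts = bin_counts_2d(
    xs=[0.1, 0.9, 0.1, 1.5, -0.1],
    ys=[0.1, 0.9, 0.2, 0.5, 0.5],
    x_range=(0.0, 1.0),
    y_range=(0.0, 1.0),
    shape=(2, 2),
)
```

    The grid, not the list of points, is what gets rendered, so the drawing cost depends on the grid size rather than on the number of objects.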

    Vaex: Big Data exploration in the era of Gaia

    We present a new Python library called vaex, to handle extremely large tabular datasets, such as astronomical catalogues like the Gaia catalogue, N-body simulations or any other regular datasets which can be structured in rows and columns. Fast computations of statistics on regular N-dimensional grids allow analysis and visualization on the order of a billion rows per second. We use streaming algorithms, memory mapped files and a zero memory copy policy to allow exploration of datasets larger than memory, i.e. out-of-core algorithms. Vaex allows arbitrary (mathematical) transformations using normal Python expressions and (a subset of) numpy functions which are lazily evaluated and computed when needed in small chunks, which avoids wasting RAM. Boolean expressions (which are also lazily evaluated) can be used to explore subsets of the data, which we call selections. Vaex uses a DataFrame API similar to that of Pandas, a very popular library, which helps migration from Pandas. Visualization is one of the key points of vaex, and is done using binned statistics in 1d (e.g. histogram), in 2d (e.g. 2d histograms with colormapping) and 3d (using volume rendering). Vaex is split into several packages: vaex-core for the computational part, vaex-viz for visualization mostly based on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based on IPyWidgets, vaex-server for the (optional) client-server communication, vaex-ui for the Qt based interface, vaex-hdf5 for hdf5 based memory mapped storage, vaex-astro for astronomy related selections, transformations and memory mapped (column based) fits storage. Vaex is open source and available under the MIT license on github; documentation and other information can be found on the main website: https://vaex.io, https://docs.vaex.io or https://github.com/maartenbreddels/vaex Comment: 14 pages, 8 figures, Submitted to A&A, interactive version of Fig 4: https://vaex.io/paper/fig
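
    The combination of lazily evaluated expressions, boolean selections and chunked computation described in this abstract can be sketched as follows. This is a pure-Python illustration and does not use the vaex API; `chunked_mean` and its parameters are hypothetical names, and plain lists stand in for memory-mapped columns.

```python
import math


def chunked_mean(columns, expression, selection=None, chunk_size=1024):
    """Mean of expression(row) over rows passing selection, one chunk at a time."""
    n_rows = len(next(iter(columns.values())))
    total, count = 0.0, 0
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only this slice of each column is materialised at any one time.
        chunk = {name: col[start:stop] for name, col in columns.items()}
        for k in range(stop - start):
            row = {name: values[k] for name, values in chunk.items()}
            # The expression is evaluated only here, and only for selected rows.
            if selection is None or selection(row):
                total += expression(row)
                count += 1
    return total / count if count else math.nan


columns = {"x": [3.0, 0.0, 6.0], "y": [4.0, 1.0, 8.0]}


def radius(row):
    return math.sqrt(row["x"] ** 2 + row["y"] ** 2)


def positive_x(row):
    return row["x"] > 0


mean_r_selected = chunked_mean(columns, radius, selection=positive_x, chunk_size=2)
mean_r_all = chunked_mean(columns, radius, chunk_size=2)
```

    Neither the derived column nor the selection mask is ever stored in full; both are recomputed per chunk, which is the memory-saving property the abstract describes.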

    Complexity on dwarf galaxies scale: A bimodal distribution function in Sculptor

    In previous work we have presented Schwarzschild models of the Sculptor dSph, demonstrating that this system could be embedded in dark matter halos that are either cusped or cored. Here we show that the non-parametric distribution function recovered through Schwarzschild's method is bimodal in energy and angular momentum space for all best fitting mass models explored. We demonstrate that this bimodality is directly related to the two components known to be present in Sculptor through stellar populations analysis, although our method is purely dynamical in nature and does not use this prior information. It therefore constitutes independent confirmation of the existence of two physically distinct dynamical components in Sculptor and suggests a rather complex assembly history for this dwarf galaxy. Comment: 4 pages, 4 figures, 1 table, accepted to ApJ Letter

    What can Gaia proper motions tell us about Milky Way dwarf galaxies?

    We present a proper-motion study on models of the dwarf spheroidal galaxy Sculptor, based on the predicted proper-motion accuracy of Gaia measurements. Gaia will measure proper motions of several hundreds of stars for a Sculptor-like system. Even with an uncertainty on the proper motion of order 1.5 times the size of an individual proper-motion value of ~10 mas/century, we find that it is possible to recover Sculptor's systemic proper motion at its distance of 79 kpc. Comment: 5 pages, 1 figure; to appear in the proceedings of the GREAT-ITN conference on "The Milky Way Unravelled by Gaia", Barcelona (Dec 1-5 2014
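
    The numbers quoted in this abstract can be put in context with a short worked calculation. It is only a back-of-the-envelope sketch: the sample size of 300 stars is an assumed stand-in for "several hundreds", the per-star errors are assumed independent and Gaussian, and intrinsic velocity dispersion, contamination and systematic errors are ignored.

```python
import math

# Tangential velocity in km/s for a proper motion of 1 mas/yr at 1 kpc
# (equivalently, 1 au/yr expressed in km/s).
KM_S_PER_MAS_YR_KPC = 4.74047


def tangential_velocity(mu_mas_yr, distance_kpc):
    """Convert a proper motion (mas/yr) at a distance (kpc) to km/s."""
    return KM_S_PER_MAS_YR_KPC * mu_mas_yr * distance_kpc


def error_on_mean(sigma_per_star, n_stars):
    """Standard error of the mean for independent, equal per-star errors."""
    return sigma_per_star / math.sqrt(n_stars)


mu = 10.0 / 100.0            # ~10 mas/century expressed in mas/yr
sigma = 1.5 * mu             # per-star uncertainty, 1.5 times the signal
n_stars = 300                # ASSUMPTION: illustrative value for "several hundreds"

v_tan = tangential_velocity(mu, 79.0)        # roughly 37 km/s
sigma_mean = error_on_mean(sigma, n_stars)   # mas/yr
```

    Under these assumptions the error on the mean is more than an order of magnitude smaller than the per-star uncertainty, which is why averaging over a few hundred stars can recover a systemic motion that no individual measurement resolves.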

    Matching the dark matter profiles of dSph galaxies with those of simulated satellites: a two parameter comparison

    We compare the dark matter halos' structural parameters derived for four Milky Way dwarf spheroidal galaxies to those of subhalos found in cosmological N-body simulations. We confirm that estimates of the mass at a single fixed radius are fully consistent with the observations. However, when a second structural parameter, such as the logarithmic slope of the dark halo density profile measured close to the half-light radius, is included in the comparison, we find little to no overlap between the satellites and the subhalos. Typically, subhalos of the right mass have steeper profiles at these radii than measurements of the dSph suggest. Using energy arguments we explore whether it is possible to solve this discrepancy by invoking baryonic effects. Assuming that feedback from supernovae can lead to a reshaping of the halos, we compute the required efficiency and find entirely plausible values for a significant fraction of the subhalos, some even as low as 0.1%. This implies that care must be taken not to exaggerate the effect of supernova feedback, as this could make the halos too shallow. These results could be used to calibrate and possibly constrain feedback recipes in hydrodynamical simulations. Comment: 6 pages, 3 figures, submitted to ApJ

    A box full of chocolates: The rich structure of the nearby stellar halo revealed by Gaia and RAVE

    The hierarchical structure formation model predicts that stellar halos should form, at least partly, via mergers. If this was a predominant formation channel for the Milky Way's halo, imprints of this merger history in the form of moving groups or streams should exist also in the vicinity of the Sun. Here we study the kinematics of halo stars in the Solar neighbourhood using the very recent first data release from the Gaia mission, and in particular the TGAS dataset, in combination with data from the RAVE survey. Our aim is to determine the amount of substructure present in the phase-space distribution of halo stars that could be linked to merger debris. To characterise kinematic substructure, we measure the velocity correlation function in our sample of halo (low metallicity) stars. We also study the distribution of these stars in the space of energy and two components of the angular momentum, in what we call "Integrals of Motion" space. The velocity correlation function reveals substructure in the form of an excess of pairs of stars with similar velocities, well above that expected for a smooth distribution. Comparison to cosmological simulations of the formation of stellar halos indicates that the levels found are consistent with the Galactic halo having been built fully via accretion. Similarly, the distribution of stars in the space of "Integrals of Motion" is highly complex. A strikingly high fraction (between 58% and up to 73%) of the stars that are somewhat less bound than the Sun are on (highly) retrograde orbits. A simple comparison to Milky Way-mass galaxies in cosmological hydrodynamical simulations suggests that less than 1% have such prominently retrograde outer halos. We also identify several other statistically significant structures in "Integrals of Motion" space that could potentially be related to merger events. Comment: 19 pages, 16 figures. A&A in pres
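
    The "excess of pairs of stars with similar velocities" measured in this abstract can be illustrated with a pair-count sketch. The abstract does not give the estimator, so the form used here, xi = DD/RR - 1 with pair counts normalised by the total number of pairs, is an assumption, as are the function names and the use of a caller-supplied smooth reference sample.

```python
import math
from itertools import combinations


def pair_separation_histogram(velocities, bin_width, n_bins):
    """Histogram of pairwise velocity separations |v_i - v_j|."""
    counts = [0] * n_bins
    for a, b in combinations(velocities, 2):
        k = int(math.dist(a, b) / bin_width)
        if k < n_bins:
            counts[k] += 1
    return counts


def velocity_correlation(data, reference, bin_width, n_bins):
    """ASSUMED estimator: xi = DD/RR - 1, with DD and RR as pair fractions."""
    dd = pair_separation_histogram(data, bin_width, n_bins)
    rr = pair_separation_histogram(reference, bin_width, n_bins)
    n_dd = len(data) * (len(data) - 1) / 2
    n_rr = len(reference) * (len(reference) - 1) / 2
    # Bins with no reference pairs are undefined and reported as NaN.
    return [
        (d / n_dd) / (r / n_rr) - 1 if r else math.nan
        for d, r in zip(dd, rr)
    ]


# Toy 3D velocities in km/s: two data stars move almost together.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
xi = velocity_correlation(data, reference, bin_width=10.0, n_bins=11)
```

    A positive value in the smallest-separation bins would indicate more close pairs than the smooth reference predicts; the brute-force loop is quadratic in sample size and is meant for illustration only.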

    Leaves on trees: identifying halo stars with extreme gradient boosted trees

    Extended stellar haloes are a natural by-product of the hierarchical formation of massive galaxies. If merging is a non-negligible factor in the growth of our Galaxy, evidence of such events should be encoded in its stellar halo. Reliable identification of genuine halo stars is a challenging task, however. The 1st Gaia data release contains the positions, parallaxes and proper motions for over 2 million stars, mostly in the Solar neighbourhood. Gaia DR2 will enlarge this sample to over 1.5 billion stars, the brightest ~5 million of which will have full phase-space information. Our aim is to develop a machine learning model to reliably identify halo stars, even when their full phase-space information is not available. We use the Gradient Boosted Trees algorithm to build a supervised halo star classifier. The classifier is trained on a sample extracted from the Gaia Universe Model Snapshot, convolved with the errors of TGAS, as well as with the expected uncertainties of the upcoming Gaia DR2. We also trained our classifier on the cross-match between the TGAS and RAVE catalogues, where the halo stars are labelled in an entirely model independent way. We then use this model to identify halo stars in TGAS. When full phase-space information is available and for Gaia DR2-like uncertainties, our classifier is able to recover 90% of the halo stars with at most 30% distance errors, in a completely unseen test set, and with negligible levels of contamination. When line-of-sight velocity is not available, we recover ~60% of such halo stars, with less than 10% contamination. When applied to the TGAS data, our classifier detects 337 high confidence RGB halo stars. Although small, this number is consistent with the expectation from models given the data uncertainties. The large parallax errors are the biggest limitation in identifying a larger number of halo stars in all the cases studied. Comment: Accepted for publication in Astronomy & Astrophysics. 13 pages, 9 figure, 2 table
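
    The two figures of merit quoted in this abstract, the fraction of halo stars recovered and the contamination, can be computed from true and predicted labels as sketched below. The definitions are assumptions, since the abstract does not spell them out: recovery is taken as true positives over all true halo stars, and contamination as false positives over all stars classified as halo. The function name is hypothetical.

```python
import math


def completeness_and_contamination(is_halo, predicted_halo):
    """Return (recovered fraction, contamination) for a binary halo classifier."""
    pairs = list(zip(is_halo, predicted_halo))
    true_pos = sum(1 for truth, pred in pairs if truth and pred)
    false_pos = sum(1 for truth, pred in pairs if not truth and pred)
    false_neg = sum(1 for truth, pred in pairs if truth and not pred)
    n_true = true_pos + false_neg
    n_predicted = true_pos + false_pos
    # Undefined ratios (no true or no predicted halo stars) are reported as NaN.
    completeness = true_pos / n_true if n_true else math.nan
    contamination = false_pos / n_predicted if n_predicted else math.nan
    return completeness, contamination


# Toy labels: 3 true halo stars, 2 of them recovered, 1 disc star misclassified.
truth = [True, True, True, False, False]
predicted = [True, True, False, True, False]
completeness, contamination = completeness_and_contamination(truth, predicted)
```

    Reporting both numbers matters because a classifier can raise the recovered fraction simply by labelling more stars as halo, at the cost of higher contamination.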