27,833 research outputs found

    Vaex: Big Data exploration in the era of Gaia

    Get PDF
    We present a new Python library called vaex, to handle extremely large tabular datasets, such as astronomical catalogues like the Gaia catalogue, N-body simulations or any other regular datasets which can be structured in rows and columns. Fast computations of statistics on regular N-dimensional grids allows analysis and visualization in the order of a billion rows per second. We use streaming algorithms, memory mapped files and a zero memory copy policy to allow exploration of datasets larger than memory, e.g. out-of-core algorithms. Vaex allows arbitrary (mathematical) transformations using normal Python expressions and (a subset of) numpy functions which are lazily evaluated and computed when needed in small chunks, which avoids wasting of RAM. Boolean expressions (which are also lazily evaluated) can be used to explore subsets of the data, which we call selections. Vaex uses a similar DataFrame API as Pandas, a very popular library, which helps migration from Pandas. Visualization is one of the key points of vaex, and is done using binned statistics in 1d (e.g. histogram), in 2d (e.g. 2d histograms with colormapping) and 3d (using volume rendering). Vaex is split in in several packages: vaex-core for the computational part, vaex-viz for visualization mostly based on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based in IPyWidgets, vaex-server for the (optional) client-server communication, vaex-ui for the Qt based interface, vaex-hdf5 for hdf5 based memory mapped storage, vaex-astro for astronomy related selections, transformations and memory mapped (column based) fits storage. Vaex is open source and available under MIT license on github, documentation and other information can be found on the main website: https://vaex.io, https://docs.vaex.io or https://github.com/maartenbreddels/vaexComment: 14 pages, 8 figures, Submitted to A&A, interactive version of Fig 4: https://vaex.io/paper/fig

    A model for digital preservation repository risk relationships

    Get PDF
    The paper introduces the Preserved Object and Repository Risk Ontology (PORRO), a model that relates preservation functionality with associated risks and opportunities for their mitigation. Building on work undertaken in a range of EU and UK funded research projects (including the Digital Curation Centre , DigitalPreservationEurope and DELOS ), this ontology illustrates relationships between fundamental digital library goals and their parameters; associated rights and responsibilities; practical activities and resources involved in their accomplishment; and risks facing digital libraries and their collections. Its purpose is to facilitate a comprehensive understanding of risk causality and to illustrate opportunities for mitigation and avoidance. The ontology reflects evidence accumulated from a series of institutional audits and evaluations, including a specific subset of digital libraries in the DELOS project which led to the definition of a digital library preservation risk profile. Its applicability is intended to be widespread, and its coverage expected to evolve to reflect developments within the community. Attendees will gain an understanding of the model and learn how they can utilize this online resource to inform their own risk management activities

    Discrete optimization algorithms for marker-assisted plant breeding

    Get PDF

    Human Resource Practices, Knowledge-Creation Capability And Performance In High Technology Firms

    Get PDF
    This study examines the relationship among key HR practices (i.e., effective acquisition, employee-development, commitment-building, and networking practices), three dimensions of knowledge-creation capability (human capital, employee motivation, and information combination and exchange), and firm performance. Results from a sample of 78 high technology firms showed that the three dimensions of knowledge creation interact to positively affect sales growth. Further, the HR practices were found to affect sales growth through their affect on the dimensions of knowledge-creation capability

    A Computational Field Framework for Collaborative Task Execution in Volunteer Clouds

    Get PDF
    The increasing diffusion of cloud technologies is opening new opportunities for distributed and collaborative computing. Volunteer clouds are a prominent example, where participants join and leave the platform and collaborate by sharing their computational resources. The high dynamism and unpredictability of such scenarios call for decentralized self-* approaches to guarantee QoS. We present a simulation framework for collaborative task execution in volunteer clouds and propose one concrete instance based on Ant Colony Optimization, which is validated through a set of simulation experiments based on Google workload data

    Bayesian Exponential Random Graph Models with Nodal Random Effects

    Get PDF
    We extend the well-known and widely used Exponential Random Graph Model (ERGM) by including nodal random effects to compensate for heterogeneity in the nodes of a network. The Bayesian framework for ERGMs proposed by Caimo and Friel (2011) yields the basis of our modelling algorithm. A central question in network models is the question of model selection and following the Bayesian paradigm we focus on estimating Bayes factors. To do so we develop an approximate but feasible calculation of the Bayes factor which allows one to pursue model selection. Two data examples and a small simulation study illustrate our mixed model approach and the corresponding model selection.Comment: 23 pages, 9 figures, 3 table
    corecore