Vaex: Big Data exploration in the era of Gaia

Author: Breddels Maarten A.
Veljanoski Jovan
Publication venue: 'EDP Sciences'
Publication date: 08/01/2018

We present vaex, a new Python library for handling extremely large tabular datasets, such as astronomical catalogues like the Gaia catalogue, N-body simulations, or any other regular datasets that can be structured in rows and columns. Fast computation of statistics on regular N-dimensional grids allows analysis and visualization at a rate on the order of a billion rows per second. We use streaming algorithms, memory-mapped files and a zero-memory-copy policy to allow exploration of datasets larger than memory, i.e. out-of-core algorithms. Vaex allows arbitrary (mathematical) transformations using normal Python expressions and (a subset of) numpy functions, which are lazily evaluated and computed when needed in small chunks, which avoids wasting RAM. Boolean expressions (which are also lazily evaluated) can be used to explore subsets of the data, which we call selections. Vaex uses a DataFrame API similar to that of Pandas, a very popular library, which eases migration from Pandas. Visualization is one of the key points of vaex, and is done using binned statistics in 1d (e.g. histograms), in 2d (e.g. 2d histograms with colormapping) and in 3d (using volume rendering). Vaex is split into several packages: vaex-core for the computational part, vaex-viz for visualization mostly based on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based on IPyWidgets, vaex-server for the (optional) client-server communication, vaex-ui for the Qt-based interface, vaex-hdf5 for HDF5-based memory-mapped storage, and vaex-astro for astronomy-related selections, transformations and memory-mapped (column-based) FITS storage. Vaex is open source and available under the MIT license on GitHub; documentation and other information can be found on the main website: https://vaex.io, https://docs.vaex.io or https://github.com/maartenbreddels/vaex

Comment: 14 pages, 8 figures, submitted to A&A; interactive version of Fig 4: https://vaex.io/paper/fig
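The chunked, lazy evaluation described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the vaex API: the names `iter_chunks` and `binned_count`, the row-dictionary convention, and the tiny in-memory columns are all assumptions made for illustration. The expression and the selection are ordinary callables that are only evaluated while each chunk is streamed through, so no derived column is ever materialized in full.

```python
import math
from array import array


def iter_chunks(columns, chunk_size):
    """Yield one dict of column slices per chunk (hypothetical helper)."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, chunk_size):
        yield {name: col[start:start + chunk_size] for name, col in columns.items()}


def binned_count(columns, expression, selection, lo, hi, bins, chunk_size=4):
    """Count rows per bin of expression(row), restricted to selection(row).

    Both callables are evaluated lazily, row by row, inside each chunk,
    so only the small counts list is kept between chunks.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for chunk in iter_chunks(columns, chunk_size):
        names = list(chunk)
        for values in zip(*chunk.values()):
            row = dict(zip(names, values))
            if not selection(row):
                continue  # row is outside the Boolean selection
            value = expression(row)
            if lo <= value < hi:
                # Guard against floating-point rounding at the upper edge.
                index = min(int((value - lo) / width), bins - 1)
                counts[index] += 1
    return counts


# Toy columns standing in for a much larger (memory-mapped) table.
columns = {
    "x": array("d", [3.0, 0.0, -3.0, 1.0, 6.0, -1.0]),
    "y": array("d", [4.0, 1.0, 4.0, 0.0, 8.0, 0.0]),
}

# A derived quantity and a selection, both defined but not yet computed.
radius = lambda row: math.sqrt(row["x"] ** 2 + row["y"] ** 2)
positive_x = lambda row: row["x"] > 0

histogram = binned_count(columns, radius, positive_x, lo=0.0, hi=12.0, bins=3)
print(histogram)
```

Because the counts are accumulated chunk by chunk, the result does not depend on `chunk_size`; a real out-of-core implementation would read each chunk from a memory-mapped file and use vectorized operations rather than a Python loop.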

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

EDP Sciences OAI-PMH repository (1.2.0)

ARTS repository - University of Groningen

Dissertations of the University of Groningen

A model for digital preservation repository risk relationships

Author: McHugh A.
Publication date: 01/01/2012

The paper introduces the Preserved Object and Repository Risk Ontology (PORRO), a model that relates preservation functionality to associated risks and opportunities for their mitigation. Building on work undertaken in a range of EU- and UK-funded research projects (including the Digital Curation Centre, DigitalPreservationEurope and DELOS), this ontology illustrates relationships between fundamental digital library goals and their parameters; associated rights and responsibilities; practical activities and resources involved in their accomplishment; and risks facing digital libraries and their collections. Its purpose is to facilitate a comprehensive understanding of risk causality and to illustrate opportunities for mitigation and avoidance. The ontology reflects evidence accumulated from a series of institutional audits and evaluations, including a specific subset of digital libraries in the DELOS project, which led to the definition of a digital library preservation risk profile. Its applicability is intended to be widespread, and its coverage is expected to evolve to reflect developments within the community. Attendees will gain an understanding of the model and learn how they can use this online resource to inform their own risk management activities.

Enlighten

Discrete optimization algorithms for marker-assisted plant breeding

Author: De Beukelaer Herman
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2017

Ghent University Academic Bibliography

Human Resource Practices, Knowledge-Creation Capability And Performance In High Technology Firms

Author: Collins Christopher J.
Smith Ken G.
Stevens Cynthia Kay
Publication venue: DigitalCommons@ILR
Publication date: 13/01/2001

This study examines the relationship among key HR practices (i.e., effective acquisition, employee-development, commitment-building, and networking practices), three dimensions of knowledge-creation capability (human capital, employee motivation, and information combination and exchange), and firm performance. Results from a sample of 78 high-technology firms showed that the three dimensions of knowledge creation interact to positively affect sales growth. Further, the HR practices were found to affect sales growth through their effect on the dimensions of knowledge-creation capability.

DigitalCommons@ILR

eCommons@Cornell

A Computational Field Framework for Collaborative Task Execution in Volunteer Clouds

Author: Amoretti Michele
Lluch-Lafuente Alberto
Sebastio Stefano
Publication date: 26/07/2013

The increasing diffusion of cloud technologies is opening new opportunities for distributed and collaborative computing. Volunteer clouds are a prominent example, where participants join and leave the platform and collaborate by sharing their computational resources. The high dynamism and unpredictability of such scenarios call for decentralized self-* approaches to guarantee QoS. We present a simulation framework for collaborative task execution in volunteer clouds and propose one concrete instance based on Ant Colony Optimization, which is validated through a set of simulation experiments based on Google workload data.

IMT Institutional Repository

Bayesian Exponential Random Graph Models with Nodal Random Effects

Author: A. Caimo
G. Kauermann
N. Friel
S. Thiemichen
Publication date: 12/01/2015

We extend the well-known and widely used Exponential Random Graph Model (ERGM) by including nodal random effects to compensate for heterogeneity in the nodes of a network. The Bayesian framework for ERGMs proposed by Caimo and Friel (2011) forms the basis of our modelling algorithm. A central question in network models is model selection, and following the Bayesian paradigm we focus on estimating Bayes factors. To do so, we develop an approximate but feasible calculation of the Bayes factor, which allows one to pursue model selection. Two data examples and a small simulation study illustrate our mixed model approach and the corresponding model selection.

Comment: 23 pages, 9 figures, 3 tables
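For orientation, the quantities named in this abstract can be written out in standard notation. The abstract itself gives no formulas, so the following is a sketch under stated assumptions: the symbols $s(y)$, $t(y)$, $\phi$ and $\kappa$, the use of node degrees as the nodal statistics, and the Gaussian random-effects distribution are common choices assumed here, not details confirmed by the listing.

```latex
% Standard ERGM for an observed network y with sufficient statistics s(y):
p(y \mid \theta) = \frac{\exp\{\theta^{\top} s(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y'} \exp\{\theta^{\top} s(y')\}

% Assumed form of the extension with nodal random effects \phi_i,
% where t_i(y) is taken to be the degree of node i:
p(y \mid \theta, \phi)
  = \frac{\exp\{\theta^{\top} s(y) + \phi^{\top} t(y)\}}{\kappa(\theta, \phi)},
\qquad
\phi_i \sim \mathcal{N}(\mu_{\phi}, \sigma_{\phi}^{2})

% Bayes factor comparing models M_1 and M_2, with \psi_k collecting the
% parameters of model M_k:
\mathrm{BF}_{12} = \frac{p(y \mid M_1)}{p(y \mid M_2)},
\qquad
p(y \mid M_k) = \int p(y \mid \psi_k, M_k)\, p(\psi_k \mid M_k)\, d\psi_k
```

The normalizing constant $\kappa$ sums over all possible networks and is intractable for all but very small graphs, which is why the marginal likelihoods in the Bayes factor generally have to be approximated.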

arXiv.org e-Print Archive

Crossref

Research Repository UCD

Arrow@TUDublin

Irish Universities