ALOJA: A benchmarking and predictive platform for big data performance analysis

Author: Berral García Josep Lluís
Carrera Pérez David
Poggi Nicolas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. In its first year, the project has produced an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about a system's cost-performance. This article describes the evolution of the project's focus and research lines over more than a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses the motivation, both technical and market-based, for such changes. During this time, ALOJA's target has evolved from low-level profiling of the Hadoop runtime, through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to estimate the results of new or untested configurations or hardware set-ups automatically, by learning from past observations, saving benchmarking time and costs. This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

Author: Berral García Josep Lluís
Call Aaron
Carrera Pérez David
Green Daron
Poggi Mastrokalo Nicolas
Reinauer Rob
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer Reviewed. Postprint (published version)
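As a rough illustration of predicting execution times for untested configurations from observed runs, here is a minimal sketch. It is not ALOJA-ML's method: the abstract does not name the learning algorithm, so a k-nearest-neighbour average stands in for it. The feature names (`mappers`, `io_buffer_kb`, `uses_ssd`) and every number in `observed` are made up for the example.

```python
import math

# Toy, made-up observations (NOT ALOJA data):
# (mappers, io_buffer_kb, uses_ssd) -> execution time in seconds.
observed = [
    ((4, 64, 0), 900.0),
    ((8, 64, 0), 620.0),
    ((8, 128, 1), 410.0),
    ((16, 128, 1), 330.0),
]

def predict(config, runs, k=2):
    """Average the execution times of the k nearest observed configurations."""
    # Euclidean distance on raw features; a real model would scale them first.
    nearest = sorted(runs, key=lambda run: math.dist(run[0], config))[:k]
    return sum(seconds for _, seconds in nearest) / len(nearest)

# Estimate an untested configuration from its two closest neighbours.
estimate = predict((8, 128, 0), observed)
print(f"estimated execution time: {estimate:.0f} s")
```

The same idea extends to anomaly detection: a run whose measured time differs sharply from the model's estimate for its configuration is a candidate outlier.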

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A High-Fidelity Realization of the Euclid Code Comparison $N$-body Simulation with Abacus

Author: Eisenstein Daniel J.
Garrison Lehman H.
Pinto Philip A.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 06/03/2019

We present a high-fidelity realization of the cosmological $N$-body simulation from the Schneider et al. (2016) code comparison project. The simulation was performed with our Abacus $N$-body code, which offers high force accuracy, high performance, and minimal particle integration errors. The simulation consists of $2048^3$ particles in a $500\ h^{-1}\mathrm{Mpc}$ box, for a particle mass of $1.2\times 10^9\ h^{-1}\mathrm{M}_\odot$ with $10\ h^{-1}\mathrm{kpc}$ spline softening. Abacus executed 1052 global time steps to $z=0$ in 107 hours on one dual-Xeon, dual-GPU node, for a mean rate of 23 million particles per second per step. We find Abacus is in good agreement with Ramses and Pkdgrav3 and less so with Gadget3. We validate our choice of time step by halving the step size and find sub-percent differences in the power spectrum and 2PCF at nearly all measured scales, with $<0.3\%$ errors at $k<10\ \mathrm{Mpc}^{-1}h$. On large scales, Abacus reproduces linear theory better than $0.01\%$. Simulation snapshots are available at http://nbody.rc.fas.harvard.edu/public/S2016 . Comment: 13 pages, 8 figures. Minor changes to match MNRAS accepted version
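The quoted mean rate can be checked directly from the other figures in the abstract. The sketch below assumes the rate is total particle updates (particles times global steps) divided by wall-clock time; the function name `mean_rate` is introduced here for illustration.

```python
# Figures quoted in the abstract.
particles = 2048 ** 3   # particles in the box
steps = 1052            # global time steps to z = 0
wall_hours = 107        # wall-clock time on one node

def mean_rate(particles: int, steps: int, wall_hours: float) -> float:
    """Particle updates per second, averaged over the whole run."""
    return particles * steps / (wall_hours * 3600)

rate = mean_rate(particles, steps, wall_hours)
print(f"{rate / 1e6:.1f} million particle updates per second")
```

Under that assumption the result lands between 23 and 24 million, consistent with the abstract's figure of 23 million particles per second per step.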

arXiv.org e-Print Archive

The University of Arizona

Hierarchical Content Stores in High-speed ICN Routers: Emulation and Prototype Implementation

Author: Barcellos Marinho
Gallo Massimo
Leonardi Emilio
Mansilha Rodrigo
Perino Diego
Rossi Dario
Saino Lorenzo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2015

Recent work motivates the design of Information-centric routers that make use of hierarchies of memory to jointly scale in the size and speed of content stores. The present paper advances this understanding by (i) instantiating a general purpose two-layer packet-level caching system, (ii) investigating the solution design space via emulation, and (iii) introducing a proof-of-concept prototype. The emulation-based study reveals insights about the broad design space, the expected impact of workload, and gains due to multi-threaded execution. The full-blown system prototype experimentally confirms that, by exploiting both DRAM and SSD memory technologies, ICN routers can sustain cache operations in excess of 10 Gbps running on off-the-shelf hardware.
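To make the idea of a two-layer content store concrete, here is a minimal sketch of a small fast layer (standing in for DRAM) in front of a larger slow layer (standing in for SSD). The abstract does not describe the paper's replacement or promotion policy, so LRU eviction with demotion on eviction and promotion on a slow-layer hit is an assumption, and the class `TwoLayerCache` is hypothetical.

```python
from collections import OrderedDict

class TwoLayerCache:
    """Toy two-layer content store: small fast layer over a larger slow layer."""

    def __init__(self, fast_capacity: int, slow_capacity: int):
        self.fast = OrderedDict()   # stands in for DRAM, kept in LRU order
        self.slow = OrderedDict()   # stands in for SSD, kept in LRU order
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def get(self, name):
        """Return cached content for `name`, or None on a miss."""
        if name in self.fast:
            self.fast.move_to_end(name)      # refresh recency
            return self.fast[name]
        if name in self.slow:
            data = self.slow.pop(name)       # assumed policy: promote on hit
            self._put_fast(name, data)
            return data
        return None

    def put(self, name, data):
        """Insert new content into the fast layer."""
        self.slow.pop(name, None)            # avoid holding two copies
        self._put_fast(name, data)

    def _put_fast(self, name, data):
        self.fast[name] = data
        self.fast.move_to_end(name)
        if len(self.fast) > self.fast_capacity:
            # Assumed policy: demote the least recently used item to the slow layer.
            old_name, old_data = self.fast.popitem(last=False)
            self.slow[old_name] = old_data
            self.slow.move_to_end(old_name)
            if len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)  # drop from the store entirely

cache = TwoLayerCache(fast_capacity=1, slow_capacity=1)
for chunk in ("a", "b", "c"):
    cache.put(chunk, f"payload-{chunk}")
```

With capacities of one item each, inserting `a`, `b`, `c` leaves `c` in the fast layer and `b` in the slow layer, while `a` has been evicted from both.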

INRIA a CCSD electronic archive server

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino