14,439 research outputs found
ALOJA: A benchmarking and predictive platform for big data performance analysis
The main goals of the ALOJA research project from BSC-MSR, are to explore and automate the characterization of cost-effectivenessof Big Data deployments. The development of the project over its first year, has resulted in a open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system's cost-performance1.
This article describes the evolution of the project's focus and research
lines from over a year of continuously benchmarking Hadoop under dif-
ferent configuration and deployments options, presents results, and dis
cusses the motivation both technical and market-based of such changes.
During this time, ALOJA's target has evolved from a previous low-level
profiling of Hadoop runtime, passing through extensive benchmarking
and evaluation of a large body of results via aggregation, to currently
leveraging Predictive Analytics (PA) techniques. Modeling benchmark
executions allow us to estimate the results of new or untested configu-
rations or hardware set-ups automatically, by learning techniques from
past observations saving in benchmarking time and costs.This work is partially supported the BSC-Microsoft Research Centre, the Span-
ish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).Peer ReviewedPostprint (author's final draft
Predicting Intermediate Storage Performance for Workflow Applications
Configuring a storage system to better serve an application is a challenging
task complicated by a multidimensional, discrete configuration space and the
high cost of space exploration (e.g., by running the application with different
storage configurations). To enable selecting the best configuration in a
reasonable time, we design an end-to-end performance prediction mechanism that
estimates the turn-around time of an application using storage system under a
given configuration. This approach focuses on a generic object-based storage
system design, supports exploring the impact of optimizations targeting
workflow applications (e.g., various data placement schemes) in addition to
other, more traditional, configuration knobs (e.g., stripe size or replication
level), and models the system operation at data-chunk and control message
level.
This paper presents our experience to date with designing and using this
prediction mechanism. We evaluate this mechanism using micro- as well as
synthetic benchmarks mimicking real workflow applications, and a real
application.. A preliminary evaluation shows that we are on a good track to
meet our objectives: it can scale to model a workflow application run on an
entire cluster while offering an over 200x speedup factor (normalized by
resource) compared to running the actual application, and can achieve, in the
limited number of scenarios we study, a prediction accuracy that enables
identifying the best storage system configuration
- …