Pando: Personal Volunteer Computing in Browsers
The large penetration and continued growth in ownership of personal
electronic devices represent a freely available and largely untapped source of
computing power. To leverage this source, we present Pando, a new volunteer computing
tool based on a declarative concurrent programming model and implemented using
JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying
number of failure-prone personal devices contributed by volunteers to
parallelize the application of a function on a stream of values, by using the
devices' browsers. We show that Pando can provide throughput improvements
compared to a single personal device, on a variety of compute-bound
applications including animation rendering and image processing. We also show
the flexibility of our approach by deploying Pando on personal devices
connected over a local network, on Grid5000, a French-wide computing grid in a
virtual private network, and seven PlanetLab nodes distributed in a wide area
network over Europe.
Comment: 14 pages, 12 figures, 2 tables
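The core abstraction the abstract describes — applying a function in parallel to a stream of values across failure-prone volunteer devices — can be sketched in a few lines. This is a minimal illustration, not Pando's actual JavaScript/WebRTC API: the function name, retry policy, and use of a local thread pool in place of remote browsers are all assumptions made for the sketch.

```python
# Minimal sketch (not Pando's actual API): parallel application of a
# function to a stream of values, re-submitting any value whose worker
# failed, mimicking fault tolerance for failure-prone volunteer devices.
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_map_stream(fn, values, max_workers=4, max_retries=3):
    """Return {value: fn(value)}; retry values whose worker raised."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fn, v): (v, 0) for v in values}
        while pending:
            for fut in as_completed(list(pending)):
                v, attempts = pending.pop(fut)
                try:
                    results[v] = fut.result()
                except Exception:
                    # A device failed or disconnected: resubmit the value.
                    if attempts + 1 < max_retries:
                        pending[pool.submit(fn, v)] = (v, attempts + 1)
                    else:
                        raise
    return results
```

In the declarative concurrent model the abstract refers to, the caller only declares the function and the stream; scheduling across a dynamically varying worker pool is left to the runtime, as the retry loop above suggests.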
Black or White? How to Develop an AutoTuner for Memory-based Analytics [Extended Version]
There is a lot of interest today in building autonomous (or self-driving)
data processing systems. An emerging school of thought is to leverage AI-driven
"black box" algorithms for this purpose. In this paper, we present a contrarian
view. We study the problem of autotuning the memory allocation for applications
running on modern distributed data processing systems. For this problem, we
show that an empirically-driven "white-box" algorithm, called RelM, that we
have developed provides a close-to-optimal tuning at a fraction of the
overheads compared to state-of-the-art AI-driven "black box" algorithms,
namely, Bayesian Optimization (BO) and Deep Distributed Policy Gradient (DDPG).
The main reason for RelM's superior performance is that the memory management
in modern memory-based data analytics systems is an interplay of algorithms at
multiple levels: (i) at the resource-management level across various containers
allocated by resource managers like Kubernetes and YARN, (ii) at the container
level among the OS, pods, and processes such as the Java Virtual Machine (JVM),
(iii) at the application level for caching, aggregation, data shuffles, and
application data structures, and (iv) at the JVM level across various pools
such as the Young and Old Generation. RelM understands these interactions and
uses them in building an analytical solution to autotune the memory management
knobs. In another contribution, called GBO, we use RelM's analytical models
to speed up Bayesian Optimization. Through an evaluation based on Apache Spark,
we showcase that RelM's recommendations are significantly better than what
commonly-used Spark deployments provide, and are close to the ones obtained by
brute-force exploration; meanwhile, GBO provides optimality guarantees at a
cost overhead that is higher than RelM's, but still significantly lower than
that of the state-of-the-art AI-driven policies.
Comment: Main version in ACM SIGMOD 202
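The white-box idea the abstract argues for — deriving memory-knob settings analytically from a known budget and the interactions between levels, rather than searching a black-box configuration space — can be illustrated with a toy allocator. This is a hypothetical sketch, not RelM's actual model: the function name, the fixed fractions, and the three-way split are assumptions chosen only to show the shape of an analytical solution.

```python
# Hypothetical illustration (not RelM's actual model) of a white-box
# autotuner: compute knob settings directly from a container's memory
# budget, covering the levels the abstract lists (container overhead,
# JVM heap, application caching, and shuffle/aggregation buffers).
def autotune_memory(container_mb, cache_fraction=0.5,
                    shuffle_fraction=0.2, overhead_fraction=0.1):
    """Split a container's memory budget across interacting levels."""
    overhead = int(container_mb * overhead_fraction)  # OS / pod / off-heap
    heap = container_mb - overhead                    # JVM heap (e.g. -Xmx)
    cache = int(heap * cache_fraction)                # application caching
    shuffle = int(heap * shuffle_fraction)            # shuffle buffers
    return {"heap_mb": heap, "cache_mb": cache,
            "shuffle_mb": shuffle, "overhead_mb": overhead}
```

Because each setting is a closed-form function of the budget, such a model is evaluated once rather than sampled hundreds of times, which is the source of the overhead gap versus BO and DDPG that the abstract reports; GBO then reuses the same models to prune the search space of a Bayesian optimizer.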