Interacting with Large Distributed Datasets using Sketch

Al-Kiswany, Samer; Andoni, Alexandr; Barham, Paul; Boshmaf, Yazan; Budiu, Mihai; Isaacs, Rebecca; Luo, Qingzhou; Murray, Derek; Plotkin, Gordon

Publication date: 29 January 2015

Abstract

We present Sketch, a distributed software infrastructure for building interactive tools for exploring large datasets distributed across multiple machines. We have built three sophisticated applications using this framework: a billion-row spreadsheet, a distributed log browser, and a distributed-systems performance debugging tool. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling gracefully to large system sizes. The conflicting constraints of large-scale data and the small timescales required by human interaction are difficult to satisfy simultaneously. Sketch finds a sweet spot in this trade-off by exploiting the observation that the precision of a data view is limited by the resolution of the user's screen. The system pushes data reduction operations to the data sources. The core Sketch abstraction provides a narrow programming interface; Sketch clients construct a distributed application by stacking modular components with identical interfaces, each providing a useful feature: network transparency, concurrency, fault-tolerance, straggler avoidance, round-trip reduction, and distributed aggregation.
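
The idea that a view needs no more precision than the screen can show, with the reduction pushed to the data sources, can be illustrated with a small sketch. This is not the Sketch API: the function names (`partial_histogram`, `merge`, `screen_histogram`) and the choice of a histogram as the view are assumptions made for illustration only. Each partition is reduced locally to one count per horizontal pixel, so only `screen_width_px` integers per partition would need to travel to the client, however many rows the partition holds.

```python
from typing import List, Sequence


def partial_histogram(values: Sequence[float], lo: float, hi: float,
                      buckets: int) -> List[int]:
    """Reduce one data partition to a fixed number of bucket counts.

    Hypothetical helper: in a real deployment this would run at the data source.
    """
    if hi <= lo or buckets <= 0:
        raise ValueError("need hi > lo and buckets > 0")
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bucket.
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
    return counts


def merge(a: List[int], b: List[int]) -> List[int]:
    """Combine two partial summaries; element-wise addition is associative."""
    return [x + y for x, y in zip(a, b)]


def screen_histogram(partitions: Sequence[Sequence[float]], lo: float,
                     hi: float, screen_width_px: int) -> List[int]:
    """Build a histogram with one bucket per horizontal pixel."""
    total = [0] * screen_width_px
    for part in partitions:
        total = merge(total, partial_histogram(part, lo, hi, screen_width_px))
    return total


# Two partitions standing in for data held on two machines.
partitions = [[0.0, 1.0, 2.5], [3.9, 4.0, 1.1]]
result = screen_histogram(partitions, 0.0, 4.0, 4)
```

Because `merge` is associative, the partial summaries could equally be combined in a tree across machines rather than in a single loop, which is one plausible reading of the distributed aggregation mentioned in the abstract.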

