Making State Explicit for Imperative Big Data Processing

Castro Fernandez, Raul; Kalyvianaki, Evangelia; Migliavacca, Matteo; Pietzuch, Peter

Authors: Raul Castro Fernandez, Evangelia Kalyvianaki, Matteo Migliavacca, Peter Pietzuch
Publication date: 13 July 2019
Publisher: Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference

Abstract

Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable state, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large in-memory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks.
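The distinction the abstract draws between partitioned state and partial state can be illustrated with a small word-count sketch in plain Java. This is not the paper's API or its translation mechanism: the class `StateSketch`, its method names, the hash-based partitioning, and the choice of summation as the reconciliation rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the paper.
class StateSketch {

    // Partitioned state: each key is owned by exactly one partition, so
    // updates that touch different partitions never conflict.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    static List<Map<String, Integer>> countPartitioned(List<String> words, int numPartitions) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        for (String w : words) {
            // Route the update to the single partition that owns this key.
            partitions.get(partitionFor(w, numPartitions)).merge(w, 1, Integer::sum);
        }
        return partitions;
    }

    // Partial state: each worker updates its own local instance independently.
    static Map<String, Integer> countLocally(List<String> words) {
        Map<String, Integer> local = new HashMap<>();
        for (String w : words) {
            local.merge(w, 1, Integer::sum);
        }
        return local;
    }

    // Reconciliation: combine the local instances. Summing counts is an
    // assumed application semantic; another application might need a
    // different merge rule.
    static Map<String, Integer> reconcile(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two workers see different slices of the input.
        Map<String, Integer> worker1 = countLocally(Arrays.asList("a", "b", "a"));
        Map<String, Integer> worker2 = countLocally(Arrays.asList("a", "c"));
        Map<String, Integer> merged = reconcile(Arrays.asList(worker1, worker2));
        System.out.println("count(a) after reconciliation = " + merged.get("a"));
    }
}
```

In the partitioned variant a key's count lives in exactly one map, so no merge step is needed. In the partial variant the same key may appear in several local maps, and the result is only complete after `reconcile` runs.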
