Search CORE

88,327 research outputs found

Recommended from our members

Making State Explicit for Imperative Big Data Processing

Author: Fernandez R. C.
Kalyvianaki E.
Migliavacca M.
Pietzuch P.
Publication venue
Publication date: 01/01/2014
Field of study

Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable state, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large inmemory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks

City Research Online

Kent Academic Repository

Spiral - Imperial College Digital Repository

Making State Explicit for Imperative Big Data Processing

Author: Fernandez RC
Kalyvianaki E
Migliavacca M
Pietzuch PR
Publication venue: USENIX
Publication date: 01/04/2014
Field of study

Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to re cover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG) . By explicitly separating data from mutablestate, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large inmemory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks

Spiral - Imperial College Digital Repository

Making State Explicit for Imperative Big Data Processing

Author: Castro Fernandez Raul
Kalyvianaki Evangelia
Migliavacca Matteo
Pietzuch Peter
Publication venue: Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference
Publication date: 13/07/2019
Field of study

Data scientists often implement machine learning algo- rithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the per- formance and scalability of specialised data-parallel pro- cessing frameworks. Our goal is to execute impera- tive Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java pro- grams without compromising scalability, and how to re- cover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explic- itly separating data from mutable state, SDGs have spe- cific features to enable this translation: to ensure scala- bility, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for in- dependent computation, which are reconciled according to application semantics. For fault tolerance, large in- memory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks

CiteSeerX

Apollo (Cambridge)

A Comparison of Big Data Frameworks on a Layered Dataflow Model

Author: Aldinucci Marco
Drocco Maurizio
Misale Claudia
Tremblay Guy
Publication venue
Publication date: 16/06/2016
Field of study

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on High-Level Parallel Programming and Applications (HLPP), July 4-5 2016, Muenster, German

arXiv.org e-Print Archive

Institutional Research Information System University of Turin

Recommended from our members

Intelligence Unleashed: An argument for AI in Education

Author: Forcier Laurie B.
Griffiths Mark
Holmes Wayne
Luckin Rose
Publication venue: Pearson Education
Publication date: 01/01/2016
Field of study

Open Research Online (The Open University)

The Profiling Potential of Computer Vision and the Challenge of Computational Empiricism

Author: Alley Thomas
Andrejevic Mark
Apter Emily
Citron Danielle Keats
Clemens Justin
Gandy Oscar H
Humphries Paul
Jäger Jens
Krizhevsky Alex
Mann Monique
Marien Mary Warner
Selinger Evan
Todorov Alexander
Vanian Jonathan
Weatherby Leif
Willis J
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/04/2019
Field of study

Computer vision and other biometrics data science applications have commenced a new project of profiling people. Rather than using 'transaction generated information', these systems measure the 'real world' and produce an assessment of the 'world state' - in this case an assessment of some individual trait. Instead of using proxies or scores to evaluate people, they increasingly deploy a logic of revealing the truth about reality and the people within it. While these profiling knowledge claims are sometimes tentative, they increasingly suggest that only through computation can these excesses of reality be captured and understood. This article explores the bases of those claims in the systems of measurement, representation, and classification deployed in computer vision. It asks if there is something new in this type of knowledge claim, sketches an account of a new form of computational empiricism being operationalised, and questions what kind of human subject is being constructed by these technological systems and practices. Finally, the article explores legal mechanisms for contesting the emergence of computational empiricism as the dominant knowledge platform for understanding the world and the people within it

arXiv.org e-Print Archive

Crossref