
Implementing Performance Competitive Logical Recovery

Author: Lomet David
Tzoumas Kostas
Zwilling Michael
Publication date: 01/01/2010

New hardware platforms, e.g. cloud and multi-core, have led to a reconsideration of database system architecture. Our Deuteronomy project separates transactional functionality from data management functionality, enabling a flexible response to exploiting new platforms. This separation requires, however, that recovery is described logically. In this paper, we extend current recovery methods to work in this logical setting. While this is straightforward in principle, performance is an issue. We show how ARIES-style recovery optimizations can work for logical recovery where page information is not captured on the log. In side-by-side performance experiments using a common log, we compare logical recovery with a state-of-the-art ARIES-style recovery implementation and show that logical redo performance can be competitive. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Crossref

VBN
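The abstract refers to ARIES-style redo optimizations. As a generic illustration only, and not the paper's technique for working without page information on the log, the sketch below replays logical log records and uses an ARIES-style LSN comparison to skip updates already reflected in the stable store. Tracking the last applied LSN per key (rather than per page, as ARIES does) is a simplifying assumption, and all names (`LogRecord`, `redo`, `applied_lsn`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int    # log sequence number, strictly increasing along the log
    key: str    # logical identifier of the updated record (no page id)
    value: int  # after-image written by the logged operation


def redo(log, table, applied_lsn, redo_start):
    """Replay logical log records, skipping those already in the store.

    table:       key -> value as found in the (possibly stale) stable store
    applied_lsn: key -> LSN of the last update known to be in the store
                 (a per-key simplification of the ARIES per-page LSN)
    redo_start:  LSN from which redo must begin, e.g. taken from a checkpoint
    Returns the number of log records that were actually re-applied.
    """
    replayed = 0
    for rec in log:
        if rec.lsn < redo_start:
            continue  # before the redo start point: assumed durable
        if applied_lsn.get(rec.key, -1) >= rec.lsn:
            continue  # redo test: this update is already in the store
        table[rec.key] = rec.value
        applied_lsn[rec.key] = rec.lsn
        replayed += 1
    return replayed
```

For example, with a store that already contains the update at LSN 1, only the two later records are re-applied. The point of the LSN test is that redo stays idempotent: running it twice over the same log changes nothing the second time.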

Enabling Operator Reordering in Data Flow Programs Through Static Code Analysis

Author: Hueske Fabian
Krettek Aljoscha
Tzoumas Kostas
Publication date: 17/01/2013

In many massively parallel data management platforms, programs are represented as small imperative pieces of code connected in a data flow. This popular abstraction makes it hard to apply the algebraic reordering techniques employed by relational DBMSs and other systems that use an algebraic programming abstraction. We present a code analysis technique based on reverse data and control flow analysis that discovers a set of properties from user code, which can be used to emulate algebraic optimizations in this setting. Comment: 4 pages, accepted and presented at the First International Workshop on Cross-model Language Design and Implementation (XLDI), affiliated with ICFP 2012, Copenhagen

arXiv.org e-Print Archive

CiteSeerX
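To make the idea of "properties discovered from user code" concrete, the sketch below assumes a static analysis has already produced, for each user-defined function, the set of record fields it reads and the set it writes. It then applies a conservative reordering condition: two adjacent record-at-a-time operators may be swapped if neither writes a field the other reads or writes. This condition is a simplified assumption for illustration, not the paper's complete set of reordering conditions, and the names (`UdfProperties`, `can_swap`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UdfProperties:
    """Read/write sets that a static code analysis might extract from a UDF."""
    name: str
    reads: frozenset = field(default_factory=frozenset)   # fields the UDF reads
    writes: frozenset = field(default_factory=frozenset)  # fields the UDF modifies


def _conflicts(writer: UdfProperties, other: UdfProperties) -> bool:
    """True if `writer` modifies a field that `other` reads or writes."""
    return bool(writer.writes & (other.reads | other.writes))


def can_swap(first: UdfProperties, second: UdfProperties) -> bool:
    """Conservative check: two record-at-a-time operators commute when
    neither one writes a field that the other one touches."""
    return not _conflicts(first, second) and not _conflicts(second, first)
```

For instance, a filter on `price` can be moved past a map that only adds a `tax` field, but not past a map that rewrites `price`.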

Lightweight Asynchronous Snapshots for Distributed Dataflows

Author: Carbone Paris
Ewen Stephan
Fóra Gyula
Haridi Seif
Tzoumas Kostas
Publication date: 01/01/2015

Distributed stateful stream processing enables the deployment and execution of large-scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation, which impacts ingestion. Second, they eagerly persist all records in transit along with the operator states, which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots. Comment: 8 pages, 7 figures

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line
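The core of ABS on acyclic topologies is barrier alignment at each operator: once a barrier arrives on an input channel, further records from that channel are held back until barriers have arrived on all inputs, at which point the operator state is snapshotted. The single-process simulation below sketches that step under stated assumptions: one summing operator, integer records, and a `BARRIER` sentinel. The class and method names are hypothetical and this is not Flink's API.

```python
from collections import deque

BARRIER = object()  # hypothetical sentinel injected by the sources for a snapshot


class AligningSumOperator:
    """Sums integer records from several input channels and snapshots its
    state once a barrier has been received on every input channel."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.state = 0
        self.blocked = set()    # channels whose barrier has already arrived
        self.buffer = deque()   # (channel, event) held back during alignment
        self.snapshots = []     # persisted copies of the operator state

    def on_event(self, channel, event):
        if channel in self.blocked:
            # Post-barrier input on an aligned channel: hold it back so the
            # snapshot only reflects pre-barrier records.
            self.buffer.append((channel, event))
            return
        if event is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_inputs:
                # Aligned: persist the state (a real operator would also
                # forward the barrier downstream at this point).
                self.snapshots.append(self.state)
                self.blocked.clear()
                # Replay held-back events, which may include later barriers.
                pending, self.buffer = self.buffer, deque()
                for ch, ev in pending:
                    self.on_event(ch, ev)
        else:
            self.state += event
```

In a two-input run where channel 0 sends `1`, a barrier, then `5`, and channel 1 sends `2` and a barrier, the snapshot records a state of 3; the `5` is applied only after the snapshot. Only the operator state is persisted, not the records in transit, which matches the space argument in the abstract for acyclic dataflows.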

Spinning Fast Iterative Data Flows

Author: Ewen Stephan
Kaufmann Moritz
Markl Volker
Tzoumas Kostas
Publication date: 01/01/2012

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these systems cannot exploit the computational dependencies present in many algorithms, such as graph algorithms. As a result, these algorithms are executed inefficiently, which has led to specialized systems based on other paradigms, such as message passing or shared memory. We propose a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows. After showing how to integrate bulk iterations into a dataflow system and its optimizer, we present an extension to the programming model for incremental iterations. The extension compensates for the lack of mutable state in dataflows and allows for exploiting the sparse computational dependencies inherent in many iterative algorithms. The evaluation of a prototypical implementation shows that those aspects lead to up to two orders of magnitude speedup in algorithm runtime, when exploited. In our experiments, the improved dataflow system is highly competitive with specialized systems while maintaining a transparent and unified dataflow abstraction. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX
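The contrast between bulk and incremental iterations can be illustrated with connected components, a standard workset-iteration example. In the sketch below, a solution set holds the current component label of every vertex, and a workset holds only the vertices whose label changed in the previous step, so each step touches just the sparse set of affected neighbors. This is a plain single-machine Python analogue written for illustration; it is not the paper's programming model, and the function name is hypothetical.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component,
    using a solution set plus a workset of recently changed vertices."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    solution = {v: v for v in vertices}  # state carried across iterations
    workset = set(vertices)              # initially every vertex counts as changed

    while workset:
        # Propose new labels, but only for neighbors of changed vertices.
        candidates = {}
        for v in workset:
            for n in neighbors[v]:
                label = solution[v]
                if label < candidates.get(n, solution[n]):
                    candidates[n] = label
        # Apply the delta to the solution set; updated vertices form the
        # next workset, so converged regions of the graph drop out early.
        workset = set()
        for n, label in candidates.items():
            if label < solution[n]:
                solution[n] = label
                workset.add(n)
    return solution
```

For vertices 1 to 5 with edges (1, 2), (2, 3) and (4, 5), the result labels vertices 1, 2 and 3 with 1 and vertices 4 and 5 with 4. A bulk iteration would instead recompute every label in every step, even after most of the graph has converged.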

Techniques for the Efficient Management of Non-Uniform Data and Workloads

Author: Tzoumas Kostas
Publication date: 01/06/2011

VBN

Introduction to Apache Flink: stream processing for real time and beyond

Author: Friedman B Ellen
Tzoumas Kostas
Publication venue: O'Reilly Media
Publication date: 01/01/2016

CERN Document Server

Myriad: Scalable and Expressive Data Generation

Author: Alexandrov Alexander
Tzoumas Kostas
Markl Volker
Publication date: 01/01/2012

The current research focus on Big Data systems calls for a rethinking of data generation methods. The traditional sequential data generation approach is not well suited to large-scale systems, as generating a terabyte of data may require days or even weeks depending on the number of constraints imposed on the generated model. We demonstrate Myriad, a new data generation toolkit that enables the specification of semantically rich data generator programs that can scale out linearly in a shared-nothing environment. Data generation programs built on top of Myriad implement an efficient parallel execution strategy, enabled by the extensive use of pseudo-random number generators with random access support.

CiteSeerX
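Random access support is what lets each node of a shared-nothing cluster generate its own slice of the data without first replaying everyone else's random numbers. The sketch below shows the idea with SplitMix64, whose state advances by a fixed constant, so the i-th output can be computed directly. SplitMix64 is a stand-in chosen for illustration; the abstract does not say which generator Myriad uses, and the function names (`random_at`, `generate_partition`) are hypothetical.

```python
MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # SplitMix64 state increment


def _mix(z: int) -> int:
    """SplitMix64 output function: scrambles a 64-bit state value."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def sequential(seed: int, n: int) -> list:
    """Ordinary sequential SplitMix64: advance the state, then mix it."""
    state, out = seed, []
    for _ in range(n):
        state = (state + GOLDEN) & MASK64
        out.append(_mix(state))
    return out


def random_at(seed: int, index: int) -> int:
    """The index-th output (0-based) of sequential(seed, ...), in O(1)."""
    return _mix((seed + (index + 1) * GOLDEN) & MASK64)


def generate_partition(seed: int, start: int, end: int, key_space: int) -> list:
    """Generate keys for records [start, end) independently of other partitions."""
    return [random_at(seed, i) % key_space for i in range(start, end)]
```

Because every partition jumps straight to its own offset, concatenating `generate_partition(seed, 0, 250, k)` and `generate_partition(seed, 250, 1000, k)` yields the same keys as generating all 1000 records on one machine, which is the property that allows generation to scale out without coordination.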

Implementing performance competitive logical recovery

Author: Lomet David
Tzoumas Kostas
Zwilling Michael
Publication date: 01/04/2011

VBN
