10,712 research outputs found

    Lightweight Asynchronous Snapshots for Distributed Dataflows

    Distributed stateful stream processing enables the deployment and execution of large-scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation, which impacts ingestion. Second, they eagerly persist all records in transit along with the operator states, which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots. Comment: 8 pages, 7 figures
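The acyclic case described above, where only operator state is persisted, can be illustrated with a minimal sketch of barrier alignment: a task blocks each input channel once a barrier arrives on it, snapshots its own state when barriers have arrived on all inputs, forwards the barrier, and then resumes the blocked channels. This is a toy, single-process illustration; the class `SummingTask`, its fields, and the `BARRIER` marker are hypothetical names chosen here and are not Apache Flink's API.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Union

BARRIER = "BARRIER"          # hypothetical marker object for a snapshot barrier
Message = Union[int, str]    # a record (int) or the BARRIER marker


class SummingTask:
    """Toy stateful task: keeps a running sum of all records it has processed."""

    def __init__(self, inputs: List[str]) -> None:
        self.inputs: Set[str] = set(inputs)
        self.blocked: Set[str] = set()       # channels whose barrier has arrived
        self.buffered: Dict[str, Deque[Message]] = {ch: deque() for ch in inputs}
        self.state = 0                        # the operator state to be snapshotted
        self.snapshots: List[int] = []        # stands in for durable storage
        self.output: List[Message] = []       # stands in for the output channel

    def receive(self, channel: str, message: Message) -> None:
        if channel in self.blocked:
            # Records behind a barrier belong to the next epoch: hold them back.
            self.buffered[channel].append(message)
            return
        if message == BARRIER:
            self.blocked.add(channel)
            if self.blocked == self.inputs:
                # Barriers seen on every input: persist state only, no in-flight records.
                self.snapshots.append(self.state)
                self.output.append(BARRIER)
                self.blocked.clear()
                self._replay()
            return
        self.state += message
        self.output.append(self.state)

    def _replay(self) -> None:
        # Process held-back messages in per-channel FIFO order.
        for channel in sorted(self.inputs):
            pending = self.buffered[channel]
            self.buffered[channel] = deque()
            while pending:
                self.receive(channel, pending.popleft())


task = SummingTask(["a", "b"])
for channel, message in [("a", 1), ("a", BARRIER), ("a", 10), ("b", 2), ("b", BARRIER)]:
    task.receive(channel, message)
print(task.snapshots, task.state)
```

In this run the record `10` arrives on channel `a` after its barrier, so it is held back and is not reflected in the snapshot; it is applied only once the barrier from channel `b` has also arrived.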

    Relations between automata and the simple k-path problem

    Let $G$ be a directed graph on $n$ vertices. Given an integer $k \le n$, the SIMPLE $k$-PATH problem asks whether there exists a simple $k$-path in $G$. In case $G$ is weighted, the MIN-WT SIMPLE $k$-PATH problem asks for a simple $k$-path in $G$ of minimal weight. The fastest currently known deterministic algorithm for MIN-WT SIMPLE $k$-PATH, by Fomin, Lokshtanov and Saurabh, runs in time $O(2.851^k\cdot n^{O(1)}\cdot \log W)$ for graphs with integer weights in the range $[-W,W]$. This is also the best currently known deterministic algorithm for SIMPLE $k$-PATH, where the running time is the same without the $\log W$ factor. We define $L_k(n)\subseteq [n]^k$ to be the set of words of length $k$ whose symbols are all distinct. We show that an explicit construction of a non-deterministic automaton (NFA) of size $f(k)\cdot n^{O(1)}$ for $L_k(n)$ implies an algorithm of running time $O(f(k)\cdot n^{O(1)}\cdot \log W)$ for MIN-WT SIMPLE $k$-PATH when the weights are non-negative or the constructed NFA is acyclic as a directed graph. We show that the algorithm of Kneis et al. and its derandomization by Chen et al. for SIMPLE $k$-PATH can be used to construct an acyclic NFA for $L_k(n)$ of size $O^*(4^{k+o(k)})$. We show, on the other hand, that any NFA for $L_k(n)$ must be of size at least $2^k$. We thus propose closing this gap and determining the smallest NFA for $L_k(n)$ as an interesting open problem that might lead to faster algorithms for MIN-WT SIMPLE $k$-PATH. We use a relation between SIMPLE $k$-PATH and non-deterministic xor automata (NXA) to give another direction for a deterministic algorithm with running time $O^*(2^k)$ for SIMPLE $k$-PATH.
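As a baseline for the automaton sizes discussed above, the following minimal sketch decides SIMPLE k-PATH by exploring pairs (last vertex, set of visited vertices). This amounts to feeding walks of the graph through the naive automaton for L_k(n) that remembers every symbol seen so far. That automaton can have as many states as there are subsets of at most k symbols, so it is not of size f(k)·n^{O(1)} and is not the construction from the paper. The function name, the adjacency-dict representation, and the convention that a k-path visits k distinct vertices are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Iterable[Hashable]]
State = Tuple[Hashable, FrozenSet[Hashable]]  # (last vertex, vertices visited so far)


def has_simple_k_path(graph: Graph, k: int) -> bool:
    """Return True if the directed graph contains a simple path on k vertices."""
    if k <= 0:
        return True  # the empty path trivially exists
    vertices = set(graph)
    for targets in graph.values():
        vertices.update(targets)
    # Layer 1: every vertex on its own is a simple path on one vertex.
    layer: Set[State] = {(v, frozenset([v])) for v in vertices}
    for _ in range(k - 1):
        next_layer: Set[State] = set()
        for last, seen in layer:
            for nxt in graph.get(last, ()):
                if nxt not in seen:  # keep all symbols of the word distinct
                    next_layer.add((nxt, seen | {nxt}))
        layer = next_layer
        if not layer:
            return False
    return bool(layer)


cycle = {0: [1], 1: [2], 2: [0]}
print(has_simple_k_path(cycle, 3))  # True: 0 -> 1 -> 2
print(has_simple_k_path(cycle, 4))  # False: only three vertices exist
```

Because the states are layered by path length, the state graph is acyclic, which mirrors the acyclicity condition mentioned above for the weighted variant.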

    The data-exchange chase under the microscope

    In this paper we take a closer look at recent developments for the chase procedure, and provide additional results. Our analysis allows us to create a taxonomy of the chase variations and the properties they satisfy. Two of the most central problems regarding the chase are termination, and the discovery of restricted classes of sets of dependencies that guarantee termination of the chase. The search for the restricted classes has been motivated by a fairly recent result showing that it is undecidable to determine whether the chase with a given dependency set will terminate on a given instance. There is a small dissonance here, since the quest has been for classes of sets of dependencies guaranteeing termination of the chase on all instances, even though the latter problem was not known to be undecidable. We resolve the dissonance in this paper by showing that determining whether the chase with a given set of dependencies terminates on all instances is coRE-complete. For the hardness proof we use a reduction from word rewriting systems, thereby also showing the close connection between the chase and word rewriting. The same reduction also gives us the aforementioned instance-dependent RE-completeness result as a byproduct. For one of the restricted classes guaranteeing termination on all instances, the stratified sets of dependencies, we provide new complexity results for the problem of testing whether a given set of dependencies belongs to it. These results rectify some previous claims that have occurred in the literature. Comment: arXiv admin note: substantial text overlap with arXiv:1303.668
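To make the termination question above concrete, here is a minimal sketch of a standard chase restricted to one very simple rule shape, R(x, y) -> exists z. S(y, z), over binary relations. The encoding of facts and rules, the function names, and the `max_steps` budget are assumptions of this sketch. The budget is only a practical cut-off: running out of steps does not prove non-termination, in line with the undecidability results mentioned in the abstract.

```python
from itertools import count
from typing import List, Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, first argument, second argument)
Rule = Tuple[str, str]       # (R, S) encodes R(x, y) -> exists z. S(y, z)


def find_active_trigger(facts: Set[Fact], rules: List[Rule]) -> Optional[Tuple[str, str]]:
    """Return (head relation, y) for some rule whose head is not yet satisfied."""
    for body_rel, head_rel in rules:
        for rel, _x, y in sorted(facts):
            if rel != body_rel:
                continue
            # The trigger is active only if no fact S(y, _) exists yet.
            if not any(r == head_rel and a == y for r, a, _b in facts):
                return head_rel, y
    return None


def chase(instance: Set[Fact], rules: List[Rule], max_steps: int) -> Optional[Set[Fact]]:
    """Return the chase result, or None if the step budget is exhausted."""
    facts = set(instance)
    fresh = count(1)
    steps = 0
    while True:
        trigger = find_active_trigger(facts, rules)
        if trigger is None:
            return facts  # no active trigger: the chase has terminated
        if steps == max_steps:
            return None   # budget exhausted, termination unknown
        head_rel, y = trigger
        facts.add((head_rel, y, f"_n{next(fresh)}"))  # invent a fresh labelled null
        steps += 1


start = {("E", "a", "b")}
print(chase(start, [("E", "F")], max_steps=10))  # terminates after one step
print(chase(start, [("E", "E")], max_steps=50))  # None: each new null needs a successor
```

The second call uses the rule E(x, y) -> exists z. E(y, z), for which every invented null triggers the rule again on this instance, so the sketch exhausts any finite budget.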

    Large-scale Parallel Stratified Defeasible Reasoning

    We have recently been experiencing an unprecedented explosion of available data from the Web, sensor readings, scientific databases, government authorities and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge, etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling huge amounts of data for these applications. In this paper, we consider inconsistency-tolerant reasoning in the form of defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge datasets. We extend previous work by dealing with predicates of arbitrary arity, under the assumption of stratification. Moving from unary to multi-arity predicates is a decisive step towards practical applications, e.g. reasoning with linked open (RDF) data. Our experimental results demonstrate that defeasible reasoning with millions of facts is performant, and has the potential to scale to billions of facts.