    Communication Efficient Checking of Big Data Operations

    We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by theoretical analysis.
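To illustrate what a low-communication probabilistic checker can look like, here is a minimal sketch of a sort checker based on the classic polynomial-fingerprint test for multiset equality. It is not taken from the paper and does not use Thrill; the names `fingerprint`, `check_sort`, and the modulus `P` are assumptions made for this example, and the elements are assumed to be integers in the range [0, P). Each worker reduces its chunk to a single field element, so only a constant amount of data per worker needs to be exchanged.

```python
import random

# Field modulus: a Mersenne prime (an assumption of this sketch).
P = (1 << 61) - 1


def fingerprint(chunk, z):
    """Product of (z - x) mod P over one worker's local chunk."""
    acc = 1
    for x in chunk:
        acc = acc * ((z - x) % P) % P
    return acc


def check_sort(input_chunks, output_chunks, seed=None):
    """Accept iff the output looks like a sorted permutation of the input.

    A correct result is always accepted. An output whose multiset differs
    from the input is accepted only if the random point z happens to be a
    root of a nonzero polynomial of degree at most n, i.e. with
    probability at most n / P.
    """
    z = random.Random(seed).randrange(P)

    # Each worker would compute one fingerprint locally; only these
    # single values are combined, not the data itself.
    fp_in = 1
    for chunk in input_chunks:
        fp_in = fp_in * fingerprint(chunk, z) % P
    fp_out = 1
    for chunk in output_chunks:
        fp_out = fp_out * fingerprint(chunk, z) % P

    # Sortedness: in a distributed setting this is a local scan plus one
    # comparison per chunk boundary; here it is done sequentially.
    prev = None
    for chunk in output_chunks:
        for x in chunk:
            if prev is not None and x < prev:
                return False
            prev = x

    return fp_in == fp_out
```

For example, `check_sort([[3, 1], [2, 5]], [[1, 2], [3, 5]])` accepts, while an output that drops or alters an element is rejected except with the small probability bounded above.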

    Fast Witness Extraction Using a Decision Oracle

    The gist of many (NP-)hard combinatorial problems is to decide whether a universe of $n$ elements contains a witness consisting of $k$ elements that match some prescribed pattern. For some of these problems there are known advanced algebra-based FPT algorithms which solve the decision problem but do not return the witness. We investigate techniques for turning such a YES/NO-decision oracle into an algorithm for extracting a single witness, with an objective to obtain practical scalability for large values of $n$. By relying on techniques from combinatorial group testing, we demonstrate that a witness may be extracted with $O(k \log n)$ queries to either a deterministic or a randomized set inclusion oracle with one-sided probability of error. Furthermore, we demonstrate through implementation and experiments that the algebra-based FPT algorithms are practical, in particular in the setting of the $k$-path problem. Also discussed are engineering issues such as optimizing finite field arithmetic. Comment: Journal version, 16 pages. Extended abstract presented at ESA'1
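The idea of extracting a witness from a YES/NO oracle can be sketched with a simple binary-search strategy in the spirit of group testing. This is an illustrative sketch, not the paper's algorithm: it assumes a deterministic, monotone set inclusion oracle (if a set contains a witness, so does every superset), and the names `extract_witness` and `make_oracle` are hypothetical. Each round pins down one witness element with about $\log_2 n$ queries, giving $O(k \log n)$ queries in total for a witness of size $k$.

```python
def make_oracle(witnesses):
    """Toy set inclusion oracle: YES iff the queried set contains a witness.

    A stand-in for a real decision algorithm (hypothetical helper).
    """
    witnesses = [frozenset(w) for w in witnesses]

    def oracle(subset):
        s = set(subset)
        return any(w <= s for w in witnesses)

    return oracle


def extract_witness(universe, oracle):
    """Return one witness as a list, or None if the universe contains none."""
    candidates = list(universe)
    found = []  # elements already known to belong to the witness

    if not oracle(found + candidates):
        return None

    # Invariant: found + candidates contains a witness.
    while not oracle(found):
        # Binary search for the smallest i such that
        # found + candidates[:i] contains a witness.
        lo, hi = 1, len(candidates)
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(found + candidates[:mid]):
                hi = mid
            else:
                lo = mid + 1
        # candidates[lo - 1] lies in every witness inside the current
        # prefix, so keep it and discard everything after it.
        found.append(candidates[lo - 1])
        candidates = candidates[:lo - 1]

    return found
```

A usage example: with `oracle = make_oracle([{3, 7, 12}])`, calling `extract_witness(range(16), oracle)` recovers the three hidden elements. A randomized oracle with one-sided error would need repeated queries or a verification step, which this sketch omits.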