Streaming Verification of Graph Properties
Streaming interactive proofs (SIPs) are a framework for outsourced
computation. A computationally limited streaming client (the verifier) hands
over a large data set to an untrusted server (the prover) in the cloud and the
two parties run a protocol to confirm the correctness of the result with high
probability. SIPs are particularly interesting for problems that are hard to
solve (or even approximate) well in a streaming setting. The most notable of
these problems is finding maximum matchings, which has received intense
interest in recent years but has strong lower bounds even for constant factor
approximations.
In this paper, we present efficient streaming interactive proofs that can
verify maximum matchings exactly. Our results cover all flavors of matchings
(bipartite/non-bipartite and weighted). In addition, we also present streaming
verifiers for approximate metric TSP. In particular, these are the first
efficient results for weighted matchings and for metric TSP in any streaming
verification model.
Comment: 26 pages, 2 figures, 1 table
Space-Efficient Algorithms and Verification Schemes for Graph Streams
Structured data-sets are often easy to represent using graphs. The prevalence of massive data-sets in the modern world gives rise to big graphs such as web graphs, social networks, biological networks, and citation graphs. Most of these graphs keep growing continuously and pose two major challenges in their processing: (a) it is infeasible to store them entirely in the memory of a regular server, and (b) even if stored entirely, it is incredibly inefficient to reread the whole graph every time a new query appears. Thus, a natural approach for efficiently processing and analyzing such graphs is reading them as a stream of edge insertions and deletions and maintaining a summary that can be (a) stored in affordable memory (significantly smaller than the input size) and (b) used to detect properties of the original graph. In this thesis, we explore the strengths and limitations of such graph streaming algorithms under three main paradigms: classical or standard streaming, adversarially robust streaming, and streaming verification.
In the classical streaming model, an algorithm needs to process an adversarially chosen input stream using space sublinear in the input size and return a desired output at the end of the stream. Here, we study a collection of fundamental directed graph problems like reachability, acyclicity testing, and topological sorting. Our investigation reveals that while most problems are provably hard for general digraphs, they admit efficient algorithms for the special and widely-studied subclass of tournament graphs. Further, we exhibit certain problems that become drastically easier when the stream elements arrive in random order rather than adversarial order, as well as problems that do not get much easier even under this relaxation. Furthermore, we study the graph coloring problem in this model and design color-efficient algorithms using novel parameterizations and establish complexity separations between different versions of the problem.
The classical streaming setting assumes that the entire input stream is fixed by an adversary before the algorithm reads it. Many randomized algorithms in this setting, however, fail when the stream is extended by an adaptive adversary based on past outputs received. This is the so-called adversarially robust streaming model. We show that graph coloring is significantly harder in the robust setting than in the classical setting, thus establishing the first such separation for a "natural" problem. We also design a class of efficient robust coloring algorithms using novel techniques.
In classical streaming, many important problems turn out to be "intractable", i.e., provably impossible to solve in sublinear space. It is then natural to consider an enhanced streaming setting where a space-bounded client outsources the computation to a space-unbounded but untrusted cloud service, who replies with the solution and a supporting "proof" that the client needs to verify. This is called streaming verification or the annotated streaming model. It allows algorithms or verification schemes for the otherwise intractable problems using both space and proof length sublinear in the input size. We devise efficient schemes that improve upon the state of the art for a variety of fundamental graph problems including triangle counting, maximum matching, topological sorting, maximal independent set, graph connectivity, and shortest paths, as well as for computing frequency-based functions such as distinct items and maximum frequency, which have broad applications in graph streaming. Some of our schemes were conjectured to be impossible, while some others attain smooth and optimal tradeoffs between space and communication costs.
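The abstract's claim that tournaments are easier than general digraphs can be illustrated with a well-known observation: a tournament is acyclic exactly when its in-degrees are 0, 1, ..., n-1, and sorting vertices by in-degree then gives a topological order. The sketch below is not taken from the thesis. It assumes an insert-only stream containing exactly one directed edge per vertex pair, and the function name `tournament_topological_order` is hypothetical. It stores only n counters, far less than the roughly n^2/2 edges of the input.

```python
def tournament_topological_order(n, edge_stream):
    """One pass over a tournament's edges, keeping one counter per vertex.

    Assumes (not checked here) that edge_stream contains exactly one
    directed edge (u, v), meaning u -> v, for every pair of vertices.
    Returns a topological order, or None if the tournament has a cycle.
    """
    indeg = [0] * n
    for _, v in edge_stream:
        indeg[v] += 1  # only the head of each edge matters

    # An acyclic (transitive) tournament has in-degrees 0, 1, ..., n-1.
    if sorted(indeg) != list(range(n)):
        return None

    # The vertex with in-degree k is preceded by exactly k vertices.
    return sorted(range(n), key=lambda v: indeg[v])


if __name__ == "__main__":
    acyclic = [(0, 3), (2, 1), (2, 0), (3, 1), (0, 1), (2, 3)]
    print(tournament_topological_order(4, acyclic))
    cyclic = [(0, 1), (1, 2), (2, 0)]
    print(tournament_topological_order(3, cyclic))
```

For general digraphs the in-degree sequence does not determine a topological order, which is consistent with the hardness results the abstract mentions.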
Streaming Verification of Graph Computations via Graph Structure
We give new algorithms in the annotated data streaming setting - also known as verifiable data stream computation - for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.
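To make the prover/verifier setting concrete, here is a minimal sketch of one standard building block, a polynomial multiset fingerprint. It lets a small-space verifier check that the edges a prover replays are exactly the edges that appeared in the stream. This is a generic illustration, not the protocol of any paper listed here. The names `MultisetFingerprint`, `edge_id` and `verify_replay`, and the choice of modulus, are assumptions of the sketch.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, chosen here as the field size


class MultisetFingerprint:
    """Keeps sum(count[i] * r**i) mod P in a single machine word."""

    def __init__(self, r):
        self.r = r
        self.value = 0

    def update(self, item, delta=1):
        # item is a non-negative integer id; delta=-1 models a deletion.
        self.value = (self.value + delta * pow(self.r, item, P)) % P


def edge_id(u, v, n):
    # Map an undirected edge on vertices 0..n-1 to an integer below n*n.
    a, b = min(u, v), max(u, v)
    return a * n + b


def verify_replay(stream, claimed, n, rng=random):
    """Accept iff the prover's replayed edges match the stream (w.h.p.).

    r must be picked before the stream is seen and kept secret from the
    prover. Two different multisets collide only if r is a root of a
    nonzero polynomial of degree below n*n, so with probability below
    n*n / P.
    """
    r = rng.randrange(2, P)
    seen = MultisetFingerprint(r)
    for u, v in stream:
        seen.update(edge_id(u, v, n))
    replay = MultisetFingerprint(r)
    for u, v in claimed:
        replay.update(edge_id(u, v, n))
    return seen.value == replay.value


if __name__ == "__main__":
    stream = [(0, 1), (2, 1), (0, 2)]
    honest = [(0, 1), (0, 2), (1, 2)]     # same edges, sorted
    tampered = [(0, 1), (0, 2), (1, 3)]   # one edge replaced
    print(verify_replay(stream, honest, 4))
    print(verify_replay(stream, tampered, 4))
```

Once the replay is trusted, the verifier can check properties that are easy on a sorted replay, such as whether a claimed edge set is a matching. That is one reason fingerprints of this kind are common in annotation schemes.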
Semi-Streaming Algorithms for Annotated Graph Streams
Considerable effort has been devoted to the development of streaming
algorithms for analyzing massive graphs. Unfortunately, many results have been
negative, establishing that a wide variety of problems require Ω(n^2)
space to solve. One of the few bright spots has been the development of
semi-streaming algorithms for a handful of graph problems; these algorithms
use space O(n polylog n).
In the annotated data streaming model of Chakrabarti et al., a
computationally limited client wants to compute some property of a massive
input, but lacks the resources to store even a small fraction of the input, and
hence cannot perform the desired computation locally. The client therefore
accesses a powerful but untrusted service provider, who not only performs the
requested computation, but also proves that the answer is correct.
We put forth the notion of semi-streaming algorithms for annotated graph
streams (semi-streaming annotation schemes for short). These are protocols in
which both the client's space usage and the length of the proof are O(n polylog n). We give evidence that semi-streaming annotation schemes
represent a substantially more robust solution concept than does the standard
semi-streaming model. On the positive side, we give semi-streaming annotation
schemes for two dynamic graph problems that are intractable in the standard
model: (exactly) counting triangles, and (exactly) computing maximum matchings.
The former scheme answers a question of Cormode. On the negative side, we
identify for the first time two natural graph problems (connectivity and
bipartiteness in a certain edge update model) that can be solved in the
standard semi-streaming model, but cannot be solved by annotation schemes of
"sub-semi-streaming" cost. That is, these problems are just as hard in the
annotations model as they are in the standard model.
Comment: This update includes some additional discussion of the results
proven. The result on counting triangles was previously included in an ECCC
technical report by Chakrabarti et al. available at
http://eccc.hpi-web.de/report/2013/180/. That report has been superseded by
this manuscript, and the CCC 2015 paper "Verifiable Stream Computation and
Arthur-Merlin Communication" by Chakrabarti et al.
Towards the Temporal Streaming of Graph Data on Distributed Ledgers
We present our work-in-progress on handling temporal RDF graph data using the Ethereum distributed ledger. This work is motivated by scenarios in which multiple distributed consumers of streamed data may need or wish to verify that the data has not been tampered with since it was generated, for example when the data describes something that can be or has been sold, such as domestically generated electricity. We describe a system in which temporal annotations, and information suitable to validate a given dataset, are stored on a distributed ledger, alongside the results of fixed SPARQL queries executed at the time of data storage. The model adopted implements a graph-based form of temporal RDF, in which time intervals are represented by named graphs corresponding to ledger entries. We conclude by discussing evaluation, what remains to be implemented, and future directions.
Event Stream Processing with Multiple Threads
Current runtime verification tools seldom make use of multi-threading to
speed up the evaluation of a property on a large event trace. In this paper, we
present an extension to the BeepBeep 3 event stream engine that allows the use
of multiple threads during the evaluation of a query. Various parallelization
strategies are presented and described on simple examples. The implementation
of these strategies is then evaluated empirically on a sample of problems.
Compared to the previous, single-threaded version of the BeepBeep engine, the
allocation of just a few threads to specific portions of a query provides
a dramatic improvement in running time.
Streaming visualisation of quantitative mass spectrometry data based on a novel raw signal decomposition method
As data rates rise, there is a danger that informatics for high-throughput LC-MS becomes more opaque and inaccessible to practitioners. It is therefore critical that efficient visualisation tools are available to facilitate quality control, verification, validation, interpretation, and sharing of raw MS data and the results of MS analyses. Currently, MS data is stored as contiguous spectra. Recall of individual spectra is quick, but panoramas, zooming and panning across whole datasets necessitate processing and memory overheads that are impractical for interactive use. Moreover, visualisation is challenging if significant quantification data is missing due to data-dependent acquisition of MS/MS spectra. In order to tackle these issues, we leverage our seaMass technique for novel signal decomposition. LC-MS data is modelled as a 2D surface through selection of a sparse set of weighted B-spline basis functions from an over-complete dictionary. By ordering and spatially partitioning the weights with an R-tree data model, efficient streaming visualisations are achieved. In this paper, we describe the core MS1 visualisation engine and overlay of MS/MS annotations. This enables the mass spectrometrist to quickly inspect whole runs for ionisation/chromatographic issues, MS/MS precursors for coverage problems, or putative biomarkers for interferences, for example. The open-source software is available from http://seamass.net/viz/
Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem
In this paper, we study linear programming based approaches to the maximum
matching problem in the semi-streaming model. The semi-streaming model has
gained attention as a model for processing massive graphs as the importance of
such graphs has increased. In this model, edges arrive in an
adversarial order and we are allowed space proportional to the number of
vertices in the graph.
In recent years, there have been several new results in the semi-streaming
model. However, broad techniques such as linear programming have not been
adapted to this model. We present several techniques to adapt and optimize
linear programming based approaches in the semi-streaming model with an
application to the maximum matching problem. As a consequence, we improve
(almost) all previous results on this problem, and also prove new results on
interesting variants.
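For orientation, the textbook linear programming relaxation of maximum weight matching on a graph G = (V, E) with edge weights w_e is sketched below. The abstract does not state which formulation the paper uses, so this standard form and its symbols (x_e, w_e, \delta(v)) are assumptions shown only as a likely starting point.

```latex
\begin{align*}
\text{maximize}   \quad & \sum_{e \in E} w_e \, x_e \\
\text{subject to} \quad & \sum_{e \in \delta(v)} x_e \le 1 && \text{for all } v \in V, \\
                        & x_e \ge 0 && \text{for all } e \in E,
\end{align*}
% \delta(v) denotes the set of edges incident to vertex v;
% setting w_e = 1 for all e gives the cardinality version.
```

On bipartite graphs this relaxation has an integral optimal solution. On general graphs it can have fractional optima (a triangle with x_e = 1/2 on every edge is the usual example), and additional odd-set constraints are needed for an exact formulation.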