
    An Improved Interactive Streaming Algorithm for the Distinct Elements Problem

    The exact computation of the number of distinct elements (frequency moment $F_0$) is a fundamental problem in the study of data streaming algorithms. We denote the length of the stream by $n$, where each symbol is drawn from a universe of size $m$. While it is well known that the moments $F_0, F_1, F_2$ can be approximated by efficient streaming algorithms, it is easy to see that exact computation of $F_0$ and $F_2$ requires $\Omega(m)$ space. In previous work, Cormode et al. therefore considered a model where the data stream is also processed by a powerful helper, who provides an interactive proof of the result. They gave such protocols with a polylogarithmic number of rounds of communication between helper and verifier for all functions in NC. This number of rounds ($O(\log^2 m)$ in the case of $F_0$) can quickly make such protocols impractical. Cormode et al. also gave a protocol with $\log m + 1$ rounds for the exact computation of $F_0$ whose space complexity is $O(\log m \log n + \log^2 m)$ but whose total communication is $O(\sqrt{n}\log m(\log n + \log m))$. They managed to give $\log m$-round protocols with $\operatorname{polylog}(m,n)$ complexity for many other interesting problems, including $F_2$, inner product, and range-sum, but computing $F_0$ exactly with polylogarithmic space and communication in $O(\log m)$ rounds remained open. In this work, we give a streaming interactive protocol with $\log m$ rounds for exact computation of $F_0$ using $O(\log m(\log n + \log m \log\log m))$ bits of space and $O(\log m(\log n + \log^3 m (\log\log m)^2))$ communication. The update time of the verifier per symbol received is $O(\log^2 m)$.
    Comment: Submitted to ICALP 201
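
To make the quantity being verified concrete: $F_0$ is simply the number of distinct symbols in the stream. The sketch below shows the folklore bitmap baseline for exact $F_0$, whose $\Theta(m)$ space matches the $\Omega(m)$ lower bound in the plain (helper-free) streaming model; the function name `exact_f0_bitmap` is illustrative, and this is not the paper's protocol.

```python
# Minimal sketch (not the paper's protocol): exact F0 via a bitmap over the
# universe [m]. This uses Theta(m) bits of space, matching the Omega(m)
# lower bound for exact computation without a helper.
def exact_f0_bitmap(stream, m):
    seen = bytearray(m)          # one flag per universe element
    for symbol in stream:        # single left-to-right pass, as in streaming
        seen[symbol] = 1
    return sum(seen)             # F0 = number of distinct symbols

# Example: universe size m = 8, stream of length n = 6.
print(exact_f0_bitmap([3, 1, 3, 7, 1, 0], 8))  # -> 4
```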

    Streaming Verification of Graph Computations via Graph Structure

    We give new algorithms in the annotated data streaming setting - also known as verifiable data stream computation - for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems, including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP 2016) by giving triangle counting and maximum matching algorithms for $n$-vertex graphs using $o(n)$ space and $o(n^2)$ communication.
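
For reference, the quantity at the heart of the first result is just the triangle count of the graph. The sketch below computes it naively from a fully stored edge list; it only pins down the target value (the helper function `count_triangles` is illustrative), since the point of the schemes above is precisely to avoid storing the graph.

```python
from itertools import combinations

# Naive reference computation (not a streaming scheme): count triangles in an
# n-vertex graph given as an edge list. This stores the whole graph, which is
# exactly what a sublinear-space verifier cannot afford.
def count_triangles(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for a, b, c in combinations(range(n), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

print(count_triangles(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))  # -> 1
```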

    Semi-Streaming Algorithms for Annotated Graph Streams

    Considerable effort has been devoted to the development of streaming algorithms for analyzing massive graphs. Unfortunately, many results have been negative, establishing that a wide variety of problems require $\Omega(n^2)$ space to solve. One of the few bright spots has been the development of semi-streaming algorithms for a handful of graph problems: these algorithms use space $O(n \cdot \mathrm{polylog}(n))$. In the annotated data streaming model of Chakrabarti et al., a computationally limited client wants to compute some property of a massive input, but lacks the resources to store even a small fraction of the input, and hence cannot perform the desired computation locally. The client therefore accesses a powerful but untrusted service provider, who not only performs the requested computation, but also proves that the answer is correct. We put forth the notion of semi-streaming algorithms for annotated graph streams (semi-streaming annotation schemes for short). These are protocols in which both the client's space usage and the length of the proof are $O(n \cdot \mathrm{polylog}(n))$. We give evidence that semi-streaming annotation schemes represent a substantially more robust solution concept than does the standard semi-streaming model. On the positive side, we give semi-streaming annotation schemes for two dynamic graph problems that are intractable in the standard model: (exactly) counting triangles, and (exactly) computing maximum matchings. The former scheme answers a question of Cormode. On the negative side, we identify for the first time two natural graph problems (connectivity and bipartiteness in a certain edge update model) that can be solved in the standard semi-streaming model, but cannot be solved by annotation schemes of "sub-semi-streaming" cost. That is, these problems are just as hard in the annotations model as they are in the standard model.
    Comment: This update includes some additional discussion of the results proven. The result on counting triangles was previously included in an ECCC technical report by Chakrabarti et al. available at http://eccc.hpi-web.de/report/2013/180/. That report has been superseded by this manuscript, and the CCC 2015 paper "Verifiable Stream Computation and Arthur-Merlin Communication" by Chakrabarti et a
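
One building block that annotated-stream protocols commonly rely on, and that fits comfortably in semi-streaming space, is an algebraic fingerprint of a multiset: it lets the client later check that a multiset claimed by the prover matches what actually appeared in the stream. The sketch below shows this standard fingerprinting idea over a fixed prime field (the prime `P` and the helper `fingerprint` are illustrative choices); it is a generic ingredient of such schemes, not the specific triangle-counting or matching scheme of the paper.

```python
import random

# Standard multiset fingerprint over a prime field (illustration only).
# A multiset {x_1, ..., x_k} is hashed as the evaluation of prod_i (r - x_i)
# at a random point r; two different multisets of size <= k collide with
# probability at most k / P.
P = (1 << 61) - 1                 # a Mersenne prime, assumed large enough

def fingerprint(elements, r):
    fp = 1
    for x in elements:
        fp = fp * ((r - x) % P) % P
    return fp

r = random.randrange(P)           # chosen by the client, kept secret from the prover
stream = [5, 2, 5, 9]
claimed = [2, 5, 9, 5]            # prover's claim: same multiset, different order
print(fingerprint(stream, r) == fingerprint(claimed, r))   # -> True
```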

    New Verification Schemes for Frequency-Based Functions on Data Streams

    We study the general problem of computing frequency-based functions, i.e., the sum of any given function of data stream frequencies. Special cases include fundamental data stream problems such as computing the number of distinct elements ($F_0$), frequency moments ($F_k$), and heavy hitters. It can also be applied to calculate the maximum frequency of an element ($F_{\infty}$). Given that exact computation of most of these special cases provably does not admit any sublinear-space algorithm, a natural approach is to consider them in an enhanced data streaming model, where we have a computationally unbounded but untrusted prover sending proofs or help messages to ease the computation. Think of a memory-restricted client delegating the computation to a stronger cloud service that it does not want to trust blindly: using its limited memory, it wants to verify the proof that the cloud sends. Chakrabarti et al. (ICALP '09) introduced this setting as the "annotated data streaming model" and showed that multiple problems, including exact computation of frequency-based functions that have no sublinear algorithms in basic streaming, do have annotated streaming algorithms, also called "schemes", with both space and proof length sublinear in the input size. We give a general scheme for computing any frequency-based function with both space usage and proof size of $O(n^{2/3}\log n)$ bits, where $n$ is the size of the universe. This improves upon the best known bound of $O(n^{2/3}\log^{4/3} n)$ given by the seminal paper of Chakrabarti et al. and, as a result, also improves upon the best known bounds for the important special cases of computing $F_0$ and $F_{\infty}$. We emphasize that while being quantitatively better, our scheme is also qualitatively better in the sense that it is simpler than the previously best scheme, which uses intricate data structures and elaborate subroutines.
    Comment: To appear in FSTTCS 202
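
Concretely, a frequency-based function is $\sum_i g(f_i)$, where $f_i$ is the number of occurrences of universe item $i$ and $g$ is a given function with $g(0) = 0$, so absent items contribute nothing. The sketch below just evaluates this defining formula offline for the special cases named above (the helper `frequency_based` is an illustrative name); it defines the targets of the scheme, not the scheme itself.

```python
from collections import Counter

# Reference (offline) evaluation of frequency-based functions: the scheme
# above verifies these values without the client storing the frequency vector.
def frequency_based(stream, g):
    freqs = Counter(stream)                    # f_i for every item that appears
    return sum(g(f) for f in freqs.values())   # sum_i g(f_i), using g(0) = 0

stream = ["a", "b", "a", "c", "a", "b"]
f0    = frequency_based(stream, lambda f: 1)      # distinct elements -> 3
f2    = frequency_based(stream, lambda f: f * f)  # second moment     -> 14
f_inf = max(Counter(stream).values())             # max frequency     -> 3
print(f0, f2, f_inf)
```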

    Streaming Verification for Graph Problems: Optimal Tradeoffs and Nonlinear Sketches

    We study graph computations in an enhanced data streaming setting, where a space-bounded client reading the edge stream of a massive graph may delegate some of its work to a cloud service. We seek algorithms that allow the client to verify a purported proof sent by the cloud service that the work done in the cloud is correct. A line of work starting with Chakrabarti et al. (ICALP 2009) has provided such algorithms, which we call schemes, for several statistical and graph-theoretic problems, many of which exhibit a tradeoff between the length of the proof and the space used by the streaming verifier. This work designs new schemes for a number of basic graph problems, including triangle counting, maximum matching, topological sorting, and single-source shortest paths, where past work had either failed to obtain smooth tradeoffs between these two key complexity measures or only obtained suboptimal tradeoffs. Our key innovation is having the verifier compute certain nonlinear sketches of the input stream, leading to either new or improved tradeoffs. In many cases, our schemes in fact provide optimal tradeoffs up to logarithmic factors. Specifically, for most graph problems that we study, it is known that the product of the verifier's space cost $v$ and the proof length $h$ must be at least $\Omega(n^2)$ for $n$-vertex graphs. However, matching upper bounds are only known for a handful of settings of $h$ and $v$ on the curve $h \cdot v = \tilde{\Theta}(n^2)$. For example, for counting triangles and maximum matching, schemes with costs lying on this curve are only known for $(h = \tilde{O}(n^2), v = \tilde{O}(1))$, $(h = \tilde{O}(n), v = \tilde{O}(n))$, and the trivial $(h = \tilde{O}(1), v = \tilde{O}(n^2))$. A major message of this work is that, by exploiting nonlinear sketches, a significant "portion" of costs on the tradeoff curve $h \cdot v = n^2$ can be achieved.
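
To spell out the tradeoff picture: with $h$ the proof length and $v$ the verifier's space, the lower bound and the previously known upper-bound points from the abstract can be summarized as below. The final intermediate point is purely illustrative of what a smooth tradeoff means; it is not a specific bound claimed here.

```latex
\[
  h \cdot v \;=\; \Omega(n^2) \qquad \text{(lower bound for $n$-vertex graphs)}
\]
\[
  \text{previously known matching points: } (h,v) \in
  \bigl\{ (\tilde{O}(n^2), \tilde{O}(1)),\;
          (\tilde{O}(n),   \tilde{O}(n)),\;
          (\tilde{O}(1),   \tilde{O}(n^2)) \bigr\}
\]
\[
  \text{a smooth tradeoff fills in intermediate points such as }
  (h,v) = (\tilde{O}(n^{3/2}), \tilde{O}(n^{1/2})).
\]
```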

    Annotations for Sparse Data Streams

    Motivated by cloud computing, a number of recent works have studied annotated data streams and variants thereof. In this setting, a computationally weak verifier (cloud user), lacking the resources to store and manipulate his massive input locally, accesses a powerful but untrusted prover (cloud service). The verifier must work within the restrictive data streaming paradigm. The prover, who can annotate the data stream as it is read, must not just supply the answer but also convince the verifier of its correctness. Ideally, both the amount of annotation and the space used by the verifier should be sublinear in the relevant input size parameters. A rich theory of such algorithms, which we call schemes, has emerged. Prior work has shown how to leverage the prover's power to efficiently solve problems that have no non-trivial standard data stream algorithms. However, while optimal schemes are now known for several basic problems, such optimality holds only for streams whose length is commensurate with the size of the data universe. In contrast, many real-world datasets are relatively sparse, including graphs that contain only $o(n^2)$ edges, and IP traffic streams that contain much fewer than the total number of possible IP addresses, $2^{128}$ in IPv6. We design the first schemes that allow both the annotation and the space usage to be sublinear in the total number of stream updates rather than the size of the data universe. We solve significant problems, including variations of INDEX, SET-DISJOINTNESS, and FREQUENCY-MOMENTS, plus several natural problems on graphs. On the other hand, we give a new lower bound that, for the first time, rules out smooth tradeoffs between annotation and space usage for a specific problem. Our technique brings out new nuances in Merlin-Arthur communication complexity models, and provides a separation between online versions of the MA and AMA models.
    Comment: 29 pages, 5 table
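
The sparsity point can be made concrete: the cost parameter of interest is the number of stream updates, which for IPv6 traffic is vastly smaller than the universe of $2^{128}$ possible addresses. The toy sketch below only illustrates this parameter gap (exact counting with space proportional to the number of distinct updates rather than the universe size); it is a baseline for intuition, not one of the paper's schemes.

```python
from collections import Counter

# Toy illustration of sparsity (not one of the paper's schemes): a dictionary
# of observed items uses space proportional to the number of distinct stream
# updates, not to the universe size -- the relevant gap when the universe is
# all 2**128 IPv6 addresses but the stream touches only a few of them.
universe_size = 2 ** 128                        # size of the data universe
stream = [2**100 + 7, 42, 42, 2**100 + 7, 5]    # a tiny, sparse update stream

freqs = Counter(stream)                         # space ~ number of distinct items seen
print(len(freqs), "distinct items out of a universe of size", universe_size)
```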

    Practical Verified Computation with Streaming Interactive Proofs

    As the cloud computing paradigm has gained prominence, the need for verifiable computation has grown urgent. Protocols for verifiable computation enable a weak client to outsource difficult computations to a powerful, but untrusted, server. These protocols provide the client with a (probabilistic) guarantee that the server performed the requested computations correctly, without requiring the client to perform the computations herself.
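
A classic, self-contained example of this kind of guarantee is Freivalds' check for matrix multiplication: the client verifies a claimed product far more cheaply than recomputing it, with a probabilistic soundness guarantee. It is offered only as an illustration of probabilistic verification in general (the function `freivalds_check` is an illustrative name); it is not the streaming interactive proof system developed in the paper.

```python
import random

# Freivalds' algorithm: probabilistically verify a claimed product C = A @ B
# in O(rounds * n^2) time instead of recomputing the product in ~n^3 time.
# If C != A @ B, each round catches the error with probability >= 1/2.
def freivalds_check(A, B, C, rounds=20):
    n = len(A)
    for _ in range(rounds):
        x   = [random.randint(0, 1) for _ in range(n)]           # random 0/1 vector
        Bx  = [sum(B[i][j] * x[j]  for j in range(n)) for i in range(n)]
        ABx = [sum(A[i][j] * Bx[j] for j in range(n)) for i in range(n)]
        Cx  = [sum(C[i][j] * x[j]  for j in range(n)) for i in range(n)]
        if ABx != Cx:
            return False      # definitely wrong
    return True               # correct with probability >= 1 - 2**(-rounds)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]      # the true product A @ B
print(freivalds_check(A, B, C))                      # -> True
print(freivalds_check(A, B, [[19, 22], [43, 51]]))   # -> False (w.h.p.)
```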

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become essential subjects not only in academic communities related to information systems and computer science, but also in the business area where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries as well. The conference retains its workshop character: discussion, ample time for presentations, and a limited number of participants (50) and papers (30).
    Suggested topics include, but are not limited to:
    1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.
    2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.
    3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundation of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.
    4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction.
    5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.
    6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context-enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.
    Overall, we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the programme committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series "Frontiers in Artificial Intelligence" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in the advance of research and application of information modelling and knowledge bases.
    Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyok