1,174 research outputs found

Individual Privacy vs Population Privacy: Learning to Attack Anonymization

Author: Cormode Graham
Publication date: 10/11/2010

Over the last decade there have been great strides made in developing techniques to compute functions privately. In particular, Differential Privacy gives strong promises about conclusions that can be drawn about an individual. In contrast, various syntactic methods for providing privacy (criteria such as k-anonymity and l-diversity) have been criticized for still allowing private information of an individual to be inferred. In this report, we consider the ability of an attacker to use data meeting privacy definitions to build an accurate classifier. We demonstrate that even under Differential Privacy, such classifiers can be used to accurately infer "private" attributes in realistic data. We compare this to similar approaches for inference-based attacks on other forms of anonymized data. We place these attacks on the same scale, and observe that the accuracy of inference of private attributes for Differentially Private data and l-diverse data can be quite similar.

arXiv.org e-Print Archive

CiteSeerX

One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers

Author: G Cormode
G Zipf
Graham Cormode
M Cafaro
M Charikar
M Thorup
Mihai Pǎtraşcu
S Das
S Muthukrishnan
Publication date: 02/03/2019

Sketches are probabilistic data structures that can provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are tailored for streaming data analysis, even on architectures with limited memory such as single-board computers, which are widely exploited for IoT and edge computing. Since these devices offer multiple cores, efficient parallel sketching schemes allow them to manage high volumes of data streams. However, since their caches are relatively small, careful parallelization is required. In this work, we focus on the frequency estimation problem and evaluate the performance of a high-end server, a 4-core Raspberry Pi and an 8-core Odroid. As a sketch, we employed the widely used Count-Min Sketch. To hash the stream in parallel and in a cache-friendly way, we applied a novel tabulation approach and rearranged the auxiliary tables into a single one. To parallelize the process efficiently, we modified the workflow and applied a form of buffering between hash computations and sketch updates. Today, many single-board computers have heterogeneous processors that combine slow and fast cores. To utilize all these cores to their full potential, we proposed a dynamic load-balancing mechanism which significantly increased the performance of frequency estimation.

Comment: 12 pages, 4 figures, 3 algorithms, 1 table, submitted to EuroPar'1
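For readers unfamiliar with the Count-Min Sketch named in this abstract, the following is a minimal single-threaded sketch in Python. It is an illustration only: the class name, the default `width` and `depth`, and the use of salted BLAKE2b hashing are assumptions made here, and it does not reproduce the tabulation hashing, buffering, or load balancing the abstract describes.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch (hypothetical helper, not the paper's code)."""

    def __init__(self, width=256, depth=4):
        self.width = width    # counters per row
        self.depth = depth    # number of rows, one hash function per row
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: a salted BLAKE2b digest stands in for the row's hash function.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # never underestimates the true frequency (for non-negative updates).
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


cms = CountMinSketch(width=64, depth=4)
for symbol in ["a", "b", "a", "c", "a", "b"]:
    cms.update(symbol)
print(cms.estimate("a"))  # at least 3, the true count of "a"
```

The estimate can exceed the true count when items collide in every row; a wider table makes that less likely at the cost of more memory.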

arXiv.org e-Print Archive

Crossref

Sabanci University Research Database

Tight Lower Bound for Comparison-Based Quantile Summaries

Author: Cormode Graham
Veselý Pavel
Publication date: 16/01/2020

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks.

Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and some other parts of the paper
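The $\varepsilon$-approximate guarantee defined in this abstract can be made concrete with a small Python checker. This is a sketch under stated assumptions: the function name is hypothetical, ranks are taken as 1-based positions in the sorted stream, and an item with duplicates is accepted if any of its ranks falls within $(\phi \pm \varepsilon)N$. It stores and sorts the whole stream, so it only validates an answer and is not itself a quantile summary.

```python
from bisect import bisect_left, bisect_right


def is_eps_approximate(stream, item, phi, eps):
    """Return True if `item` is a phi'-quantile of `stream` for some
    phi' in [phi - eps, phi + eps] (rank convention is an assumption)."""
    data = sorted(stream)
    n = len(data)
    lo = bisect_left(data, item)
    hi = bisect_right(data, item)
    if lo == hi:
        return False  # the answer must be an item from the stream
    # `item` occupies the 1-based ranks lo + 1 .. hi in sorted order.
    lowest_allowed = (phi - eps) * n
    highest_allowed = (phi + eps) * n
    return lo + 1 <= highest_allowed and hi >= lowest_allowed


stream = list(range(1, 101))                     # N = 100, item i has rank i
print(is_eps_approximate(stream, 52, 0.5, 0.05))  # True: rank 52 lies in [45, 55]
print(is_eps_approximate(stream, 60, 0.5, 0.05))  # False: rank 60 is too far
```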

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

An Improved Interactive Streaming Algorithm for the Distinct Elements Problem

Author: A. Chakrabarti
A. Razborov
C. Lund
D.M. Kane
G. Cormode
H. Klauck
J. Justesen
O. Lachish
S. Goldwasser
T. Gur
Publication date: 01/01/2014

The exact computation of the number of distinct elements (frequency moment $F_0$) is a fundamental problem in the study of data streaming algorithms. We denote the length of the stream by $n$, where each symbol is drawn from a universe of size $m$. While it is well known that the moments $F_0, F_1, F_2$ can be approximated by efficient streaming algorithms, it is easy to see that exact computation of $F_0, F_2$ requires space $\Omega(m)$. In previous work, Cormode et al. therefore considered a model where the data stream is also processed by a powerful helper, who provides an interactive proof of the result. They gave such protocols with a polylogarithmic number of rounds of communication between helper and verifier for all functions in NC. This number of rounds ($O(\log^2 m)$ in the case of $F_0$) can quickly make such protocols impractical. Cormode et al. also gave a protocol with $\log m + 1$ rounds for the exact computation of $F_0$ where the space complexity is $O\left(\log m \log n + \log^2 m\right)$ but the total communication is $O\left(\sqrt{n}\log m\left(\log n + \log m\right)\right)$. They managed to give $\log m$-round protocols with $\operatorname{polylog}(m,n)$ complexity for many other interesting problems including $F_2$, Inner product, and Range-sum, but computing $F_0$ exactly with polylogarithmic space and communication and $O(\log m)$ rounds remained open. In this work, we give a streaming interactive protocol with $\log m$ rounds for exact computation of $F_0$ using $O\left(\log m\left(\log n + \log m \log\log m\right)\right)$ bits of space, with communication $O\left(\log m\left(\log n + \log^3 m (\log\log m)^2\right)\right)$. The update time of the verifier per symbol received is $O(\log^2 m)$.

Comment: Submitted to ICALP 201
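As a point of reference for the frequency moments discussed in this abstract, the Python sketch below computes them exactly using the standard definition $F_k = \sum_i f_i^k$, where $f_i$ is the number of occurrences of symbol $i$. The function name is hypothetical, and this is the naive baseline rather than the interactive protocol: it keeps one counter per distinct symbol, which is the kind of linear space usage the helper-based model is meant to avoid.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact k-th frequency moment of a stream (naive, one counter per symbol)."""
    counts = Counter(stream)
    if k == 0:
        # F_0 is the number of distinct symbols that appear in the stream.
        return len(counts)
    return sum(c ** k for c in counts.values())


stream = "abracadabra"  # a:5, b:2, r:2, c:1, d:1
print(frequency_moment(stream, 0))  # 5 distinct symbols
print(frequency_moment(stream, 1))  # 11, the stream length n
print(frequency_moment(stream, 2))  # 35 = 25 + 4 + 4 + 1 + 1
```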

arXiv.org e-Print Archive

Crossref

First Author Advantage: Citation Labeling in Research

Author: Cormode Graham
Muthukrishnan S.
Yan Jinyun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author.

Comment: Computational Scientometrics: Theory and Applications at The 22nd CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Iterative Hessian Sketch in Input Sparsity Time

Author: Cormode Graham
Dickens Charlie

Warwick Research Archives Portal Repository