Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Max-stable random sketches can be computed efficiently on fast streaming
positive data sets by using only sequential access to the data. They can be
used to answer point and Lp-norm queries for the signal. There is an intriguing
connection between the so-called p-stable (or sum-stable) and the max-stable
sketches. Rigorous performance guarantees through error-probability estimates
are derived, and the algorithmic implementation is discussed.
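As a rough illustration of the idea (a sketch under assumed parameters, not the paper's exact construction or estimator), the following uses standard 1-Fréchet variables so that each sketch coordinate is the maximum of scaled signal entries; the coordinate-wise maxima are then Fréchet with scale equal to the signal's L1 norm, which a simple estimator recovers:

```python
import math
import random

ALPHA = 1.0   # Fréchet tail index; alpha = 1 targets the L1 norm (illustrative choice)
K = 4000      # number of sketch coordinates; relative error shrinks like 1/sqrt(K)

def frechet(item, j):
    """Reproducible standard alpha-Fréchet variable for (item, coordinate j)."""
    rng = random.Random(hash((item, j)) & 0xFFFFFFFF)
    u = 1.0 - rng.random()                     # uniform in (0, 1]
    return (-math.log(u)) ** (-1.0 / ALPHA)

def update(sketch, item, value):
    """Streaming update for a non-negative signal; one sequential pass suffices."""
    for j in range(K):
        sketch[j] = max(sketch[j], value * frechet(item, j))

def estimate_norm(sketch):
    """sketch[j] ** (-alpha) is Exponential with rate ||f||_alpha ** alpha,
    so the norm is recovered from the sample mean of those transforms."""
    w = sum(s ** (-ALPHA) for s in sketch) / K
    return (1.0 / w) ** (1.0 / ALPHA)
```

Because the update only ever takes maxima, the sketch is insensitive to the order of the stream and to repeated identical updates, which is what makes it suitable for sequential access over fast streams.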
Privacy via the Johnson-Lindenstrauss Transform
Suppose that party A collects private information about its users, where each
user's data is represented as a bit vector. Suppose that party B has a
proprietary data mining algorithm that requires estimating the distance between
users, such as clustering or nearest neighbors. We ask if it is possible for
party A to publish some information about each user so that B can estimate the
distance between users without being able to infer any private bit of a user.
Our method involves projecting each user's representation into a random,
lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then
adding Gaussian noise to each entry of the lower-dimensional representation. We
show that the method preserves differential privacy---where the more privacy is
desired, the larger the variance of the Gaussian noise. Further, we show how to
approximate the true distances between users via only the lower-dimensional,
perturbed data. Finally, we consider other perturbation methods such as
randomized response and draw comparisons to sketch-based methods. While the
goal of releasing user-specific data to third parties is broader than
preserving distances, this work shows that private distance computation is an
achievable goal.
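A minimal sketch of the mechanism, substituting a dense Gaussian projection for the paper's sparse Johnson-Lindenstrauss transform; the dimensions, noise scale, and function names are illustrative assumptions, and the noise here is not calibrated to any particular privacy budget:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 2048, 1024, 0.25   # original dim, projected dim, noise scale (illustrative)

# Dense Gaussian JL matrix; entries scaled so squared norms are preserved in expectation.
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))

def release(x):
    """What party A publishes for one user: the projection plus Gaussian noise."""
    return A @ x + rng.normal(0.0, sigma, size=k)

def estimate_sq_dist(u, v):
    """Party B's estimate: the independent noise inflates the squared distance
    by 2 * k * sigma**2 in expectation, so subtract that bias off."""
    return float(np.sum((u - v) ** 2) - 2 * k * sigma ** 2)
```

Larger `sigma` gives more privacy at the cost of a noisier distance estimate, matching the trade-off stated in the abstract.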
Bloom Filters in Adversarial Environments
Many efficient data structures use randomness, allowing them to improve upon
deterministic ones. Usually, their efficiency and correctness are analyzed
using probabilistic tools under the assumption that the inputs and queries are
independent of the internal randomness of the data structure. In this work, we
consider data structures in a more robust model, which we call the adversarial
model. Roughly speaking, this model allows an adversary to choose inputs and
queries adaptively according to previous responses. Specifically, we consider a
data structure known as "Bloom filter" and prove a tight connection between
Bloom filters in this model and cryptography.
A Bloom filter represents a set of elements approximately, by using fewer
bits than a precise representation. The price for succinctness is allowing some
errors: for any element in the set it should always answer `Yes', and for any element not in the set it should answer `Yes' only with small probability.
In the adversarial model, we consider both efficient adversaries (that run in
polynomial time) and computationally unbounded adversaries that are only
bounded in the number of queries they can make. For computationally bounded
adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and
only if one-way functions exist. For unbounded adversaries, we show that there
exists a Bloom filter, for sets of a given size and error rate, that remains
secure against a bounded number of queries while using only slightly more
memory than the best possible under a non-adaptive adversary.
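For concreteness, a textbook Bloom filter (not the paper's construction) shows the structure being analyzed; an adaptive adversary in the model above could issue queries, observe which ones come back as false positives, and use those answers to steer future queries:

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: never a false negative, small false-positive rate.
    Parameters are illustrative."""

    def __init__(self, m_bits=4096, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, item):
        # k index positions derived from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item):
        return all(self.bits[p] for p in self._positions(item))
```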
Strong key derivation from noisy sources
A shared cryptographic key enables strong authentication. Candidate sources for creating such a shared key include biometrics and physically unclonable functions. However, these sources come with a substantial problem: noise in repeated readings.
A fuzzy extractor produces a stable key from a noisy source. It consists of two stages. At enrollment time, the generate algorithm produces a key from an initial reading of the source. At authentication time, the reproduce algorithm takes a repeated but noisy reading of the source, yielding the same key when the two readings are close. For many sources of practical importance, traditional fuzzy extractors provide no meaningful security guarantee.
This dissertation improves key derivation from noisy sources. These improvements stem from three observations about traditional fuzzy extractors.
First, the only property of a source that standard fuzzy extractors use is the entropy in the original reading. We observe that additional structural information about the source can facilitate key derivation.
Second, most fuzzy extractors work by first recovering the initial reading from the noisy reading (known as a secure sketch). This approach imposes harsh limitations on the length of the derived key. We observe that it is possible to produce a consistent key without recovering the original reading of the source.
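The secure-sketch approach described above can be illustrated with the classic code-offset construction over a toy repetition code; real constructions use proper error-correcting codes and a careful entropy analysis, so treat this as illustration only (a repetition code leaks far too much to be used in practice):

```python
import hashlib
import secrets

R = 3  # repetition factor: each random bit is stored 3 times, correcting 1 flip per block

def enroll(w):
    """Code-offset sketch: pick random bits r, publish helper = w XOR Encode(r),
    and derive the key from r. Reading w again (approximately) recovers r."""
    assert len(w) % R == 0
    r = [secrets.randbelow(2) for _ in range(len(w) // R)]
    encoded = [bit for bit in r for _ in range(R)]
    helper = [wi ^ ei for wi, ei in zip(w, encoded)]
    key = hashlib.sha256(bytes(r)).hexdigest()
    return helper, key

def reproduce(w_noisy, helper):
    """Majority-decode each block of w_noisy XOR helper to recover r,
    then re-derive the same key."""
    c = [wi ^ hi for wi, hi in zip(w_noisy, helper)]
    r = [int(sum(c[i:i + R]) > R // 2) for i in range(0, len(c), R)]
    return hashlib.sha256(bytes(r)).hexdigest()
```

Note that decoding `w_noisy XOR helper` implicitly recovers the original reading's offset from a codeword, which is exactly the step the dissertation's second observation avoids.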
Third, traditional fuzzy extractors provide information-theoretic security. However, security against computationally bounded adversaries is sufficient. We observe fuzzy extractors providing computational security can overcome limitations of traditional approaches.
The above observations are supported by negative results and constructions. As an example, we combine all three observations to construct a fuzzy extractor achieving properties that have eluded prior approaches. The construction remains secure even when the initial enrollment phase is repeated multiple times with noisy readings. Furthermore, for many practical sources, reliability demands that the tolerated noise is larger than the entropy of the original reading. The construction provides security for sources of this type by utilizing additional source structure, producing a consistent key without recovering the original reading, and providing computational security.
Functionally Private Approximations of Negligibly-Biased Estimators
We study functionally private approximations.
An approximation g of a function f is "functionally private" with respect to f
if, for any input x, g(x) reveals no more information about x than f(x) does.
Our main result states that a function f admits an efficiently-computable
functionally private approximation if there exists an efficiently-computable
and negligibly-biased estimator for f.
Compared to previous generic results, our theorem is more general and has
wider reach. We provide two distinct applications of the above result to demonstrate its flexibility.
In the data stream model, we provide a functionally private approximation to
the Lp-norm estimation problem, a quintessential application in streaming,
using only polylogarithmic space in the input size.
The privacy guarantees rely on pseudo-random functions (PRFs), a stronger
cryptographic notion than pseudo-random generators, which can be based on
common cryptographic assumptions. The application of PRFs in this context appears to be novel, and we expect other results to follow suit. Moreover, this is the first known functionally private streaming result for any problem.
Our second application states that every problem in certain subclasses of #P of
hard counting problems admits efficient and functionally private approximation
protocols. This result is based on a functionally private approximation for the
#DNF problem (estimating the number of satisfying truth assignments of a
Boolean formula in disjunctive normal form), which is an application of our
main theorem and previously known results.
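The non-private core of such a #DNF approximation is the classical Karp-Luby estimator, sketched here under an assumed clause encoding (each clause is a dict mapping a variable index to its required truth value):

```python
import random

def karp_luby(clauses, n_vars, samples=20000, seed=1):
    """Karp-Luby estimator for the number of assignments satisfying an OR of clauses."""
    rng = random.Random(seed)
    sizes = [2 ** (n_vars - len(c)) for c in clauses]  # satisfying-set size per clause
    total = sum(sizes)
    hits = 0
    for _ in range(samples):
        # sample a clause proportionally to its satisfying-set size
        i = rng.choices(range(len(clauses)), weights=sizes)[0]
        # sample a uniform assignment that satisfies clause i
        a = {v: bool(rng.getrandbits(1)) for v in range(n_vars)}
        a.update(clauses[i])
        # count the pair only if i is the FIRST satisfied clause (avoids double counting)
        first = next(j for j, c in enumerate(clauses)
                     if all(a[v] == val for v, val in c.items()))
        hits += (first == i)
    return total * hits / samples
```

The estimate is unbiased, and because each sampled assignment is counted at most once (via the first-satisfied-clause rule), the relative variance stays small even when the clauses overlap heavily.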
The Flajolet-Martin sketch itself preserves differential privacy: private counting with minimal space
https://proceedings.neurips.cc/paper/2020/file/e3019767b1b23f82883c9850356b71d6-Paper.pd
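For reference, a minimal single-register variant of the Flajolet-Martin sketch (the paper's privacy analysis concerns the sketch's own internal randomness; this toy version just shows the state that would be released):

```python
import hashlib

class FMSketch:
    """Single-register Flajolet-Martin-style distinct-count sketch: the state is
    just the largest number of trailing zero bits seen in any item's hash."""

    PHI = 0.77351  # Flajolet-Martin correction constant

    def __init__(self):
        self.r = 0

    def add(self, item):
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        trailing_zeros = (h & -h).bit_length() - 1 if h else 64
        self.r = max(self.r, trailing_zeros)

    def estimate(self):
        # a single register is accurate only up to the correction constant,
        # with high variance; practical versions average many registers
        return (2 ** self.r) / self.PHI
```

The structural property that matters for privacy-style arguments is that the state depends only on the set of distinct items, never on multiplicities or on the order of arrival.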
Adaptive learning and cryptography
Significant links exist between cryptography and computational learning theory. Cryptographic functions are the usual method of demonstrating significant intractability results in computational learning theory, as they can demonstrate that certain problems are hard in a representation-independent sense. On the other hand, hard learning problems have been used to create efficient cryptographic protocols such as authentication schemes, pseudo-random permutations and functions, and even public-key encryption schemes.
Learning theory and coding theory also impact cryptography in that they enable cryptographic primitives to deal with noise or bias in their inputs. Several different constructions of fuzzy primitives exist, a fuzzy primitive being one which functions correctly even in the presence of noisy or non-uniform inputs. Examples of these primitives include error-correcting blockciphers, fuzzy identity-based cryptosystems, fuzzy extractors, and fuzzy sketches. Error-correcting blockciphers combine encryption and error correction in a single function, which results in increased efficiency. Fuzzy identity-based encryption allows the decryption of any ciphertext that was encrypted under a close-enough identity. Fuzzy extractors and sketches are methods of reliably (re)producing a uniformly random secret key from an imperfectly reproducible string drawn from a biased source, with the help of a public string called the sketch.
While hard learning problems have many qualities that make them useful in constructing cryptographic protocols, such as their inherent error tolerance and simple algebraic structure, it is often difficult to use them to construct very secure protocols because of the assumptions they make on the learning algorithm. Due to these assumptions, the resulting protocols often lack security against various types of adaptive adversaries.
To help deal with this issue, we further examine the inter-relationships between cryptography and learning theory by introducing the concept of adaptive learning. Adaptive learning is a rather weak form of learning in which the learner is not expected to closely approximate the concept function in its entirety; rather, it is only expected to answer a query of its choice about the target. Adaptive learning allows for a much weaker learner than in the standard model while maintaining the positive properties of many standard-model learning problems, which we feel makes problems that are hard to learn adaptively more useful than standard-model learning problems in the design of cryptographic protocols. We argue that learning parity with noise is hard to do adaptively and use that assumption to construct a related-key-secure, efficient MAC as well as an efficient authentication scheme. In addition, we examine the security properties of fuzzy sketches and extractors and demonstrate how these properties can be combined by using our related-key-secure MAC. We go on to demonstrate that our extractor allows a form of related-key hardening for protocols: by changing how the key for a primitive is stored, it renders that protocol immune to related-key attacks.
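The learning-parity-with-noise (LPN) problem invoked above can be stated concretely; this toy sample generator (parameters illustrative) shows the equations an adversary would see:

```python
import random

def lpn_samples(secret, n_samples, tau, rng):
    """LPN samples (a, <a, s> + e mod 2): without the Bernoulli(tau) noise e,
    Gaussian elimination would recover s; the noise is what makes learning hard."""
    n = len(secret)
    out = []
    for _ in range(n_samples):
        a = [rng.getrandbits(1) for _ in range(n)]
        b = sum(ai & si for ai, si in zip(a, secret)) & 1
        if rng.random() < tau:
            b ^= 1                      # noisy equation
        out.append((a, b))
    return out
```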
Lightweight Techniques for Private Heavy Hitters
This paper presents a new protocol for solving the private heavy-hitters
problem. In this problem, there are many clients and a small set of
data-collection servers. Each client holds a private bitstring. The servers
want to recover the set of all popular strings, without learning anything else
about any client's string. A web-browser vendor, for instance, can use our
protocol to figure out which homepages are popular, without learning any user's
homepage. We also consider the simpler private subset-histogram problem, in
which the servers want to count how many clients hold strings in a particular
set without revealing this set to the clients.
Our protocols use two data-collection servers and, in a protocol run, each
client sends only a single message to the servers. Our protocols protect
client privacy against arbitrary misbehavior by one of the servers and our
approach requires no public-key cryptography (except for secure channels), nor
general-purpose multiparty computation. Instead, we rely on incremental
distributed point functions, a new cryptographic tool that allows a client to
succinctly secret-share the labels on the nodes of an exponentially large
binary tree, provided that the tree has a single non-zero path. Along the way,
we develop new general tools for providing malicious security in applications
of distributed point functions.
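To see what a distributed point function computes, here is the naive, linear-size additive sharing of a point function (names and the toy domain size are illustrative); the contribution of real DPFs, including the incremental ones above, is to compress such shares to polylogarithmic size:

```python
import secrets

DOMAIN = 256       # toy domain; real DPFs handle exponentially large domains succinctly
MOD = 2 ** 32

def share_point(x, value=1):
    """Additively share f with f(x) = value and f(y) = 0 elsewhere.
    Each share on its own is a uniformly random vector, revealing nothing about x."""
    share0 = [secrets.randbelow(MOD) for _ in range(DOMAIN)]
    share1 = list(share0)
    share1[x] = (share1[x] + value) % MOD
    return share0, share1

def reconstruct(share0, share1):
    """Combining both servers' shares recovers the point function."""
    return [(b - a) % MOD for a, b in zip(share0, share1)]
```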
In an experimental evaluation with two servers on opposite sides of the U.S.,
the servers can find the 200 most popular strings among a set of 400,000
client-held 256-bit strings in 54 minutes. Our protocols are highly
parallelizable. We estimate that with 20 physical machines per logical server,
our protocols could compute heavy hitters over ten million clients in just over
one hour of computation. (To appear in IEEE Security & Privacy 2021.)