Scalable Techniques for Similarity Search
Document similarity is closely related to the nearest-neighbour problem and has applications in various domains. To determine the similarity or dissimilarity of documents, they must first be converted into sets of shingles: each document is decomposed into its k-shingles, where k is the length of each shingle. Similarity is then computed as the Jaccard distance between these sets, recorded in a characteristic matrix; parsing this matrix is expensive, especially when the sets are large. In this project we explore approaches such as MinHashing, Locality-Sensitive Hashing (LSH), and Bloom filters to reduce the matrix size and improve the time complexity. MinHashing produces a signature matrix that is significantly smaller than the characteristic matrix. We examine the implementation of MinHashing with its pros and cons, and also explore Locality-Sensitive Hashing and Bloom filters and their advantages.
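A minimal Python sketch of the pipeline this abstract describes, covering shingling, MinHash signatures, and signature-based Jaccard estimation. The hash family (one base hash XORed with random masks), the parameters k and num_hashes, and all function names are illustrative choices, not the project's actual code.

```python
import hashlib
import random

def shingles(text, k=5):
    """Split a document into the set of its k-character shingles."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, num_hashes=100, seed=42):
    """Build a MinHash signature: for each of num_hashes seeded hash
    functions, keep the minimum hash value over the shingle set."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(64) for _ in range(num_hashes)]
    def h(s, mask):
        # Base hash XORed with a random mask, a simple stand-in for a
        # family of independent hash functions.
        base = hashlib.blake2b(s.encode(), digest_size=8).digest()
        return int.from_bytes(base, "big") ^ mask
    return [min(h(s, mask) for s in shingle_set) for mask in masks]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox leaps over the lazy dog")
true_jaccard = len(a & b) / len(a | b)
print(true_jaccard, estimated_jaccard(minhash_signature(a), minhash_signature(b)))
```

The fraction of signature positions on which two documents agree is an unbiased estimator of their Jaccard similarity, which is what lets the small signature matrix stand in for the much larger characteristic matrix.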
Towards an Information Theoretic Analysis of Searchable Encryption (Extended Version)
Searchable encryption is a technique that allows a client to store data in encrypted form on a curious server, such that the data can be retrieved while leaking a minimal amount of information to the server. Many searchable encryption schemes have been proposed and proved secure in their own computational models. In this paper we propose a generic model for the analysis of searchable encryption schemes. We then identify the security parameters of searchable encryption schemes and prove information-theoretic bounds on the security of these parameters. We argue that perfectly secure searchable encryption schemes cannot be efficient. We classify the seminal schemes into two categories: schemes that leak information upfront, during the storage phase, and schemes that leak some information at every search. This helps designers choose the right scheme for an application.
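To make the two leakage categories concrete, here is a toy index-based construction in Python; all names and design choices are ours, not the paper's. Because the server receives a token-to-document index at upload time, the keyword-frequency and co-occurrence structure leaks upfront (the first category); each search then additionally links a query token to its matching documents. A scheme in the second category would instead store independently encrypted entries and reveal matches only as searches are executed.

```python
import hmac
import hashlib
import secrets

def keyword_token(key, keyword):
    """Deterministic per-keyword trapdoor derived with HMAC-SHA256."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

def build_index(key, docs):
    """Client side: map each keyword token to the ids of matching
    documents. The server sees this structure at storage time, so
    keyword frequencies and co-occurrence leak upfront."""
    index = {}
    for doc_id, words in docs.items():
        for w in set(words):
            index.setdefault(keyword_token(key, w), []).append(doc_id)
    return index

def search(index, token):
    """Server side: each query reveals which stored entries match."""
    return index.get(token, [])

key = secrets.token_bytes(32)
docs = {1: ["encrypted", "search"], 2: ["search", "privacy"]}
index = build_index(key, docs)
print(search(index, keyword_token(key, "search")))  # -> [1, 2]
```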
Revisiting Shared Data Protection Against Key Exposure
This paper sheds new light on secure data storage inside distributed systems. Specifically, it revisits computational secret sharing in a situation where the encryption key is exposed to an attacker. It makes several contributions. First, it defines a security model for encryption schemes in which we ask for additional resilience against exposure of the encryption key. Precisely, we ask for (1) indistinguishability of plaintexts under full ciphertext knowledge, and (2) indistinguishability for an adversary who learns the encryption key plus all but one share of the ciphertext. Property (2) relaxes the "all-or-nothing" property to a more realistic setting, where the ciphertext is transformed into a number of shares such that the adversary cannot access one of them; property (1) asks that, unless the user's key is disclosed, no one other than the user can retrieve information about the plaintext. Second, it introduces a new computationally secure encryption-then-sharing scheme that protects the data in the previously defined attacker model. It consists of data encryption followed by a linear transformation of the ciphertext and its fragmentation into shares, along with secret sharing of the randomness used for encryption. The computational overhead beyond data encryption is reduced by half with respect to the state of the art. Third, it provides the first cryptographic proofs in this context of key exposure. It emphasizes that the security of our scheme relies only on a simple cryptanalysis-resilience assumption for block ciphers in public-key mode: indistinguishability from random of the sequence of differentials of a random value. Fourth, it provides an alternative scheme relying on the more theoretical random permutation model. It consists of encrypting with sponge functions in duplex mode and then, as before, secret-sharing the randomness.
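To ground the encrypt-then-share idea, here is a stripped-down Python sketch under our own simplifying assumptions: SHAKE-256 stands in for the block cipher, the paper's linear transformation of the ciphertext and its differentials assumption are omitted, and the nonce plays the role of the shared randomness. An adversary who learns the key but misses one share lacks both a ciphertext fragment and a nonce share, and the n-of-n sharing of the nonce then denies them the keystream entirely.

```python
import hashlib
import secrets

def keystream(key, nonce, length):
    """Derive a keystream with SHAKE-256 (stand-in for a block cipher)."""
    return hashlib.shake_256(key + nonce).digest(length)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def share(data, key, n=3):
    """Encrypt under a fresh nonce, fragment the ciphertext, and
    n-of-n secret-share the nonce alongside the fragments."""
    nonce = secrets.token_bytes(16)
    ct = xor(data, keystream(key, nonce, len(data)))
    step = -(-len(ct) // n)  # ceiling division
    fragments = [ct[i:i + step] for i in range(0, len(ct), step)]
    # n-of-n XOR sharing of the nonce: all shares are needed to recover it.
    nonce_shares = [secrets.token_bytes(16) for _ in range(n - 1)]
    last = nonce
    for s in nonce_shares:
        last = xor(last, s)
    nonce_shares.append(last)
    return list(zip(fragments, nonce_shares))

def reconstruct(shares, key, total_len):
    """Recombine all shares: XOR the nonce shares, rejoin the fragments,
    and strip the keystream."""
    fragments, nonce_shares = zip(*shares)
    nonce = bytes(16)
    for s in nonce_shares:
        nonce = xor(nonce, s)
    return xor(b"".join(fragments), keystream(key, nonce, total_len))

key = secrets.token_bytes(32)
msg = b"shared data protected against key exposure"
shares = share(msg, key)
print(reconstruct(shares, key, len(msg)) == msg)  # True
```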
- …