25,691 research outputs found

    Scalable Techniques for Similarity Search

    Get PDF
    Document similarity is similar to the nearest neighbour problem and has applications in various domains. In order to determine the similarity / dissimilarity of the documents first they need to be converted into sets containing shingles. Each document is converted into k-shingles, k being the length of each shingle. The similarity is calculated using Jaccard distance between sets and output into a characteristic matrix, the complexity to parse this matrix is significantly high especially when the sets are large. In this project we explore various approaches such as Min hashing, LSH & Bloom Filter to decrease the matrix size and to improve the time complexity. Min hashing creates a signature matrix which significantly smaller compared to a characteristic matrix. In this project we will look into Min-Hashing implementation, pros and cons. Also we will explore Locality Sensitive Hashing, Bloom Filters and their advantages

    Towards an Information Theoretic Analysis of Searchable Encryption (Extended Version)

    Get PDF
    Searchable encryption is a technique that allows a client to store data in encrypted form on a curious server, such that data can be retrieved while leaking a minimal amount of information to the server. Many searchable encryption schemes have been proposed and proved secure in their own computational model. In this paper we propose a generic model for the analysis of searchable encryptions. We then identify the security parameters of searchable encryption schemes and prove information theoretical bounds on the security of the parameters. We argue that perfectly secure searchable encryption schemes cannot be efficient. We classify the seminal schemes in two categories: the schemes that leak information upfront during the storage phase, and schemes that leak some information at every search. This helps designers to choose the right scheme for an application

    Revisiting Shared Data Protection Against Key Exposure

    Full text link
    This paper puts a new light on secure data storage inside distributed systems. Specifically, it revisits computational secret sharing in a situation where the encryption key is exposed to an attacker. It comes with several contributions: First, it defines a security model for encryption schemes, where we ask for additional resilience against exposure of the encryption key. Precisely we ask for (1) indistinguishability of plaintexts under full ciphertext knowledge, (2) indistinguishability for an adversary who learns: the encryption key, plus all but one share of the ciphertext. (2) relaxes the "all-or-nothing" property to a more realistic setting, where the ciphertext is transformed into a number of shares, such that the adversary can't access one of them. (1) asks that, unless the user's key is disclosed, noone else than the user can retrieve information about the plaintext. Second, it introduces a new computationally secure encryption-then-sharing scheme, that protects the data in the previously defined attacker model. It consists in data encryption followed by a linear transformation of the ciphertext, then its fragmentation into shares, along with secret sharing of the randomness used for encryption. The computational overhead in addition to data encryption is reduced by half with respect to state of the art. Third, it provides for the first time cryptographic proofs in this context of key exposure. It emphasizes that the security of our scheme relies only on a simple cryptanalysis resilience assumption for blockciphers in public key mode: indistinguishability from random, of the sequence of diferentials of a random value. Fourth, it provides an alternative scheme relying on the more theoretical random permutation model. It consists in encrypting with sponge functions in duplex mode then, as before, secret-sharing the randomness
    corecore