438 research outputs found

    Towards a secure and efficient search over encrypted cloud data

    Get PDF
    Includes bibliographical references.2016 Summer.Cloud computing enables new types of services where the computational and network resources are available online through the Internet. One of the most popular services of cloud computing is data outsourcing. For reasons of cost and convenience, public as well as private organizations can now outsource their large amounts of data to the cloud and enjoy the benefits of remote storage and management. At the same time, confidentiality of remotely stored data on untrusted cloud server is a big concern. In order to reduce these concerns, sensitive data, such as, personal health records, emails, income tax and financial reports, are usually outsourced in encrypted form using well-known cryptographic techniques. Although encrypted data storage protects remote data from unauthorized access, it complicates some basic, yet essential data utilization services such as plaintext keyword search. A simple solution of downloading the data, decrypting and searching locally is clearly inefficient since storing data in the cloud is meaningless unless it can be easily searched and utilized. Thus, cloud services should enable efficient search on encrypted data to provide the benefits of a first-class cloud computing environment. This dissertation is concerned with developing novel searchable encryption techniques that allow the cloud server to perform multi-keyword ranked search as well as substring search incorporating position information. We present results that we have accomplished in this area, including a comprehensive evaluation of existing solutions and searchable encryption schemes for ranked search and substring position search

    Oblivious Substring Search with Updates

    Get PDF
    We are the first to address the problem of efficient oblivious substring search over encrypted data supporting updates. Our two new protocols SA-ORAM and ST-ORAM obliviously search for substrings in an outsourced set of n encrypted strings. Both protocols are efficient, requiring communication complexity that is only poly-logarithmic in n. Compared to a straightforward solution for substring search using recent “oblivious data structures” [30], we demonstrate that our tailored solutions improve communication complexity by a factor of logn. The idea behind SA-ORAM and ST-ORAM is to employ a new, hierarchical ORAM tree structure that takes advantage of data dependency and optimizes the size of ORAM blocks and tree height. Based on oblivious suffix arrays, SA-ORAM targets efficiency, yet does not allow updates to the outsourced set of strings. ST-ORAM, based on oblivious suffix trees, allows updates at the additional communications cost of a factor of loglogn. We implement and benchmark SA-ORAM to show its feasibility for practical deployments: even for huge datasets of 2^40 strings, an oblivious substring search can be performed with only hundreds of KBytes communication cost

    Search and Retrieval in Massive Data Collections

    Get PDF
    The main goal of this research is to produce a novel and efficient searching application by means of best match and proximity searching with particular application to very large numeric and textual data stores. In today’s world a huge amount of information is produced. Almost every part of our society is touched by systems that collect, store and analyse data. As an example I mention the case of scientific instrumentation: new sensors capture massive amounts of information (e.g. new telescopes acquiring data from different regions of the spectrum). Description of biological and chemical interactions also produce complex and large amounts of data. It is in this context that a big challenge for current analysis algorithms is presented. Many of the traditional methods for data analysis do not scale well in massive data sets nor in very high dimensional spaces. In this work I introduce a novel (ultrametric) distance called Baire based on the longest common prefix and show how it can be used to produce clusters through grouping data in ’bins’ taking linear or O(n) computational time. Furthermore, it follows that this distance can be strictly fitted to a hierarchy tree. This is a property that proves very useful for classifying, storing, accessing and retrieving information. I go further to apply this methodology on data from different scientific areas such as astronomy and chemistry to create groups or clusters. Additionally I apply this method to document sets for clustering and retrieval. In particular, I look into the new area of enterprise search to propose a new method to support scalable search and clustering

    Secure and Efficient Comparisons between Untrusted Parties

    Get PDF
    A vast number of online services is based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases user privacy depends on a collective trust upon all involved intermediaries, like service providers, operators, administrators or even help desk staff. A single adversarial party in the whole chain of trust voids user privacy. Even more, the number of intermediaries is ever growing. Thus, user privacy must be preserved at every time and stage, independent of the intrinsic goals any involved party. Furthermore, next to these new services, traditional offline analytic systems are replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data or other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information and lowered costs. In these scenarios privacy is of utmost concern due to the large amount of personal information contained within the centralized data. We focus on the challenge of privacy-preserving processing on genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare private sequences of two parties while preserving confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison. The secured inputs for the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, results of several comparisons can be combined to infer information about the confidential input of the party under observation. Genomic sequences serve as a use-case, but the proposed solutions are more general and can be applied to the generic field of privacy-preserving comparison of sequences. The solution should be efficient such that performing a comparison yields runtimes linear in the length of the input sequences and thus producing acceptable costs for a typical use-case. To tackle the problem of efficient, privacy-preserving sequence comparisons, we propose a framework consisting of three main parts. a) The basic protocol presents an efficient sequence comparison algorithm, which transforms a sequence into a set representation, allowing to approximate distance measures over input sequences using distance measures over sets. The sets are then represented by an efficient data structure - the Bloom filter -, which allows evaluation of certain set operations without storing the actual elements of the possibly large set. This representation yields low distortion for comparing similar sequences. Operations upon the set representation are carried out using efficient, partially homomorphic cryptographic systems for data confidentiality of the inputs. The output can be adjusted to either return the actual approximated distance or the result of an in-range check of the approximated distance. b) Building upon this efficient basic protocol we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. This generalization is done by treating inputs as messages received from a noise channel, upon which error-correction from coding theory is applied. This way similar inputs are defined as inputs having a hamming distance of their generalized inputs below a certain predefined threshold. We present a protocol to perform a zero-knowledge proof to assess if the generalized input is indeed a generalization of the actual input. Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by presenting a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by the semantically secure homomorphic encryption schemes, as well as encryption latency. The compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Compression of ciphertexts of fully homomorphic encryption schemes are several orders of magnitude slower at the conversion from the transmission ciphertext to the homomorphically encrypted ciphertext. Indeed our compression scheme achieves optimal conversion performance. It further allows to generate keystreams offline and thus supports offloading to trusted devices. This way transmission-, storage- and power-efficiency is improved. We give security proofs for all relevant parts of the proposed protocols and algorithms to evaluate their security. A performance evaluation of the core components demonstrates the practicability of our proposed solutions including a theoretical analysis and practical experiments to show the accuracy as well as efficiency of approximations and probabilistic algorithms. Several variations and configurations to detect similar inputs are studied during an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation to compare genomic sequences and detect similar inputs as described by the use-case. In summary we show that it is indeed possible to construct an efficient and privacy-preserving (genomic) sequences comparison, while being able to control the amount of information that leaves the comparison. To the best of our knowledge we also contribute to the field by proposing the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems

    Storage Efficient Substring Searchable Symmetric Encryption

    Get PDF
    We address the problem of substring searchable encryption. A single user produces a big stream of data and later on wants to learn the positions in the string that some patterns occur. Although current techniques exploit auxiliary data structures to achieve efficient substring search on the server side, the cost at the user side may be prohibitive. We revisit the work of substring searchable encryption in order to reduce the storage cost of auxiliary data structures. Our solution entails a suffix array based index design, which allows optimal storage cost O (n) with small hidden factor at the size of the string n. We analyze the security of the protocol in the real ideal framework. Moreover, we implemented our scheme and the state of the art protocol [7] to demonstrate the performance advantage of our solution with precise benchmark results

    Pattern matching encryption, strategic equivalence of range voting and approval voting, and statistical robustness of voting rules

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 119-123).We present new results in the areas of cryptography and voting systems. 1. Pattern matching encryption: We present new, general definitions for queryable encryption schemes - encryption schemes that allow evaluation of private queries on encrypted data without performing full decryption. We construct an efficient queryable encryption scheme supporting pattern matching queries, based on suffix trees. Storage and communication complexity are comparable to those for (unencrypted) suffix trees. The construction is based only on symmetric-key primitives, so it is practical. 2. Strategic equivalence of range voting and approval voting: We study strategic voting in the context of range voting in a formal model. We show that under general conditions, as the number of voters becomes large, strategic range-voting becomes equivalent to approval voting. We propose beta distributions as a new and interesting way to model voter's subjective information about other votes. 3. Statistical robustness of voting rules: We introduce a new notion called "statistical robustness" for voting rules: a voting rule is statistically robust if, for any profile of votes, the most likely winner of a sample of the profile is the winner of the complete profile. We show that plurality is the only interesting voting rule that is statistically robust; approval voting (perhaps surprisingly) and other common voting rules are not statistically robust.by Emily Shen.Ph.D

    Hardware-Assisted Secure Computation

    Get PDF
    The theory community has worked on Secure Multiparty Computation (SMC) for more than two decades, and has produced many protocols for many settings. One common thread in these works is that the protocols cannot use a Trusted Third Party (TTP), even though this is conceptually the simplest and most general solution. Thus, current protocols involve only the direct players---we call such protocols self-reliant. They often use blinded boolean circuits, which has several sources of overhead, some due to the circuit representation and some due to the blinding. However, secure coprocessors like the IBM 4758 have actual security properties similar to ideal TTPs. They also have little RAM and a slow CPU.We call such devices Tiny TTPs. The availability of real tiny TTPs opens the door for a different approach to SMC problems. One major challenge with this approach is how to execute large programs on large inputs using the small protected memory of a tiny TTP, while preserving the trust properties that an ideal TTP provides. In this thesis we have investigated the use of real TTPs to help with the solution of SMC problems. We start with the use of such TTPs to solve the Private Information Retrieval (PIR) problem, which is one important instance of SMC. Our implementation utilizes a 4758. The rest of the thesis is targeted at general SMC. Our SMC system, Faerieplay, moves some functionality into a tiny TTP, and thus avoids the blinded circuit overhead. Faerieplay consists of a compiler from high-level code to an arithmetic circuit with special gates for efficient indirect array access, and a virtual machine to execute this circuit on a tiny TTP while maintaining the typical SMC trust properties. We report on Faerieplay\u27s security properties, the specification of its components, and our implementation and experiments. These include comparisons with the Fairplay circuit-based two-party system, and an implementation of the Dijkstra graph shortest path algorithm. We also provide an implementation of an oblivious RAM which supports similar tiny TTP-based SMC functionality but using a standard RAM program. Performance comparisons show Faerieplay\u27s circuit approach to be considerably faster, at the expense of a more constrained programming environment when targeting a circuit

    Detection and Identification of Software Encryption Solutions in NT-based Microsoft Windows Operating Systems

    Get PDF
    As encrypted information is very difficult or impossible to reconstruct, there are many situations in which it is critical to detect the presence of encryption software before a computer is shut down. Currently there is no solution that reliably identifies installed encryption software.;For this investigation, thirty encryption software products for Microsoft Windows based on the NT-kernel have been identified and investigated. Operating system dependent factors such as registry, file attributes, operating system attributes, process list analysis and independent factors such as file headers, keyword search, Master Boot Record analysis as well as hashing of software components were investigated and allow the identification of these programs. The most reliable detection rate is achieved through a combination of the aforementioned factors
    • …
    corecore