365 research outputs found

    Using materialized views for answering graph pattern queries

    Get PDF
    Discovering patterns in graphs by evaluating graph pattern queries involving direct (edge-to-edge mapping) and reachability (edge-to-path mapping) relationships under homomorphisms on data graphs has been extensively studied. Previous studies have aimed to reduce the evaluation time of graph pattern queries due to the potentially numerous matches on large data graphs. In this work, the concept of the summary graph is developed to improve the evaluation of tree pattern queries and graph pattern queries. The summary graph first filters out candidate matches which violate certain reachability constraints, and then finds local matches of query edges. This reduces redundancy in the representation of the query results and allows for computation sharing during the generation of these results. Methods using materialized graph pattern views are developed to improve the efficiency of graph pattern query evaluation. A view is materialized as a summary graph, which compactly records all the homomorphisms of the view to the data graph. View usability is characterized in terms of query edge coverage to provide necessary and sufficient conditions for answering queries using views, and algorithms are developed for determining view usability and for summary graph construction. Experimental evaluation shows that the methods using summary graphs and its related concepts outperform previous state-of-the-art approaches. It also demonstrates that the view materialization method outperforms, by several orders of magnitude, a state-of-the-art approach which does not use materialized views, and substantially improves upon its scalability

    Secure Graph Database Search with Oblivious Filter

    Get PDF
    With the emerging popularity of cloud computing, the problem of how to query over cryptographically-protected data has been widely studied. However, most existing works focus on querying protected relational databases, few work has shown interests in graph databases. In this paper, we first investigate and summarize two single-instruction queries, namely Graph Pattern Matching (GPM) and Graph Navigation (GN). Then we follow their design intuitions and leverage secure Multi-Party Computation (MPC) to implement their functionalities in a privacy-preserving manner. Moreover, we propose a general framework for processing multi-instruction query on secret-shared graph databases and present a novel cryptographic primitive Oblivious Filter (OF) as a core building block. Nevertheless, we formalize the problem of OF and present its constructions using homomorphic encryption. Finally, we conduct an empirical study to evaluate the efficiency of our proposed OF protocol

    Doctor of Philosophy

    Get PDF
    dissertationLinked data are the de-facto standard in publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including while enforcing security policies to access linked data, or when integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views to equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is the number of rewritten queries is exponential to the size of the query and the views, which motivates us to study problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumption for the underlying storage, i.e., being store-independent. Unlike relational and XML data, linked data are schema-less. While tracking the evolution of schema for linked data is hard, keyword search is an ideal tool to perform data integration. Existing works make crippling assumptions for the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study for keyword search on linked data brought together the classical techniques in the literature and our novel ideas, which leads to much better query efficiency and quality of the results. Linked data also contain rich temporal semantics. To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load-balancing to support scalable data analytics for massive linked data

    Privacy-Preserving Shortest Path Computation

    Full text link
    Navigation is one of the most popular cloud computing services. But in virtually all cloud-based navigation systems, the client must reveal her location and destination to the cloud service provider in order to learn the fastest route. In this work, we present a cryptographic protocol for navigation on city streets that provides privacy for both the client's location and the service provider's routing data. Our key ingredient is a novel method for compressing the next-hop routing matrices in networks such as city street maps. Applying our compression method to the map of Los Angeles, for example, we achieve over tenfold reduction in the representation size. In conjunction with other cryptographic techniques, this compressed representation results in an efficient protocol suitable for fully-private real-time navigation on city streets. We demonstrate the practicality of our protocol by benchmarking it on real street map data for major cities such as San Francisco and Washington, D.C.Comment: Extended version of NDSS 2016 pape

    Secure Outsourced Computation on Encrypted Data

    Get PDF
    Homomorphic encryption (HE) is a promising cryptographic technique that supports computations on encrypted data without requiring decryption first. This ability allows sensitive data, such as genomic, financial, or location data, to be outsourced for evaluation in a resourceful third-party such as the cloud without compromising data privacy. Basic homomorphic primitives support addition and multiplication on ciphertexts. These primitives can be utilized to represent essential computations, such as logic gates, which subsequently can support more complex functions. We propose the construction of efficient cryptographic protocols as building blocks (e.g., equality, comparison, and counting) that are commonly used in data analytics and machine learning. We explore the use of these building blocks in two privacy-preserving applications. One application leverages our secure prefix matching algorithm, which builds on top of the equality operation, to process geospatial queries on encrypted locations. The other applies our secure comparison protocol to perform conditional branching in private evaluation of decision trees. There are many outsourced computations that require joint evaluation on private data owned by multiple parties. For example, Genome-Wide Association Study (GWAS) is becoming feasible because of the recent advances of genome sequencing technology. Due to the sensitivity of genomic data, this data is encrypted using different keys possessed by different data owners. Computing on ciphertexts encrypted with multiple keys is a non-trivial task. Current solutions often require a joint key setup before any computation such as in threshold HE or incur large ciphertext size (at best, grows linearly in the number of involved keys) such as in multi-key HE. We propose a hybrid approach that combines the advantages of threshold and multi-key HE to support computations on ciphertexts encrypted with different keys while vastly reducing ciphertext size. Moreover, we propose the SparkFHE framework to support large-scale secure data analytics in the Cloud. SparkFHE integrates Apache Spark with Fully HE to support secure distributed data analytics and machine learning and make two novel contributions: (1) enabling Spark to perform efficient computation on large datasets while preserving user privacy, and (2) accelerating intensive homomorphic computation through parallelization of tasks across clusters of computing nodes. To our best knowledge, SparkFHE is the first addressing these two needs simultaneously

    Distributed Query Execution With Strong Privacy Guarantees

    Get PDF
    As the Internet evolves, we find more applications that involve data originating from multiple sources, and spanning machines located all over the world. Such wide distribution of sensitive data increases the risk of information leakage, and may sometimes inhibit useful applications. For instance, even though banks could share data to detect systemic threats in the US financial network, they hesitate to do so because it can leak business secrets to their competitors. Encryption is an effective way to preserve data confidentiality, but eliminates all processing capabilities. Some approaches enable processing on encrypted data, but they usually have security weaknesses, such as data leakage through side-channels, or require expensive cryptographic computations. In this thesis, we present techniques that address the above limitations. First, we present an efficient symmetric homomorphic encryption scheme, which can aggregate encrypted data at an unprecedented scale. Second, we present a way to efficiently perform secure computations on distributed graphs. To accomplish this, we express large computations as a series of small, parallelizable vertex programs, whose state is safely transferred between vertices using a new cryptographic protocol. Finally, we propose using differential privacy to strengthen the security of trusted processors: noise is added to the side-channels, so that no adversary can extract useful information about individual users. Our experimental results suggest that the presented techniques achieve order-of-magnitude performance improvements over previous approaches, in scenarios such as the business intelligence application of a large corporation and the detection of systemic threats in the US financial network
    • …
    corecore