4,880 research outputs found

    Towards Doubly Efficient Private Information Retrieval

    Get PDF
    Private Information Retrieval (PIR) allows a client to obtain data from a public database without disclosing the locations accessed. Traditionally, the stress is on preserving sublinear work for the client, while the server\u27s work is taken to inevitably be at least linear in the database size. Beimel, Ishai and Malkin (JoC 2004) show PIR schemes where, following a linear-work preprocessing stage, the server\u27s work per query is sublinear in the database size. However, that work only addresses the case of multiple non-colluding servers; the existence of single-server PIR with sublinear server work remained unaddressed. We consider single-server PIR schemes where, following a preprocessing stage in which the server obtains an encoded version of the database and the client obtains a short key, the per-query work of both server and client is polylogarithmic in the database size. We call such schemes doubly efficient. Concentrating on the case where the client\u27s key is secret, we show: - A scheme, based on one-way functions, that works for a bounded number of queries, and where the server storage is linear in the number of queries plus the database size. - A family of schemes for an unbounded number of queries, whose security follows from a corresponding family of new hardness assumption that are related to the hardness of solving a system of noisy linear equations. We also show the insufficiency of a natural approach for obtaining doubly efficient PIR in the setting where the preprocessing is public

    Towards Practical Doubly-Efficient Private Information Retrieval

    Get PDF
    Private information retrieval (PIR) protocols allow clients to access database entries without revealing the queried indices. They have many real-world applications, including privately querying patent-, compromised credential-, and contact databases. While existing PIR protocols that have been implemented perform reasonably well in practice, they all have suboptimal asymptotic complexities. A line of work has explored so-called doubly-efficient PIR (DEPIR), which refers to single-server PIR protocols with optimal asymptotic complexities. Recently, Lin, Mook, and Wichs (STOC 2023) presented the first protocol that completely satisfies the DEPIR constraints and can be rigorously proven secure. Unfortunately, their proposal is purely theoretical in nature. It is even speculated that such protocols are completely impractical, and hence no implementation of any DEPIR protocol exists. In this work, we challenge this assumption. We propose several optimizations for the protocol of Lin, Mook, and Wichs that improve both asymptotic and concrete running times, as well as storage requirements, by orders of magnitude. Furthermore, we implement the resulting protocol and show that for batch queries it outperforms state-of-the-art protocols

    What Storage Access Privacy is Achievable with Small Overhead?

    Get PDF
    Oblivious RAM (ORAM) and private information retrieval (PIR) are classic cryptographic primitives used to hide the access pattern to data whose storage has been outsourced to an untrusted server. Unfortunately, both primitives require considerable overhead compared to plaintext access. For large-scale storage infrastructure with highly frequent access requests, the degradation in response time and the exorbitant increase in resource costs incurred by either ORAM or PIR prevent their usage. In an ideal scenario, a privacy-preserving storage protocols with small overhead would be implemented for these heavily trafficked storage systems to avoid negatively impacting either performance and/or costs. In this work, we study the problem of the best $\mathit{storage\ access\ privacy}thatisachievablewithonly that is achievable with only \mathit{small\ overhead}overplaintextaccess.Toanswerthisquestion,weconsider over plaintext access. To answer this question, we consider \mathit{differential\ privacy\ access}whichisageneralizationofthe which is a generalization of the \mathit{oblivious\ access}securitynotionthatareconsideredbyORAMandPIR.Quitesurprisingly,wepresentstrongevidencethatconstantoverheadstorageschemesmayonlybeachievedwithprivacybudgetsof security notion that are considered by ORAM and PIR. Quite surprisingly, we present strong evidence that constant overhead storage schemes may only be achieved with privacy budgets of \epsilon = \Omega(\log n).WepresentasymptoticallyoptimalconstructionsfordifferentiallyprivatevariantsofbothORAMandPIRwithprivacybudgets. We present asymptotically optimal constructions for differentially private variants of both ORAM and PIR with privacy budgets \epsilon = \Theta(\log n)withonly with only O(1)overhead.Inaddition,weconsideramorecomplexstorageprimitivecalledkeyβˆ’valuestorageinwhichdataisindexedbykeysfromalargeuniverse(asopposedtoconsecutiveintegersinORAMandPIR).Wepresentadifferentiallyprivatekeyβˆ’valuestorageschemewith overhead. In addition, we consider a more complex storage primitive called key-value storage in which data is indexed by keys from a large universe (as opposed to consecutive integers in ORAM and PIR). We present a differentially private key-value storage scheme with \epsilon = \Theta(\log n)and and O(\log\log n)$ overhead. This construction uses a new oblivious, two-choice hashing scheme that may be of independent interest.Comment: To appear at PODS'1

    Counterfactual Estimation and Optimization of Click Metrics for Search Engines

    Full text link
    Optimizing an interactive system against a predefined online metric is particularly challenging, when the metric is computed from user feedback such as clicks and payments. The key challenge is the counterfactual nature: in the case of Web search, any change to a component of the search engine may result in a different search result page for the same query, but we normally cannot infer reliably from search log how users would react to the new result page. Consequently, it appears impossible to accurately estimate online metrics that depend on user feedback, unless the new engine is run to serve users and compared with a baseline in an A/B test. This approach, while valid and successful, is unfortunately expensive and time-consuming. In this paper, we propose to address this problem using causal inference techniques, under the contextual-bandit framework. This approach effectively allows one to run (potentially infinitely) many A/B tests offline from search log, making it possible to estimate and optimize online metrics quickly and inexpensively. Focusing on an important component in a commercial search engine, we show how these ideas can be instantiated and applied, and obtain very promising results that suggest the wide applicability of these techniques

    Efficient construction of metadata-enhanced web corpora

    Get PDF
    International audienceMetadata extraction is known to be a problem in general-purpose Web corpora, and so is extensive crawling with little yield. The contributions of this paper are threefold: a method to find and download large numbers of WordPress pages; a targeted extraction of content featuring much needed metadata; and an analysis of the documents in the corpus with insights of actual blog uses. The study focuses on a publishing software (WordPress), which allows for reliable extraction of structural elements such as metadata, posts, and comments. The download of about 9 million documents in the course of two experiments leads after processing to 2.7 billion tokens with usable metadata. This comparatively high yield is a step towards more efficiency with respect to machine power and " Hi-Fi " web corpora. The resulting corpus complies with formal requirements on metadata-enhanced corpora and on weblogs considered as a series of dated entries. However, existing typologies on Web texts have to be revised in the light of this hybrid genre
    • …
    corecore