LIPIcs, Volume 251, ITCS 2023, Complete Volume
Dynamic Dictionary with Subconstant Wasted Bits per Key
Dictionaries have been one of the central questions in data structures. A
dictionary data structure maintains a set of key-value pairs under insertions
and deletions such that given a query key, the data structure efficiently
returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton,
Kuszmaul, Kuszmaul, Liu 2022] store key-value pairs with only bits of redundancy, and support all operations in time,
for . It was recently shown to be optimal [Li, Liang, Yu, Zhou
2023b].
In this paper, we study the regime where the redundant bits is , and
show that when is at least , all operations can be
supported in time, matching the lower bound in this
regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures based on
which range is in. The data structure for utilizes a
generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein
2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for is based on recursively hashing into buckets with logarithmic
sizes.
Comment: 46 pages; SODA 202
A Survey on Malware Detection with Graph Representation Learning
Malware detection has become a major concern due to the increasing number and
complexity of malware. Traditional detection methods based on signatures and
heuristics are used for malware detection, but unfortunately, they suffer from
poor generalization to unknown attacks and can be easily circumvented using
obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep
Learning (DL) achieved impressive results in malware detection by learning
useful representations from data and have become a solution preferred over
traditional methods. More recently, the application of such techniques on
graph-structured data has achieved state-of-the-art performance in various
domains and demonstrates promising results in learning more robust
representations from malware. Yet, no literature review focusing on graph-based
deep learning for malware detection exists. In this survey, we provide an
in-depth literature review to summarize and unify existing works under the
common approaches and architectures. We notably demonstrate that Graph Neural
Networks (GNNs) reach competitive results in learning robust embeddings from
malware represented as expressive graph structures, leading to an efficient
detection by downstream classifiers. This paper also reviews adversarial
attacks that are utilized to fool graph-based detection methods. Challenges and
future research directions are discussed at the end of the paper.
Comment: Preprint, submitted to ACM Computing Surveys in March 2023. For any
suggestions or improvements, please contact me directly by e-mail.
Unbalanced Private Set Intersection from Homomorphic Encryption and Nested Cuckoo Hashing
Private Set Intersection (PSI) is a well-studied secure two-party computation problem in which a client and a server want to compute the intersection of their input sets without revealing additional information to the other party.
With this work, we present nested Cuckoo hashing, a novel hashing approach that can be combined with additively homomorphic encryption (AHE) to construct an efficient PSI protocol for unbalanced input sets.
We formally prove the security of our protocol against semi-honest adversaries in the standard model.
Our protocol yields client computation and communication complexity that is sublinear in the server's set size and is thus of interest to clients with limited resources.
The implementation and empirical evaluation of our protocol using the exponential ElGamal and BGV/BFV encryption schemes attest to state-of-the-art practical performance.
Persistent Memory File Systems: A Survey
Persistent Memory (PM) is non-volatile, byte-addressable memory that offers read and write latencies an order of magnitude smaller than those of flash storage such as SSDs. This survey discusses how file systems address the most prominent challenges in implementing file systems for Persistent Memory. First, we discuss how the properties of Persistent Memory change file system design. Second, we discuss work that aims to optimize small file I/O and the associated metadata resolution. Third, we address how existing Persistent Memory file systems achieve (meta)data persistence and consistency.
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.
LIPIcs, Volume 274, ESA 2023, Complete Volume
Panacea: Non-interactive and Stateless Oblivious RAM
Oblivious RAM (ORAM) allows a client to outsource storage to a remote server while hiding the data access pattern from the server. Many ORAM designs have been proposed to reduce the computational overhead and bandwidth blowup for the client. A recent work, Onion Ring ORAM (CCS'19), is able to achieve bandwidth blowup in the online phase using fully homomorphic encryption (FHE) techniques, at the cost of a
computationally expensive client-side offline phase. Furthermore, such a scheme can be categorized as a stateful construction, meaning that the client has to locally maintain a dynamic state representing the order of remote database elements.
We present Panacea: a novel design of ORAM based on FHE techniques, that is non-interactive and stateless, achieves bandwidth blowup, and does not require an expensive offline phase for the client to perform; in that sense, our design is the first of its kind among other ORAM designs. To provide the client with such performance benefits, our design delegates all expensive computation to the resourceful server. We additionally show how to boost the server performance significantly using probabilistic batch codes at the cost of only 1.5x in additional bandwidth blowup and 3x expansion in server storage, but less amortized bandwidth. Our experimental results show that our design, with the batching technique, is practical in terms of server computation overhead as well. Specifically, for a database size of , it takes only seconds of amortized computation time for a server to respond to a query. As a result of the statelessness and low computational overhead on the client, and reasonable computational overhead on the server, our design is very suitable to be deployed as a cloud-based privacy-preserving storage outsourcing solution with a portable client running on a lightweight device
Cuckoo Hashing in Cryptography: Optimal Parameters, Robustness and Applications
Cuckoo hashing is a powerful primitive that enables storing items using small space with efficient querying. At a high level, cuckoo hashing maps items into entries storing at most items such that each item is placed into one of randomly chosen entries. Additionally, there is an overflow stash that can store at most items. Many cryptographic primitives rely upon cuckoo hashing to privately embed and query data where it is integral to ensure small failure probability when constructing cuckoo hashing tables as it directly relates to the privacy guarantees.
As our main result, we present a more query-efficient cuckoo hashing construction using more hash functions. For construction failure probability , the query overhead of our scheme is . Our scheme has quadratically smaller query overhead than prior works for any target failure probability . We also prove lower bounds matching our construction. Our improvements come from a new understanding of the locality of cuckoo hashing failures for small sets of items.
We also initiate the study of robust cuckoo hashing where the input set may be chosen with knowledge of the hash functions. We present a cuckoo hashing scheme using more hash functions with query overhead that is robust against poly adversaries. Furthermore, we present lower bounds showing that this construction is tight and that extending previous approaches of large stashes or entries cannot obtain robustness except with query overhead.
As applications of our results, we obtain improved constructions for batch codes and PIR. In particular, we present the most efficient explicit batch code and blackbox reduction from single-query PIR to batch PIR.
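The table layout described in the abstract above (items mapped into entries of bounded capacity, each item placed into one of several randomly chosen entries, plus an overflow stash) can be illustrated with a minimal sketch. This is a generic textbook-style cuckoo table with random-walk eviction, not the construction from the paper; the class name `CuckooTable`, all parameter names and defaults, and the use of salted BLAKE2b as the hash family are assumptions made for illustration.

```python
import hashlib
import random


class CuckooTable:
    """Toy cuckoo hash table: k candidate entries per item, plus a stash.

    Illustrative sketch only; names and defaults are hypothetical.
    """

    def __init__(self, num_entries, entry_capacity=1, num_hashes=2,
                 stash_capacity=4, max_kicks=100, seed=0):
        self.entries = [[] for _ in range(num_entries)]
        self.entry_capacity = entry_capacity
        self.num_hashes = num_hashes
        self.stash = []
        self.stash_capacity = stash_capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)

    def _candidates(self, item):
        # One salted BLAKE2b digest per hash function index (an assumption;
        # any family of independent-looking hash functions would do).
        data = repr(item).encode()
        indices = []
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(data, digest_size=8,
                                     salt=i.to_bytes(8, "little")).digest()
            indices.append(int.from_bytes(digest, "little") % len(self.entries))
        return indices

    def contains(self, item):
        # A query inspects the k candidate entries and the stash.
        if any(item in self.entries[idx] for idx in self._candidates(item)):
            return True
        return item in self.stash

    def insert(self, item):
        if self.contains(item):
            return
        current = item
        for _ in range(self.max_kicks):
            candidates = self._candidates(current)
            for idx in candidates:
                if len(self.entries[idx]) < self.entry_capacity:
                    self.entries[idx].append(current)
                    return
            # All candidates are full: evict a random occupant and retry with it.
            idx = self.rng.choice(candidates)
            pos = self.rng.randrange(len(self.entries[idx]))
            victim = self.entries[idx][pos]
            self.entries[idx][pos] = current
            current = victim
        # Eviction budget exhausted: fall back to the overflow stash.
        if len(self.stash) >= self.stash_capacity:
            # Construction failure; in this sketch the displaced item is dropped.
            raise RuntimeError("construction failed: stash overflow")
        self.stash.append(current)
```

In this sketch, the query cost is the number of locations `contains` inspects, roughly `num_hashes * entry_capacity + stash_capacity`, which is one simple way to read the "query overhead" that the abstract trades off against the construction failure probability.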
Scalable Hash Tables
The term scalability has two meanings in this dissertation: it means
taking the best possible advantage of the provided resources (both computational
and memory resources), and it also means scaling data structures in the literal sense,
i.e., growing the capacity by "rescaling" the table.
Scaling well to computational resources implies constructing the fastest,
best-performing algorithms and data structures. On today's many-core machines, the
best performance is immediately associated with parallelism. Since CPU frequencies
stopped growing about 10-15 years ago, parallelism is the only way to take
advantage of growing computational resources. But for data structures in general, and
hash tables in particular, performance is not only linked to faster computation.
Most execution time is actually spent waiting for memory. Thus, optimizing data
structures to reduce the number of memory accesses, or to take better advantage of
the memory hierarchy, especially through predictable access patterns and
prefetching, is just as important.
In terms of scaling the size of hash tables, we have identified three domains where
scaling of hash-based data structures has previously been lacking, namely
space-efficient growing, concurrent hash tables, and Approximate Membership Query
data structures (AMQ filters). Throughout this dissertation, we describe the problems
in these areas and develop efficient solutions. We highlight three different libraries
that we have developed over the course of this dissertation, each containing
multiple implementations that have proven throughout our testing to be among the
best implementations in their respective domains. Together, they offer
a comprehensive toolbox that can be used to solve many kinds of hashing-related
problems or to develop individual solutions for further ones.
DySECT is a library for space-efficient hash tables, specifically growing
space-efficient hash tables that scale with their input size. It contains the namesake
DySECT data structure in addition to a number of different probing- and cuckoo-based
implementations. Growt is a library for highly efficient concurrent hash tables. It
contains a very fast base table and a number of extensions to adapt this table to
match any purpose. All extensions can be combined to create a variety of different
interfaces. In our extensive experimental evaluation, each adaptation has been shown
to be among the best hash tables for its specific purpose. Lpqfilter is a library
for concurrent approximate membership query (AMQ) data structures. It contains
some original data structures, like the linear probing quotient filter, as well as some
novel approaches to dynamically sized quotient filters.
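To make the literal sense of "rescaling" concrete, the following is a minimal sketch of a linear-probing hash set that doubles its capacity and migrates all keys once a load threshold is exceeded. It is a generic baseline and not the design of DySECT, Growt, or Lpqfilter; the class name `GrowingTable` and the `max_load` parameter are hypothetical.

```python
class GrowingTable:
    """Linear-probing hash set that doubles its capacity when it gets too full.

    Illustrative baseline only; names and thresholds are hypothetical.
    """

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8, max_load=0.5):
        self.slots = [self._EMPTY] * capacity
        self.size = 0
        self.max_load = max_load

    def _probe(self, slots, key):
        # Walk forward from the home slot until the key or an empty slot is found.
        i = hash(key) % len(slots)
        while slots[i] is not self._EMPTY and slots[i] != key:
            i = (i + 1) % len(slots)
        return i

    def insert(self, key):
        # Grow before the load factor would exceed the threshold.
        if self.size + 1 > self.max_load * len(self.slots):
            self._rescale(2 * len(self.slots))
        i = self._probe(self.slots, key)
        if self.slots[i] is self._EMPTY:
            self.slots[i] = key
            self.size += 1

    def contains(self, key):
        return self.slots[self._probe(self.slots, key)] is not self._EMPTY

    def _rescale(self, new_capacity):
        # Full migration: rehash every stored key into a larger array.
        new_slots = [self._EMPTY] * new_capacity
        for key in self.slots:
            if key is not self._EMPTY:
                new_slots[self._probe(new_slots, key)] = key
        self.slots = new_slots
```

With `max_load=0.5`, this baseline keeps at least half of its slots empty and temporarily holds both the old and the new array during a migration, which illustrates why growing a table space-efficiently is treated as a problem in its own right.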