Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket
We study wear-leveling techniques for cuckoo hashing, showing that it is possible to achieve a memory wear bound of $\log\log n + O(1)$ after the insertion of $n$ items into a table of size $Cn$ for a suitable constant $C$ using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically, showing that it significantly improves on the memory wear performance of classic cuckoo hashing and linear probing in practice.
Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on Experimental Algorithms (SEA 2014).
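To make "memory wear" concrete, here is a minimal sketch of classic two-table cuckoo hashing instrumented to count writes per cell. It assumes wear means the number of writes to a table cell, and the salted use of Python's `hash()` as the two hash functions is a hypothetical choice. This is the baseline the abstract compares against, not the paper's wear-minimizing method.

```python
import random


class WearCountingCuckoo:
    """Classic two-table cuckoo hashing, instrumented to count writes per cell.

    NOT the wear-minimizing method from the paper; a baseline that measures
    wear, assumed here to mean the number of writes to a table cell.
    """

    def __init__(self, size, max_kicks=500, seed=0):
        self.size = size
        self.max_kicks = max_kicks
        rng = random.Random(seed)
        # Hypothetical hash functions: Python's hash() salted with random values.
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.tables = [[None] * size, [None] * size]
        self.wear = [[0] * size, [0] * size]

    def _slot(self, which, key):
        return hash((self.salts[which], key)) % self.size

    def contains(self, key):
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, key)
            # Store the key and pick up the previous occupant; each store is one write.
            key, self.tables[which][i] = self.tables[which][i], key
            self.wear[which][i] += 1
            if key is None:
                return True
            which = 1 - which  # the evicted key moves to its slot in the other table
        # Too many evictions: a full implementation would rehash here.
        return False

    def max_wear(self):
        return max(max(row) for row in self.wear)
```

Inserting a batch of keys and then calling `max_wear()` gives the quantity the paper bounds; comparing it with the average number of writes per cell shows how unevenly classic eviction chains spread wear.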
A Space Lower Bound for Dynamic Approximate Membership Data Structures
An approximate membership data structure is a randomized data structure representing a set which supports membership queries. It allows for a small false positive error rate but has no false negative errors. Such data structures were first introduced by Bloom in the 1970s and have since had numerous applications, mainly in distributed systems, database systems, and networks. The algorithm of Bloom (known as a Bloom filter) is quite effective: it can store an approximation of a set S of size n by using only ≈ 1.44n log2(1/ε) bits while having false positive error ε. This is within a constant factor of the information-theoretic lower bound of n log2(1/ε) for storing such sets. Closing this gap is an important open problem, as Bloom filters are widely used in situations where storage is at a premium. Bloom filters have another property: they are dynamic. That is, they support iterative insertion of up to n elements. In fact, if one removes this requirement, there exist static data structures that receive the entire set at once and can almost achieve the information-theoretic lower bound; they require only (1 + o(1))n log2(1/ε) bits. Our main result is a new lower bound for the space requirements of any dynamic approximate membership data structure. We show that for any constant ε > 0, any such data structure that achieves false positive error rate of ε must use at least C(ε) · n log2(1/ε) memory bits, where C(ε) > 1 depends only on ε. This shows that the information-theoretic lower bound cannot be achieved by dynamic data structures for any constant error rate. © 2013 Society for Industrial and Applied Mathematics
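The ≈ 1.44n log2(1/ε) figure comes from the standard Bloom filter sizing: m = n log2(1/ε) / ln 2 bits (1/ln 2 ≈ 1.4427) with k ≈ log2(1/ε) hash functions. The sketch below is a minimal dynamic Bloom filter using that sizing. Deriving the k indices by double hashing over a SHA-256 digest of `str(item)` is an assumption made for illustration, not something the abstract specifies.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized for up to n insertions at false-positive rate eps."""

    def __init__(self, n, eps):
        # m = n * log2(1/eps) / ln 2  (about 1.44 n log2(1/eps) bits)
        self.m = math.ceil(n * math.log2(1 / eps) / math.log(2))
        # k = (m/n) ln 2, which simplifies to about log2(1/eps) hash functions
        self.k = max(1, round(math.log2(1 / eps)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing (assumed scheme): k indices from two 64-bit digest halves.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True for every inserted item (no false negatives); rarely true otherwise.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))
```

For n = 1000 and ε = 0.01 this allocates about 9,586 bits (≈ 9.6 bits per element) with k = 7, versus the n log2(1/ε) ≈ 6,644-bit information-theoretic bound quoted above.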
An Empirical Evaluation of Extendible Arrays
We study the performance of several alternatives for implementing extendible arrays, which provide random access to their elements whilst allowing the arrays to grow and shrink. The study examines not only the basic operations of growing, shrinking, and accessing data, but also the effect of memory fragmentation on performance.
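As a point of reference, the simplest extendible array is a single contiguous buffer that doubles its capacity when full and halves it when occupancy falls to one quarter, giving amortized O(1) grow and shrink. The sketch below shows that one baseline under those assumptions; it is not necessarily one of the alternatives evaluated in the study, and it does not model memory fragmentation, which the abstract identifies as a separate factor.

```python
class ExtendibleArray:
    """Contiguous array with O(1) random access that grows and shrinks at the end.

    Capacity doubles when full and halves when occupancy drops to 1/4, so the
    copying cost of resizes is amortized O(1) per grow/shrink.
    """

    def __init__(self):
        self._cap = 1
        self._size = 0
        self._data = [None] * self._cap

    def _resize(self, new_cap):
        # Allocate a new buffer and copy the live prefix into it.
        new_data = [None] * new_cap
        new_data[: self._size] = self._data[: self._size]
        self._data, self._cap = new_data, new_cap

    def __len__(self):
        return self._size

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._data[i]

    def __setitem__(self, i, value):
        if not 0 <= i < self._size:
            raise IndexError(i)
        self._data[i] = value

    def grow(self, value):
        if self._size == self._cap:
            self._resize(2 * self._cap)
        self._data[self._size] = value
        self._size += 1

    def shrink(self):
        if self._size == 0:
            raise IndexError("shrink from empty array")
        self._size -= 1
        value = self._data[self._size]
        self._data[self._size] = None
        # Halving at 1/4 occupancy (not 1/2) avoids thrashing at the boundary.
        if self._cap > 1 and self._size <= self._cap // 4:
            self._resize(self._cap // 2)
        return value
```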
Probing the binding specificities of human Siglecs by cell-based glycan arrays
Siglecs are a family of sialic acid-binding receptors expressed by cells of the immune system and a few other cell types capable of modulating immune cell functions upon recognition of sialoglycan ligands. While human Siglecs primarily bind to sialic acid residues on diverse types of glycoproteins and glycolipids that constitute the sialome, their fine binding specificities for elaborated complex glycan structures and the contribution of the glycoconjugate and protein context for recognition of sialoglycans at the cell surface are not fully elucidated. Here, we generated a library of isogenic human HEK293 cells with combinatorial loss/gain of individual sialyltransferase genes and the introduction of sulfotransferases for display of the human sialome and to dissect Siglec interactions in the natural context of glycoconjugates at the cell surface. We found that Siglec-4/7/15 all have distinct binding preferences for sialylated GalNAc-type O-glycans but exhibit selectivity for patterns of O-glycans as presented on distinct protein sequences. We discovered that the sulfotransferase CHST1 drives sialoglycan binding of Siglec-3/8/7/15 and that sulfation can impact the preferences for binding to O-glycan patterns. In particular, the branched Neu5Acα2-3(6-O-sulfo)Galβ1-4GlcNAc (6'-Su-SLacNAc) epitope was discovered as the binding epitope for Siglec-3 (CD33) implicated in late-onset Alzheimer's disease. The cell-based display of the human sialome provides a versatile discovery platform that enables dissection of the genetic and biosynthetic basis for the Siglec glycan interactome and other sialic acid-binding proteins.
Improved Generic Algorithms for 3-Collisions
An $r$-collision for a function is a set of $r$ distinct inputs with identical outputs. Actually finding $r$-collisions for a random map over a finite set of cardinality $N$ requires at least about $N^{(r-1)/r}$ units of time on a sequential machine. For $r = 2$, memoryless and well-parallelisable algorithms are known. The current paper describes memory-efficient and parallelisable algorithms for $r \ge 3$. The main results are:
(1) A sequential algorithm for 3-collisions, roughly using memory $N^{\alpha}$ and time $N^{1-\alpha}$ for $\alpha \le 1/3$. That is, given $N^{1/3}$ units of storage, one can find 3-collisions in time $N^{2/3}$. Note that there is a time-memory tradeoff which allows one to reduce the memory consumption.
(2) A parallelisation of this algorithm using $N^{1/3}$ processors running in time $N^{1/3}$. Each single processor only needs a constant amount of memory.
(3) A generalisation of this second approach to $r$-collisions for $r \ge 3$: given $N^{s}$ parallel processors, one can generate $r$-collisions roughly in time $N^{((r-1)/r)-s}$, using memory $N^{((r-2)/r)-s}$ on every processor.
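For contrast with the memory-efficient algorithms described above, the sketch below is the naive 3-collision search: evaluate the function on successive inputs and bucket them by output until some bucket holds three inputs. For a random map on N points this stops after on the order of N^(2/3) evaluations, but it stores every input it has seen, which is exactly the memory cost the paper's methods avoid. The SHA-256-based `toy_random_map` is a hypothetical stand-in for a random map, introduced only so the example runs.

```python
import hashlib
from collections import defaultdict


def toy_random_map(x, n):
    """Hypothetical stand-in for a random map on {0, ..., n-1}, built from SHA-256."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % n


def find_3_collision(f, domain_size):
    """Return inputs x < y < z in [0, domain_size) with f(x) == f(y) == f(z), or None.

    Naive baseline: every evaluated input is kept in a bucket keyed by its output,
    so memory grows as fast as the number of evaluations.
    """
    buckets = defaultdict(list)
    for x in range(domain_size):
        bucket = buckets[f(x)]
        bucket.append(x)
        if len(bucket) == 3:
            return tuple(bucket)
    return None  # f has no 3-collision on this domain
```

Calling `find_3_collision(lambda x: toy_random_map(x, 2**16), 2**16)` returns three inputs sharing one output; counting the evaluations it performs gives a rough feel for the $N^{2/3}$ scaling of result (1), without its memory savings.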