13 research outputs found

    Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket

    We study wear-leveling techniques for cuckoo hashing, showing that it is possible to achieve a memory wear bound of $\log\log n+O(1)$ after the insertion of $n$ items into a table of size $Cn$ for a suitable constant $C$ using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically, showing that it significantly improves on the memory wear performance for classic cuckoo hashing and linear probing in practice. Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on Experimental Algorithms (SEA 2014)

    A Space Lower Bound for Dynamic Approximate Membership Data Structures

    An approximate membership data structure is a randomized data structure representing a set which supports membership queries. It allows for a small false positive error rate but has no false negative errors. Such data structures were first introduced by Bloom in the 1970s and have since had numerous applications, mainly in distributed systems, database systems, and networks. The algorithm of Bloom (known as a Bloom filter) is quite effective: it can store an approximation of a set S of size n by using only ≈ 1.44n log2(1/ε) bits while having false positive error ε. This is within a constant factor of the information-theoretic lower bound of n log2(1/ε) for storing such sets. Closing this gap is an important open problem, as Bloom filters are widely used in situations where storage is at a premium. Bloom filters have another property: they are dynamic. That is, they support the iterative insertions of up to n elements. In fact, if one removes this requirement, there exist static data structures that receive the entire set at once and can almost achieve the information-theoretic lower bound; they require only (1 + o(1))n log2(1/ε) bits. Our main result is a new lower bound for the space requirements of any dynamic approximate membership data structure. We show that for any constant ε > 0, any such data structure that achieves false positive error rate of ε must use at least C(ε) · n log2(1/ε) memory bits, where C(ε) > 1 depends only on ε. This shows that the information-theoretic lower bound cannot be achieved by dynamic data structures for any constant error rate. © 2013 Society for Industrial and Applied Mathematics
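
    As a rough illustration of the space figures quoted in the abstract above, the sketch below compares the Bloom filter size of about 1.44 n log2(1/ε) bits with the information-theoretic bound of n log2(1/ε) bits. It assumes that the constant 1.44 is log2(e), the usual constant for an optimally tuned Bloom filter. The function names and the sample values of n and ε are illustrative and do not come from the paper.

```python
import math


def bloom_bits(n: int, eps: float) -> float:
    """Approximate size in bits of an optimally tuned Bloom filter.

    Assumption: the 1.44 factor in the abstract is log2(e).
    """
    return math.log2(math.e) * n * math.log2(1.0 / eps)


def lower_bound_bits(n: int, eps: float) -> float:
    """Information-theoretic lower bound n * log2(1/eps), in bits."""
    return n * math.log2(1.0 / eps)


# Illustrative parameters (not from the paper).
n, eps = 1_000_000, 0.01
ratio = bloom_bits(n, eps) / lower_bound_bits(n, eps)
print(f"Bloom filter: {bloom_bits(n, eps):,.0f} bits")
print(f"Lower bound:  {lower_bound_bits(n, eps):,.0f} bits")
print(f"Overhead factor: {ratio:.4f}")
```

    The overhead factor is independent of n and ε under this assumption, which is the constant-factor gap the abstract refers to.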

    An Empirical Evaluation of Extendible Arrays

    Abstract. We study the performance of several alternatives for implementing extendible arrays, which allow random access to elements stored in them, whilst allowing the arrays to be grown and shrunk. The study not only looks at the basic operations of grow/shrink and accessing data, but also the effects of memory fragmentation on performance.

    Probing the binding specificities of human Siglecs by cell-based glycan arrays

    Siglecs are a family of sialic acid-binding receptors expressed by cells of the immune system and a few other cell types capable of modulating immune cell functions upon recognition of sialoglycan ligands. While human Siglecs primarily bind to sialic acid residues on diverse types of glycoproteins and glycolipids that constitute the sialome, their fine binding specificities for elaborated complex glycan structures and the contribution of the glycoconjugate and protein context for recognition of sialoglycans at the cell surface are not fully elucidated. Here, we generated a library of isogenic human HEK293 cells with combinatorial loss/gain of individual sialyltransferase genes and the introduction of sulfotransferases for display of the human sialome and to dissect Siglec interactions in the natural context of glycoconjugates at the cell surface. We found that Siglec-4/7/15 all have distinct binding preferences for sialylated GalNAc-type O-glycans but exhibit selectivity for patterns of O-glycans as presented on distinct protein sequences. We discovered that the sulfotransferase CHST1 drives sialoglycan binding of Siglec-3/8/7/15 and that sulfation can impact the preferences for binding to O-glycan patterns. In particular, the branched Neu5Acα2-3(6-O-sulfo)Galβ1-4GlcNAc (6'-Su-SLacNAc) epitope was discovered as the binding epitope for Siglec-3 (CD33) implicated in late-onset Alzheimer's disease. The cell-based display of the human sialome provides a versatile discovery platform that enables dissection of the genetic and biosynthetic basis for the Siglec glycan interactome and other sialic acid-binding proteins

    Improved Generic Algorithms for 3-Collisions

    An $r$-collision for a function is a set of $r$ distinct inputs with identical outputs. Actually finding $r$-collisions for a random map over a finite set of cardinality $N$ requires at least about $N^{(r-1)/r}$ units of time on a sequential machine. For $r=2$, memoryless and well-parallelisable algorithms are known. The current paper describes memory-efficient and parallelisable algorithms for $r \ge 3$. The main results are: (1) A sequential algorithm for 3-collisions, roughly using memory $N^\alpha$ and time $N^{1-\alpha}$ for $\alpha \le 1/3$. I.e., given $N^{1/3}$ units of storage, one can find 3-collisions in time $N^{2/3}$. Note that there is a time-memory tradeoff which allows the memory consumption to be reduced. (2) A parallelisation of this algorithm using $N^{1/3}$ processors running in time $N^{1/3}$. Each single processor only needs a constant amount of memory. (3) A generalisation of this second approach to $r$-collisions for $r \ge 3$: given $N^s$ parallel processors, one can generate $r$-collisions roughly in time $N^{((r-1)/r)-s}$, using memory $N^{((r-2)/r)-s}$ on every processor.
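
    To make the definition of an r-collision concrete, here is a minimal table-based sketch. It is not the memory-efficient algorithm from the paper: it stores every value it has seen, so its memory use can grow to the order of N. The map `toy_map` (a truncated SHA-256 reduced modulo n) and the function name `find_r_collision` are hypothetical stand-ins chosen for illustration, and inputs are drawn from a range larger than the output set so that a collision is guaranteed by the pigeonhole principle.

```python
import hashlib
from collections import defaultdict


def toy_map(x: int, n: int) -> int:
    """Hypothetical stand-in for a random map into {0, ..., n-1}."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def find_r_collision(n: int, r: int = 3):
    """Return (output, inputs) where `inputs` are r distinct preimages.

    Naive approach: remember all preimages per output. Scanning r * n
    distinct inputs forces some output to be hit at least r times.
    """
    preimages = defaultdict(list)
    for x in range(r * n):
        y = toy_map(x, n)
        preimages[y].append(x)
        if len(preimages[y]) == r:
            return y, preimages[y]
    return None  # unreachable for n >= 1, by the pigeonhole argument


y, xs = find_r_collision(1024, r=3)
print(f"3-collision on output {y}: inputs {xs}")
```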