38 research outputs found

    Fast evaluation of union-intersection expressions

    Full text link
    We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size ww, a special case of our result is that the intersection of mm (preprocessed) sets, containing nn elements in total, can be computed in expected time O(n(logw)2/w+km)O(n (\log w)^2 / w + km), where kk is the number of elements in the intersection. If the first of the two terms dominates, this is a factor w1o(1)w^{1-o(1)} faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time Ω(n/(wmlogm)+(1logkw)k)\Omega(n/(w m \log m)+ (1-\tfrac{\log k}{w}) k), meaning that our upper bound is nearly optimal for small mm. Our algorithm uses a novel combination of approximate set representations and word-level parallelism

    Linear probing with constant independence

    Get PDF
    Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing

    Simulating Uniform Hashing in Constant Time and Optimal Space

    Get PDF
    Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e., the assumption that hash functions behave like truly random functions. In this paper it is shown how to implement hash functions that can be evaluated on a RAM in constant time, and behave like truly random functions on any set of n inputs, with high probability. The space needed to represent a function is O(n) words, which is the best possible (and a polynomial improvement compared to previous fast hash functions). As a consequence, a broad class of hashing schemes can be implemented to meet, with high probability, the performance guarantees of their uniform hashing analysis

    Continual Counting with Gradual Privacy Expiration

    Full text link
    Differential privacy with gradual expiration models the setting where data items arrive in a stream and at a given time tt the privacy loss guaranteed for a data item seen at time (td)(t-d) is ϵg(d)\epsilon g(d), where gg is a monotonically non-decreasing function. We study the fundamental continual (binary) counting\textit{continual (binary) counting} problem where each data item consists of a bit, and the algorithm needs to output at each time step the sum of all the bits streamed so far. For a stream of length TT and privacy without\textit{without} expiration continual counting is possible with maximum (over all time steps) additive error O(log2(T)/ε)O(\log^2(T)/\varepsilon) and the best known lower bound is Ω(log(T)/ε)\Omega(\log(T)/\varepsilon); closing this gap is a challenging open problem. We show that the situation is very different for privacy with gradual expiration by giving upper and lower bounds for a large set of expiration functions gg. Specifically, our algorithm achieves an additive error of O(log(T)/ϵ) O(\log(T)/\epsilon) for a large set of privacy expiration functions. We also give a lower bound that shows that if CC is the additive error of any ϵ\epsilon-DP algorithm for this problem, then the product of CC and the privacy expiration function after 2C2C steps must be Ω(log(T)/ϵ)\Omega(\log(T)/\epsilon). Our algorithm matches this lower bound as its additive error is O(log(T)/ϵ)O(\log(T)/\epsilon), even when g(2C)=O(1)g(2C) = O(1). Our empirical evaluation shows that we achieve a slowly growing privacy loss with significantly smaller empirical privacy loss for large values of dd than a natural baseline algorithm

    Lectin-Dependent Enhancement of Ebola Virus Infection via Soluble and Transmembrane C-type Lectin Receptors

    Get PDF
    Mannose-binding lectin (MBL) is a key soluble effector of the innate immune system that recognizes pathogen-specific surface glycans. Surprisingly, low-producing MBL genetic variants that may predispose children and immunocompromised individuals to infectious diseases are more common than would be expected in human populations. Since certain immune defense molecules, such as immunoglobulins, can be exploited by invasive pathogens, we hypothesized that MBL might also enhance infections in some circumstances. Consequently, the low and intermediate MBL levels commonly found in human populations might be the result of balancing selection. Using model infection systems with pseudotyped and authentic glycosylated viruses, we demonstrated that MBL indeed enhances infection of Ebola, Hendra, Nipah and West Nile viruses in low complement conditions. Mechanistic studies with Ebola virus (EBOV) glycoprotein pseudotyped lentiviruses confirmed that MBL binds to N-linked glycan epitopes on viral surfaces in a specific manner via the MBL carbohydrate recognition domain, which is necessary for enhanced infection. MBL mediates lipid-raft-dependent macropinocytosis of EBOV via a pathway that appears to require less actin or early endosomal processing compared with the filovirus canonical endocytic pathway. Using a validated RNA interference screen, we identified C1QBP (gC1qR) as a candidate surface receptor that mediates MBL-dependent enhancement of EBOV infection. We also identified dectin-2 (CLEC6A) as a potentially novel candidate attachment factor for EBOV. Our findings support the concept of an innate immune haplotype that represents critical interactions between MBL and complement component C4 genes and that may modify susceptibility or resistance to certain glycosylated pathogens. Therefore, higher levels of native or exogenous MBL could be deleterious in the setting of relative hypocomplementemia which can occur genetically or because of immunodepletion during active infections. Our findings confirm our hypothesis that the pressure of infectious diseases may have contributed in part to evolutionary selection of MBL mutant haplotypes

    Uniform Hashing in Constant Time and Optimal Space

    No full text
    corecore