65 research outputs found

    Analysing Compression Techniques for In-Memory Collaborative Filtering

    Get PDF
    Following the recent trend of in-memory data processing, it is a usual practice to maintain collaborative filtering data in the main memory when generating recommendations in academic and industrial recommender systems. In this paper, we study the impact of integer compression techniques for in-memory collaborative filtering data in terms of space and time efficiency. Our results provide relevant observations about when and how to compress collaborative filtering data. First, we observe that, depending on the memory constraints, compression techniques may speed up or slow down the performance of state-of-the art collaborative filtering algorithms. Second, after comparing different compression techniques, we find the Frame of Reference (FOR) technique to be the best option in terms of space and time efficiency under different memory constraints

    Processing and Transmission of Information

    Get PDF
    Contains research objectives and summary of research.U. S. Army Research Office - Durham (Contract DAHC04-69-C-0042)U. S. Army Research Office - Durham Contract DAHCO4-71 -C-0039)Joint Services Electronics Program (Contract DAAB07-71-C-0300)National Science Foundation (Grant GK-37582)National Aeronautics and Space Administration (Grant NGL 22-009-013

    Processing and Transmission of Information

    Get PDF
    Contains reports on one research project.Joint Services Electronics Program (Contract DAAB07-71-C-0300)National Science Foundation (Grant GK-37582

    Succinct Dictionary Matching With No Slowdown

    Full text link
    The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over an (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in a any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space where m <= n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while still maintaining the ability to answer to queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses space O(n log sigma) while answering queries in O(|T|log log n + occ) time. In this paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that sigma < m^epsilon for any constant 0 < epsilon < 1. The query time remains unchanged.Comment: Corrected typos and other minor error

    Fast and Compact Set Intersection through Recursive Universe Partitioning

    Get PDF
    We present a data structure that encodes a sorted integer sequence in small space allowing, at the same time, fast intersection operations. The data layout is carefully designed to exploit word-level parallelism and SIMD instructions, hence providing good practical performance. The core algorithmic idea is that of recursive partitioning the universe of representation: A markedly different paradigm than the widespread strategy of partitioning the sequence based on its length. Extensive experimentation and comparison against several competitive techniques shows that the proposed solution embodies an improved space/time trade-off for the set intersection problem
    • …
    corecore