
6 research outputs found

Minwise-Independent Permutations with Insertion and Deletion of Features

Author: Kulkarni Raghav
Pratap Rameshwar
Publication date: 22/08/2023

In their seminal work, Broder et al. [BroderCFM98] introduce the minHash algorithm, which computes a low-dimensional sketch of high-dimensional binary data that closely approximates pairwise Jaccard similarity. Since its invention, minHash has been commonly used by practitioners in various big data applications. In many real-life scenarios the data is dynamic, and its feature set evolves over time. We consider the case when features are dynamically inserted into and deleted from the dataset. A naive solution to this problem is to repeatedly recompute minHash with respect to the updated dimension. However, this is expensive, as it requires generating fresh random permutations. To the best of our knowledge, no systematic study of minHash in the context of dynamic insertion and deletion of features has been recorded. In this work, we initiate this study and suggest algorithms that make minHash sketches adaptable to the dynamic insertion and deletion of features. We give a rigorous theoretical analysis of our algorithms and complement it with extensive experiments on several real-world datasets. Empirically, we observe a significant speed-up in running time while achieving performance comparable to running minHash from scratch. Our proposal is efficient, accurate, and easy to implement in practice.
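To make the baseline concrete, here is a minimal Python sketch of plain minHash with explicit random permutations. It illustrates only the classical sketch and the Jaccard estimate, not the dynamic insertion/deletion algorithms of the paper, which the abstract does not spell out. The function names, the seed, and the toy sets are assumptions chosen for illustration.

```python
import random

def make_permutations(dim, num_hashes, seed=0):
    """Draw num_hashes random permutations of the feature indices 0..dim-1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms

def minhash_sketch(features, perms):
    """features: non-empty set of indices where the binary vector is 1.
    Each sketch entry is the smallest permuted index among those features."""
    return [min(perm[i] for i in features) for perm in perms]

def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions on which the two sketches agree."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)

def exact_jaccard(a, b):
    return len(a & b) / len(a | b)

# Toy data (hypothetical): two overlapping feature sets in 1000 dimensions.
dim = 1000
a = set(range(0, 300))
b = set(range(150, 450))
perms = make_permutations(dim, num_hashes=512, seed=42)

est = estimate_jaccard(minhash_sketch(a, perms), minhash_sketch(b, perms))
exact = exact_jaccard(a, b)
print(f"estimated {est:.3f}, exact {exact:.3f}")
```

In this setup, the naive handling of a changed dimension would be to call `make_permutations` again with the new `dim` and rebuild every sketch, which is the cost the abstract describes as expensive.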

arXiv.org e-Print Archive

Exploiting the Computational Power of Ternary Content Addressable Memory

Author: Tirdad Kamran
Publication venue: 'University of Waterloo'
Publication date: 01/01/2011

Ternary Content Addressable Memory (TCAM) is a special type of memory that can execute a certain set of operations in parallel on all of its words. Because of its power consumption and relatively small storage capacity, it has only been used in special environments. Over the past few years its cost has fallen and its storage capacity has increased significantly, and these exponential trends are continuing. Hence it can be used in more general environments for larger problems. In this research we study how to exploit its computational power to speed up fundamental problems, and we have barely scratched the surface. The main problems addressed in our research are Boolean matrix multiplication, approximate subset queries using Bloom filters, fixed-universe priority queues, and network flow classification. For Boolean matrix multiplication, our simple algorithm has a running time of O(dN^2/w), where N is the size of the square matrices, w is the number of bits in each word of TCAM, and d is the maximum number of ones in a row of one of the matrices. For the fixed-universe priority queue problem we propose two data structures: one with constant time complexity and O((1/ε) n U^ε) space, and the other with linear space and amortized time complexity O((lg lg U)/(lg lg lg U)), which beats the best possible data structure in the RAM model, namely Y-fast trees. Considering each word of TCAM as a Bloom filter, we modify the hash functions of the Bloom filter and propose a data structure that uses the information capacity of each word of TCAM more efficiently by exploiting the co-occurrence probability of possible members. Finally, in the last chapter we propose a novel technique for network flow classification using TCAM.
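The abstract does not describe the TCAM algorithm itself, so the following is only a software analogue showing where a bound of the form O(dN^2/w) can come from. It packs each row of the second matrix into one Python integer and ORs together the rows selected by the ones in each row of the first matrix. With at most d ones per row, that is at most d·N row-level ORs, each touching roughly N/w machine words. The function names and the example matrices are assumptions for illustration, and no TCAM hardware is modelled.

```python
def pack_rows(matrix):
    """Pack each 0/1 row into one int: bit j is set iff row[j] == 1."""
    return [sum(bit << j for j, bit in enumerate(row)) for row in matrix]

def bool_matmul(a, b):
    """Boolean product C = A * B, where C[i][j] = OR over k of (A[i][k] AND B[k][j]).

    Row i of C is the bitwise OR of the rows B[k] for which A[i][k] == 1,
    so each row of A with at most d ones costs at most d wide OR operations.
    """
    num_cols = len(b[0])
    b_rows = pack_rows(b)
    result = []
    for row in a:
        acc = 0
        for k, bit in enumerate(row):
            if bit:
                acc |= b_rows[k]  # one word-parallel OR over a whole row of B
        result.append([(acc >> j) & 1 for j in range(num_cols)])
    return result

# Hypothetical 3x3 example.
a = [[1, 0, 1],
     [0, 0, 0],
     [0, 1, 0]]
b = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
c = bool_matmul(a, b)
print(c)  # [[0, 1, 1], [0, 0, 0], [1, 0, 0]]
```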

University of Waterloo's Institutional Repository

On Restricted Min-Wise Independence of Permutations

Author: Jiri Matousek
Milos Stojakovic

A family of permutations $\mathcal{F} \subseteq S_n$ with a probability distribution on it is called k-restricted min-wise independent if we have $\Pr[\min \pi(X) = \pi(x)] = 1/|X|$ for every subset $X \subseteq [n]$ with $|X| \le k$, every $x \in X$, and $\pi \in \mathcal{F}$ chosen at random. We present a simple proof of a result of Norin: every such family has size at least $\binom{n-1}{\lfloor (k-1)/2 \rfloor}$. Some features of our method might be of independent interest
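The definition can be checked by brute force on small examples. The sketch below assumes the standard form of the condition, namely that for every subset X of {0, ..., n-1} with at most k elements and every x in X, the probability that π(x) is the minimum of π(X) equals 1/|X|. The function name, the 0-based indexing, and the two example families are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction
from itertools import combinations, permutations

def is_k_restricted_minwise(family, n, k):
    """family: list of (perm, prob) pairs, where perm[i] is the image of i.

    Returns True iff, for every subset X with |X| <= k and every x in X,
    the total probability of permutations mapping x to min pi(X) is 1/|X|.
    """
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            for x in subset:
                p = sum(prob for perm, prob in family
                        if min(perm[i] for i in subset) == perm[x])
                if p != Fraction(1, size):
                    return False
    return True

n = 4
all_perms = list(permutations(range(n)))

# The full symmetric group with the uniform distribution.
full = [(perm, Fraction(1, len(all_perms))) for perm in all_perms]

# The n cyclic shifts i -> (i + c) mod n, also uniform.
shifts = [(tuple((i + c) % n for i in range(n)), Fraction(1, n))
          for c in range(n)]

print(is_k_restricted_minwise(full, n, n))    # True
print(is_k_restricted_minwise(shifts, n, 1))  # True
print(is_k_restricted_minwise(shifts, n, 2))  # False
```

The cyclic shifts fail already for k = 2: for X = {0, 1}, element 0 is the minimum under three of the four shifts, giving probability 3/4 instead of 1/2.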

CiteSeerX

On restricted min-wise independence of permutations

Author: Alon
Babai
Broder
Broder
Broder
Chor
Fill
Gottlieb
Indyk
Itoh
Itoh
Karloff
Koller
Norin
Saks
Spencer
Takei
Tsetlin
Publication venue: 'Wiley'

Crossref

On Restricted Min-Wise Independence of Permutations

Author: Jiri Matousek

A family of permutations $\mathcal{F} \subseteq S_n$ with a probability distribution on it is called k-restricted min-wise independent if we have $\Pr[\min \pi(X) = \pi(x)] = 1/|X|$ for every subset $X \subseteq [n]$ with $|X| \le k$, every $x \in X$, and $\pi \in \mathcal{F}$ chosen at random. We present a simple proof of a result of Norin: every such family has size at least $\binom{n-1}{\lfloor (k-1)/2 \rfloor}$. Some features of our method might be of independent interest

CiteSeerX