
946 research outputs found

Efficient Summing over Sliding Windows

Author: Basat Ran Ben
Einziger Gil
Friedman Roy
Kassner Yaron
Publication date: 01/01/2016

This paper considers the problem of maintaining statistic aggregates over the last $W$ elements of a data stream. First, the problem of counting the number of 1's in the last $W$ bits of a binary stream is considered. A lower bound of $\Omega(1/\epsilon + \log W)$ memory bits for $W\epsilon$-additive approximations is derived. This is followed by an algorithm whose memory consumption is $O(1/\epsilon + \log W)$ bits, indicating that the algorithm is optimal and that the bound is tight. Next, the more general problem of maintaining a sum of the last $W$ integers, each in the range $\{0, 1, \ldots, R\}$, is addressed. The paper shows that approximating the sum within an additive error of $RW\epsilon$ can also be done using $\Theta(1/\epsilon + \log W)$ bits for $\epsilon = \Omega(1/W)$. For $\epsilon = o(1/W)$, we present a succinct algorithm which uses $B(1 + o(1))$ bits, where $B = \Theta(W \log(1/(W\epsilon)))$ is the derived lower bound. We show that all lower bounds generalize to randomized algorithms as well. All algorithms process new elements and answer queries in $O(1)$ worst-case time. Comment: A shorter version appears in SWAT 201
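To make the problem statement concrete, here is a minimal Python sketch of a simple block-based counter that approximates the number of 1's in the last W bits with an additive overestimate of at most Wε. It is a baseline illustration only and is not the algorithm from the paper: it stores one counter per block, so it does not reach the O(1/ε + log W) bit bound stated in the abstract. The class name `BlockWindowCounter` and its parameters are hypothetical.

```python
from collections import deque


class BlockWindowCounter:
    """Approximate count of 1's in the last `window` bits (illustrative baseline).

    The stream is cut into blocks of about window * eps items. Only per-block
    counts are kept, so a query may include up to block - 1 items that are
    already outside the window (an overestimate of at most window * eps).
    """

    def __init__(self, window, eps):
        self.window = window
        self.block = max(1, int(window * eps))  # items per block
        self.blocks = deque()                   # counts of completed blocks
        self.total = 0                          # sum of counts in self.blocks
        self.current = 0                        # 1's in the open block
        self.filled = 0                         # items in the open block

    def add(self, bit):
        self.current += bit
        self.filled += 1
        if self.filled == self.block:
            # Close the open block and start a new one.
            self.blocks.append(self.current)
            self.total += self.current
            self.current = 0
            self.filled = 0
        # Drop the oldest block once the remaining items still cover the window.
        while self.blocks and (len(self.blocks) - 1) * self.block + self.filled >= self.window:
            self.total -= self.blocks.popleft()

    def query(self):
        return self.total + self.current


if __name__ == "__main__":
    counter = BlockWindowCounter(window=100, eps=0.1)
    for i in range(1000):
        counter.add(1 if i % 3 == 0 else 0)
    print(counter.query())
```

Each query reads a running total, so it takes constant time; an insertion evicts at most one block. The space cost is one small integer per block, which is the part the paper improves on.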

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

An Approach for Removing Redundant Data from RFID Data Streams

Author: Bashir
Bloom
Chen
Hairulnizam Mahdin
Jeffery
Jemal Abawajy
Martinez-Sala
Pupunwiwat
Shen
Shin
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2011

Radio frequency identification (RFID) systems are emerging as the primary object identification mechanism, especially in supply chain management. However, RFID naturally generates a large number of duplicate readings. Removing these duplicates from the RFID data stream is paramount, as they contribute no new information to the system and waste system resources. Existing approaches to this problem cannot meet the real-time demands of processing a massive RFID data stream. We propose a data filtering approach that efficiently detects and removes duplicate readings from RFID data streams. Experimental results show that the proposed approach offers a significant improvement over the existing approaches.

CiteSeerX

Deakin Research Online

Crossref

Directory of Open Access Journals

PubMed Central

Quotient Hash Tables - Efficiently Detecting Duplicates in Streaming Data

Author: Géraud Rémi
Lombard-Platet Marius
Naccache David
Publication venue: Association for Computing Machinery (ACM)
Publication date: 14/01/2019

This article presents the Quotient Hash Table (QHT), a new data structure for duplicate detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient filters (SQFs), resulting in a 33% reduction in memory usage for equal performance. We provide a new and thorough analysis of both algorithms, with results of interest to other existing constructions. We also introduce an optimised version of our new data structure dubbed Queued QHT with Duplicates (QQHTD). Finally, we discuss the effect of adversarial inputs for hash-based duplicate filters similar to QHT. Comment: Shorter version was accepted at SIGAPP SAC '1
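The general idea of a hash-based duplicate filter over an unbounded stream can be sketched as a fixed-size table of short fingerprints. The Python example below is an illustrative simplification under stated assumptions (one fingerprint slot per bucket, overwrite on mismatch); it is not the QHT, SQF, or QQHTD construction analysed in the article, and the class name `FingerprintTable` and its parameters are hypothetical.

```python
import hashlib


class FingerprintTable:
    """Fixed-memory duplicate filter: one short fingerprint per bucket.

    Assumed behaviour for illustration: an item counts as a duplicate when
    its bucket already holds its fingerprint; otherwise the fingerprint
    overwrites whatever the bucket held. Overwrites cause false negatives,
    fingerprint collisions cause false positives.
    """

    def __init__(self, n_buckets=1024, fp_bits=8):
        self.n_buckets = n_buckets
        self.fp_mask = (1 << fp_bits) - 1
        self.slots = [0] * n_buckets  # 0 marks an empty bucket

    def _locate(self, item):
        digest = hashlib.blake2b(repr(item).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        bucket = h % self.n_buckets
        # Fingerprint 0 is reserved for "empty", so map it to 1.
        fingerprint = ((h // self.n_buckets) & self.fp_mask) or 1
        return bucket, fingerprint

    def seen_before(self, item):
        bucket, fingerprint = self._locate(item)
        if self.slots[bucket] == fingerprint:
            return True
        self.slots[bucket] = fingerprint
        return False


if __name__ == "__main__":
    table = FingerprintTable()
    for reading in ["tag-1", "tag-2", "tag-1"]:
        print(reading, table.seen_before(reading))
```

Memory use is fixed at `n_buckets * fp_bits` bits regardless of stream length, which is the property that makes this family of filters usable on unbounded streams.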

arXiv.org e-Print Archive

Crossref

FLEET: Butterfly Estimation from a Bipartite Graph Stream

Author: Bar-Yossef R. Kumar Z.
Bera Suman K
Braverman Vladimir
Kane Daniel M
Li Lin
Liu Boge
Mehta Aranyak
Milo Ron
Shin Kijung
Turk Ata
Zhu Rong
Publication venue: Association for Computing Machinery (ACM)
Publication date: 28/08/2019

We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams. Comment: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Managemen
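For readers unfamiliar with the motif: a butterfly is a complete 2 x 2 bipartite subgraph (two vertices on each side, all four edges present). The Python sketch below counts butterflies exactly and adds a naive fixed-probability edge-sampling estimator. It is only a reference point under stated assumptions and is not one of the FLEET algorithms; the function names `count_butterflies` and `estimate_butterflies` are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exact butterfly count for (left, right) edge pairs."""
    neighbors = defaultdict(set)
    for left, right in set(edges):
        neighbors[left].add(right)

    # For every pair of right vertices, count the left vertices adjacent to both.
    shared = defaultdict(int)
    for right_set in neighbors.values():
        for pair in combinations(sorted(right_set), 2):
            shared[pair] += 1

    # Any two left vertices sharing the same right pair form one butterfly.
    return sum(c * (c - 1) // 2 for c in shared.values())


def estimate_butterflies(edges, p, seed=0):
    """Naive estimator: keep each edge with probability p, rescale by 1 / p**4.

    A butterfly has four edges, so it survives independent sampling with
    probability p**4, which makes the rescaled count unbiased.
    """
    rng = random.Random(seed)
    sample = [e for e in sorted(set(edges)) if rng.random() < p]
    return count_butterflies(sample) / p ** 4


if __name__ == "__main__":
    k33 = [(u, v) for u in "abc" for v in "xyz"]
    print(count_butterflies(k33))          # complete 3 x 3 bipartite graph
    print(estimate_butterflies(k33, 0.8))  # noisy on a graph this small
```

The exact counter needs the whole graph in memory, which is what a single-pass, space-bounded streaming estimator is meant to avoid.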

arXiv.org e-Print Archive

Crossref

FingerPrint Based Duplicate Detection in Streamed Data

Author: Batra Shalini
Singh Amritpal
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 04/02/2019

In computing, duplicate data detection refers to identifying duplicate copies of repeating data. Identifying duplicate data items in streamed data and eliminating them before storage is a complex task. This paper proposes a novel data structure for duplicate detection using a variant of the stable Bloom filter, named FingerPrint Stable Bloom Filter (FP-SBF). The proposed approach uses a counting Bloom filter with fingerprint bits, along with an optimization mechanism for duplicate detection. FP-SBF uses d-left hashing, which reduces the computational time and decreases both false positives and false negatives. FP-SBF can process unbounded data in a single pass, using k hash functions, and successfully differentiates between duplicate and distinct elements in O(k+1) time, independent of the size of the incoming data. The performance of FP-SBF has been compared with various Bloom filters used for duplicate detection in stream data, and it has been shown theoretically and experimentally that the proposed approach efficiently detects duplicates in streaming data with lower memory requirements.
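As background for the abstract above, the following is a minimal Python sketch of a plain stable Bloom filter as commonly described: small counters, k hash positions per item, and a random decay step that keeps the structure from filling up on an unbounded stream. It is the baseline idea only and does not implement FP-SBF (there are no fingerprint bits and no d-left hashing); the class name, default parameters, and choice of hash function are assumptions for illustration.

```python
import hashlib
import random


class StableBloomFilter:
    """Baseline stable Bloom filter sketch (not FP-SBF)."""

    def __init__(self, n_cells=4096, n_hashes=3, max_value=3, decrements=10, seed=0):
        self.n_cells = n_cells
        self.n_hashes = n_hashes      # k hash functions (must be below 256 here)
        self.max_value = max_value    # value a cell is set to on insertion
        self.decrements = decrements  # cells decayed per processed item
        self.cells = [0] * n_cells
        self.rng = random.Random(seed)

    def _indexes(self, item):
        data = repr(item).encode()
        # Derive k positions by prefixing the data with the hash index.
        return [
            int.from_bytes(
                hashlib.blake2b(bytes([i]) + data, digest_size=8).digest(), "big"
            ) % self.n_cells
            for i in range(self.n_hashes)
        ]

    def seen_before(self, item):
        indexes = self._indexes(item)
        # Reported as a duplicate only if every probed cell is non-zero.
        duplicate = all(self.cells[i] > 0 for i in indexes)
        # Decay: decrement randomly chosen cells so stale items fade out.
        for _ in range(self.decrements):
            j = self.rng.randrange(self.n_cells)
            if self.cells[j] > 0:
                self.cells[j] -= 1
        # Refresh the cells of the current item.
        for i in indexes:
            self.cells[i] = self.max_value
        return duplicate


if __name__ == "__main__":
    sbf = StableBloomFilter()
    for record in ["r1", "r2", "r1"]:
        print(record, sbf.seen_before(record))
```

Each item costs k probes plus a fixed number of decrements, independent of how much data has already been processed. The decay step is also why this kind of filter can produce false negatives as well as false positives.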

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)