Search CORE

3 research outputs found

Mining discriminative items in multiple data streams

Author: Lin Zhenhua
Publication venue
Publication date: 01/01/2010
Field of study

How can we maintain a dynamic profile capturing a user’s reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in Europe? To answer such interesting questions, we need to find discriminative items in multiple data streams. In this thesis, we show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is O(|S| log (n1/|S|)), where S is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and processing time per item with respect to |S|. The complexity of all algorithms is independent to the size of the two streams. The extensive empirical study verifies our design

Simon Fraser University Institutional Repository