
Chapter 2 ON CLUSTERING MASSIVE DATA STREAMS: A SUMMARIZATION PARADIGM

By Charu C. Aggarwal, Jiawei Han, Jianyong Wang and Philip S. Yu

Abstract

In recent years, data streams have become ubiquitous because of the large number of applications that generate huge volumes of data in an automated way. Many existing data mining methods cannot be applied directly to data streams because the data must be mined in one pass. Furthermore, data streams show a considerable amount of temporal locality, so a direct application of the existing methods may lead to misleading results. In this paper, we develop an efficient and effective approach for mining fast-evolving data streams, which integrates the micro-clustering technique with the high-level data mining process and also discovers regularities in how the data evolves. Our analysis and experiments demonstrate that two important data mining problems, namely stream clustering and stream classification, can be performed effectively using this approach, with high-quality mining results. We discuss the use of micro-clustering as a general summarization technology for solving data mining problems on streams. Our discussion illustrates the importance of our approach for a variety of mining problems in the data stream domain.
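To make the one-pass summarization idea concrete, here is a minimal Python sketch of a micro-cluster kept as additive summary statistics. It is an illustration under assumptions, not the chapter's implementation: the class name `MicroCluster`, its fields (`n`, `ls`, `ss`, `t_sum`) and its methods are hypothetical, and the feature layout follows the commonly described cluster-feature idea (count, per-dimension linear sum, per-dimension squared sum, timestamp sum), which may differ from the exact definition used by the authors.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class MicroCluster:
    """Additive summary of a group of stream points (hypothetical layout)."""
    dim: int
    n: int = 0                                     # number of absorbed points
    ls: List[float] = field(default_factory=list)  # per-dimension linear sum
    ss: List[float] = field(default_factory=list)  # per-dimension squared sum
    t_sum: float = 0.0                             # sum of arrival timestamps

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def absorb(self, x: List[float], t: float) -> None:
        """Fold one point into the summary; the point itself is not stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        self.t_sum += t

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        """Summaries are additive, so merging is a component-wise sum."""
        return MicroCluster(
            dim=self.dim,
            n=self.n + other.n,
            ls=[a + b for a, b in zip(self.ls, other.ls)],
            ss=[a + b for a, b in zip(self.ss, other.ss)],
            t_sum=self.t_sum + other.t_sum,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def rms_radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding

    def mean_timestamp(self) -> float:
        return self.t_sum / self.n
```

Because every field is a running sum, each point is touched exactly once, and two summaries can be combined without revisiting the raw data. That additivity is the property that lets a summary of this kind serve as input to later clustering or classification steps.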

Publisher: 2013-09-21
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.352.3431
Provided by: CiteSeerX