PhD Proposal: Functional monitoring problem for distributed large-scale data streams

By Emmanuelle Anceaume, Yann Busnel, Bruno Sericola (LINA, Université de Nantes; Inria Rennes-Bretagne Atlantique)

Abstract

In this PhD proposal, we consider large-scale distributed systems in which each node must quickly process a huge amount of data received as a stream that may have been tampered with by an adversary. Several fundamental problems have recently been raised in this setting, concerning many domains including machine learning, data mining, databases, information retrieval, and network monitoring. All these applications require processing huge amounts of data both quickly and precisely. We propose to combine sampling techniques and information-theoretic methods to extract pertinent information (metrics, summaries, pattern matching, etc.) from such streams. Unfortunately, computing information-theoretic measures in the data stream model is challenging, essentially because the data must be processed sequentially, on the fly, and with very little storage relative to the size of the stream. In addition, the analysis must remain robust over time so as to detect any sudden change in the observed streams, which may be the manifestation of a denial-of-service attack against routers or of a propagating worm.

By contrast, very few works have tackled the distributed streaming model, also called the functional monitoring problem [12], which combines features of both the streaming model and the communication complexity model. As in the streaming model, the input data is read on the fly and processed with minimal workspace and time. From the communication complexity model, it borrows the concern with communication cost: each node receives its own input data stream, performs some local computation, and communicates only with a coordinator, who wishes to continuously compute or estimate a given function of the union of all the input streams. The challenging issue in this model is for the coordinator to compute the given function while minimizing the number of communicated bits [12, 6, 15].
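To make the storage constraint concrete, here is a minimal sketch, assuming a single node and a non-adversarial stream, of estimating one information-theoretic measure (the empirical Shannon entropy H = sum_i (m_i/m) ln(m/m_i), where m_i is the frequency of item i among m items) by sampling. Instead of one counter per distinct item, it keeps only a length counter plus k (item, count) pairs. It uses the classical AMS-style (Alon, Matias and Szegedy) basic estimator: reservoir sampling picks a uniformly random position J, r counts the occurrences of that position's item from J onward, and r ln(m/r) - (r-1) ln(m/(r-1)) telescopes so that its expectation is exactly H. This estimator is our illustrative choice, not the method proposed in this work. The names `EntropySketch` and `exact_entropy` and the parameter `k` are hypothetical.

```python
import math
import random
from collections import Counter


class EntropySketch:
    """Sampling-based estimate of the empirical entropy (in nats) of a stream.

    Keeps k independent AMS-style estimators, i.e. O(k) words of memory,
    instead of one counter per distinct item.
    """

    def __init__(self, k: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.m = 0                 # number of items read so far
        self.items = [None] * k    # item sampled by each estimator
        self.counts = [0] * k      # occurrences of that item since its sampled position

    def update(self, x) -> None:
        """Process one stream item on the fly."""
        self.m += 1
        for j in range(len(self.items)):
            # Reservoir sampling: the current position replaces the sample with
            # probability 1/m, so each estimator holds a uniform random position.
            if self.rng.random() < 1.0 / self.m:
                self.items[j] = x
                self.counts[j] = 1
            elif self.items[j] == x:
                self.counts[j] += 1

    def estimate(self) -> float:
        """Average of the unbiased per-estimator values g(r) - g(r - 1)."""
        m = self.m

        def g(r: int) -> float:
            # g(r) = r * ln(m / r), with g(0) = 0 by convention
            return r * math.log(m / r) if r > 0 else 0.0

        values = [g(r) - g(r - 1) for r in self.counts]
        return sum(values) / len(values)


def exact_entropy(stream) -> float:
    """Reference value computed from the full frequency table (not streaming)."""
    m = len(stream)
    return sum((c / m) * math.log(m / c) for c in Counter(stream).values())


if __name__ == "__main__":
    rng = random.Random(42)
    stream = [rng.choice("abcd") for _ in range(400)]
    sketch = EntropySketch(k=2000, seed=1)
    for item in stream:
        sketch.update(item)
    print(f"exact    H = {exact_entropy(stream):.3f} nats")
    print(f"estimate H = {sketch.estimate():.3f} nats")
```

Averaging k copies shrinks the variance roughly by a factor of k. High-probability guarantees would typically add a median-of-means step. The distributed setting described above would further need a protocol for shipping such summaries to the coordinator at low communication cost.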

Topics: Large-scale Data Stream, Randomized approximation algorithm, Functional monitoring problem, Byzantine Adversary, Performance Analysis
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.383.2171
Provided by: CiteSeerX