10,434 research outputs found

    Causal modeling and prediction over event streams

    Get PDF
    In recent years, there has been a growing need for causal analysis in many modern stream applications such as web page click monitoring, patient health care monitoring, stock market prediction, electric grid monitoring, and network intrusion detection systems. The detection and prediction of causal relationships help in monitoring, planning, decision making, and prevention of unwanted consequences. An event stream is a continuous unbounded sequence of event instances. The availability of a large amount of continuous data along with high data throughput poses new challenges related to causal modeling over event streams, such as (1) the need for incremental causal inference for the unbounded data, (2) the need for fast causal inference for the high throughput data, and (3) the need for real-time prediction of effects from the events seen so far in the continuous event streams. This dissertation research addresses these three problems by focusing on utilizing temporal precedence information which is readily available in event streams: (1) an incremental causal model to update the causal network incrementally with the arrival of a new batch of events instead of storing the complete set of events seen so far and building the causal network from scratch with those stored events, (2) a fast causal model to speed up the causal network inference time, and (3) a real-time top-k predictive query processing mechanism to find the most probable k effects with the highest scores by proposing a run-time causal inference mechanism which addresses cyclic causal relationships. In this dissertation, the motivation, related work, proposed approaches, and the results are presented in each of the three problems

    Collaborative Reuse of Streaming Dataflows in IoT Applications

    Full text link
    Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure, and make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continue to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving the resource efficiency. In this paper, we propose \emph{dataflow reuse algorithms} that given a submitted dataflow, identifies the intersection of reusable tasks and streams from a collection of running dataflows to form a \emph{merged dataflow}. Similar algorithms to unmerge dataflows when they are removed are also proposed. We implement these algorithms for the popular Apache Storm DSPS, and validate their performance and resource savings for 35 synthetic dataflows based on public OPMW workflows with diverse arrival and departure distributions, and on 21 real IoT dataflows from RIoTBench.Comment: To appear in IEEE eScience Conference 201
    • …
    corecore