Tractability in probabilistic databases
Approximation trade-offs in Markovian stream processing: An empirical study
A large amount of the world’s data is both sequential and imprecise. Such data is commonly modeled as Markovian streams; examples include words or sentences inferred from raw audio signals, and discrete location sequences inferred from RFID or GPS data. The rich semantics and large volumes of these streams make them difficult to query efficiently. In this paper, we study the effects of two common stream approximations on both efficiency and accuracy. Through experiments on a real-world RFID data set, we identify conditions under which these approximations can improve performance by several orders of magnitude with only minimal effects on query results. We also identify cases in which the full, rich semantics are necessary.
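As a rough illustration of the data model this abstract refers to, the sketch below treats a Markovian stream as an initial distribution plus one conditional probability table per timestep, and answers the query "was the object ever at a given location?" exactly. The representation, the function name prob_ever_in, and the locations and probabilities are assumptions made for illustration; they are not taken from the paper, and the two approximations it studies are not reproduced here.

```python
# Hypothetical sketch: exact "ever visited" query over a tiny Markovian stream.
# A stream is an initial distribution plus one transition table per timestep.

def prob_ever_in(initial, cpts, target):
    """Return P(the state equals `target` at some timestep)."""
    # Track only the probability mass of paths that have NOT visited target.
    not_visited = {s: p for s, p in initial.items() if s != target}
    for cpt in cpts:
        step = {}
        for prev, p in not_visited.items():
            for state, t in cpt[prev].items():
                if state != target:
                    step[state] = step.get(state, 0.0) + p * t
        not_visited = step
    return 1.0 - sum(not_visited.values())

# Illustrative numbers only: the object starts in the hall and, at each
# timestep, either stays there or moves to the office.
initial = {"hall": 1.0}
cpt = {"hall": {"hall": 0.5, "office": 0.5}, "office": {"office": 1.0}}
result = prob_ever_in(initial, [cpt, cpt], "office")
print(result)
```

The query depends on correlations between consecutive timesteps, which is why the pass carries a distribution over states forward rather than combining per-timestep marginals.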
Capturing Data Uncertainty in High-Volume Stream Processing
We present the design and development of a data stream system that captures
data uncertainty from data collection to query processing to final result
generation. Our system focuses on data that is naturally modeled as continuous
random variables. For such data, our system employs an approach grounded in
probability and statistical theory to capture data uncertainty and integrates
this approach into high-volume stream processing. The first component of our
system captures uncertainty of raw data streams from sensing devices. Since
such raw streams can be highly noisy and may not carry sufficient information
for query processing, our system employs probabilistic models of the data
generation process and stream-speed inference to transform raw data into a
desired format with an uncertainty metric. The second component captures
uncertainty as data propagates through query operators. To efficiently quantify
result uncertainty of a query operator, we explore a variety of techniques
based on probability and statistical theory to compute the result distribution
at stream speed. We are currently working with a group of scientists to
evaluate our system using traces collected from the domains of (and eventually
in the real systems for) hazardous weather monitoring and object tracking and
monitoring.
Comment: CIDR 200
Extending Event-Driven Architecture for Proactive Systems
Proactive event-driven computing is a new paradigm in which a decision is made neither in response to explicit user requests nor as a reaction to past events. Rather, the decision is triggered autonomously by forecasting future states. Proactive event-driven computing requires a departure from current event-driven architectures toward ones capable of handling uncertainty, future events, and real-time decision making. We present a proactive event-driven architecture for Scalable Proactive Event-Driven Decision-making (SPEEDD), which combines these capabilities. The proposed architecture is composed of three main components: complex event processing, real-time decision making, and visualization. The architecture is instantiated by a real use case from the traffic management domain. In the future, results from actual implementations of the use case will help us revise and refine the proposed architecture.
RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques
People spend a significant amount of time in indoor spaces (e.g., office
buildings and subway systems) in their daily lives. Therefore, it is
important to develop efficient indoor spatial query algorithms for supporting
various location-based applications. However, indoor spaces differ from outdoor
spaces because users have to follow the indoor floor plan for their movements.
In addition, positioning in indoor environments is mainly based on sensing
devices (e.g., RFID readers) rather than GPS devices. Consequently, we cannot
apply existing spatial query evaluation techniques devised for outdoor
environments to this new challenge. Because Bayesian filtering techniques can
be employed to estimate the state of a system that changes over time using a
sequence of noisy measurements made on the system, we propose Bayesian
filtering-based location inference methods as the basis for
evaluating indoor spatial queries with noisy raw RFID data. Furthermore, two
novel models, the indoor walking graph model and the anchor point indexing model, are
created for tracking object locations in indoor environments. Based on the
inference method and tracking models, we develop innovative indoor range and
k-nearest neighbor (kNN) query algorithms. We validate our solution using
both synthetic and real-world data. Our experimental results show that
the proposed algorithms can evaluate indoor spatial queries effectively and
efficiently. We open-source the code, data, and floor plan at
https://github.com/DataScienceLab18/IndoorToolKit
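
The Bayesian filtering idea described in this abstract (estimating a changing state from a sequence of noisy measurements) can be sketched as a discrete predict/update loop. This is a minimal illustration under assumed inputs, not the authors' method or the IndoorToolKit API: the three locations, the transition table, and the reader likelihoods are all hypothetical.

```python
# Minimal discrete Bayes filter over a handful of named locations.
# All names and numbers are hypothetical, chosen only for illustration.

def predict(belief, transition):
    """Motion step: push the current belief through the transition model."""
    predicted = {state: 0.0 for state in belief}
    for prev, p in belief.items():
        for nxt, t in transition[prev].items():
            predicted[nxt] += p * t
    return predicted

def update(belief, likelihood):
    """Measurement step: weight by P(reading | state), then renormalize."""
    weighted = {state: p * likelihood[state] for state, p in belief.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("reading is impossible under the current belief")
    return {state: w / total for state, w in weighted.items()}

# Three locations along a hallway; an object stays put or moves to a neighbor.
transition = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.25, "B": 0.5, "C": 0.25},
    "C": {"B": 0.5, "C": 0.5},
}
# Assumed likelihood of "the reader at B fired" given each true location.
reader_b_fired = {"A": 0.1, "B": 0.8, "C": 0.1}

belief = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}  # uniform prior
belief = update(predict(belief, transition), reader_b_fired)
most_likely = max(belief, key=belief.get)
print(most_likely, belief)
```

Repeating the predict and update steps for each new reading keeps a running distribution over locations, which range or kNN queries could then be evaluated against.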