10 research outputs found
Approximation trade-offs in Markovian stream processing: An empirical study
A large amount of the world’s data is both sequential and imprecise. Such data is commonly modeled as Markovian streams; examples include words/sentences inferred from raw audio signals, or discrete location sequences inferred from RFID or GPS data. The rich semantics and large volumes of these streams make them difficult to query efficiently. In this paper, we study the effects—on both efficiency and accuracy—of two common stream approximations. Through experiments on a realworld RFID data set, we identify conditions under which these approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases when the full rich semantics are necessary
RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques
People spend a significant amount of time in indoor spaces (e.g., office
buildings, subway systems, etc.) in their daily lives. Therefore, it is
important to develop efficient indoor spatial query algorithms for supporting
various location-based applications. However, indoor spaces differ from outdoor
spaces because users have to follow the indoor floor plan for their movements.
In addition, positioning in indoor environments is mainly based on sensing
devices (e.g., RFID readers) rather than GPS devices. Consequently, we cannot
apply existing spatial query evaluation techniques devised for outdoor
environments for this new challenge. Because Bayesian filtering techniques can
be employed to estimate the state of a system that changes over time using a
sequence of noisy measurements made on the system, in this research, we propose
the Bayesian filtering-based location inference methods as the basis for
evaluating indoor spatial queries with noisy RFID raw data. Furthermore, two
novel models, indoor walking graph model and anchor point indexing model, are
created for tracking object locations in indoor environments. Based on the
inference method and tracking models, we develop innovative indoor range and k
nearest neighbor (kNN) query algorithms. We validate our solution through use
of both synthetic data and real-world data. Our experimental results show that
the proposed algorithms can evaluate indoor spatial queries effectively and
efficiently. We open-source the code, data, and floor plan at
https://github.com/DataScienceLab18/IndoorToolKit
Towards Enabling Probabilistic Databases for Participatory Sensing
Participatory sensing has emerged as a new data collection paradigm, in which humans use their own devices (cell phone accelerometers, cameras, etc.) as sensors. This paradigm enables to collect a huge amount of data from the crowd for world-wide applications, without spending cost to buy dedicated sensors. Despite of this benefit, the data collected from human sensors are inherently uncertain due to no quality guarantee from the participants. Moreover, the participatory sensing data are time series that not only exhibit highly irregular dependencies on time, but also vary from sensor to sensor. To overcome these issues, we study in this paper the problem of creating probabilistic data from given (uncertain) time series collected by participatory sensors. We approach the problem in two steps. In the first step, we generate probabilistic times series from raw time series using a dynamical model from the time series literature. In the second step, we combine probabilistic time series from multiple sensors based on the mutual relationship between the reliability of the sensors and the quality of their data. Through extensive experimentation, we demonstrate the efficiency of our approach on both real data and synthetic data