A New Fuzzy Query Processing System in Wireless Sensor Networks
Acquiring information from sensor networks through queries is one of the most important issues in wireless sensor networks. Traditional query processing systems require query criteria to be defined as crisp predicates with explicit numerical thresholds, so queries are evaluated in an all-or-nothing manner. The inherent uncertainty and imprecision of sensor data call for a different approach. Since fuzzy theory provides a toolbox for capturing the imprecision associated with both data and queries, this paper introduces a new system for processing fuzzy queries in wireless sensor networks. In addition to a new structure for fuzzy queries, the system includes a new algorithm for processing them in sensor networks. Simulation results indicate that the accuracy and precision of results obtained from fuzzy queries are higher than those of traditional queries, while there is no significant difference between the two in energy consumption.
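To make the contrast with crisp predicates concrete, here is a minimal sketch of evaluating a fuzzy predicate ("temperature is high") against sensor readings. The trapezoidal membership function, its breakpoints, and the alpha-cut are illustrative assumptions, not the paper's actual query structure.

```python
def high_temp_membership(t):
    """Degree (0..1) to which a temperature counts as 'high'.

    Hypothetical ramp: fully 'not high' below 30, fully 'high' above 40.
    """
    if t <= 30.0:
        return 0.0
    if t >= 40.0:
        return 1.0
    return (t - 30.0) / 10.0  # linear ramp between 30 and 40 degrees

def fuzzy_select(readings, alpha=0.5):
    """Return readings whose membership degree exceeds the alpha-cut."""
    return [(t, high_temp_membership(t))
            for t in readings if high_temp_membership(t) > alpha]

readings = [28.0, 33.0, 36.5, 41.2]
fuzzy = fuzzy_select(readings)                 # graded matches with degrees
crisp = [t for t in readings if t > 35.0]      # crisp threshold, for contrast
```

Unlike the crisp query, the fuzzy result carries a degree of match per reading, so borderline values are ranked rather than silently dropped or included.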
Capturing Data Uncertainty in High-Volume Stream Processing
We present the design and development of a data stream system that captures data uncertainty from data collection to query processing to final result generation. Our system focuses on data that is naturally modeled as continuous random variables. For such data, our system employs an approach grounded in probability and statistical theory to capture data uncertainty and integrates this approach into high-volume stream processing. The first component of our system captures uncertainty of raw data streams from sensing devices. Since such raw streams can be highly noisy and may not carry sufficient information for query processing, our system employs probabilistic models of the data generation process and stream-speed inference to transform raw data into a desired format with an uncertainty metric. The second component captures uncertainty as data propagates through query operators. To efficiently quantify result uncertainty of a query operator, we explore a variety of techniques based on probability and statistical theory to compute the result distribution at stream speed. We are currently working with a group of scientists to evaluate our system using traces collected from the domains of (and eventually in the real systems for) hazardous weather monitoring and object tracking and monitoring. (Comment: CIDR 200)
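A hedged sketch of the second component's idea: propagating Gaussian uncertainty through a selection operator at stream speed. Each tuple carries a (mean, std) pair instead of a point value, and the operator outputs the probability that the true value satisfies the predicate. The Normal model matches the paper's stated setting (continuous random variables); the predicate and the stream values are invented for illustration.

```python
import math

def prob_greater(mean, std, threshold):
    """P(X > threshold) for X ~ Normal(mean, std), via the error function."""
    if std == 0.0:
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))

# A stream of uncertain readings, each modeled as (mean, std).
stream = [(10.0, 1.0), (12.0, 2.0), (8.0, 0.5)]

# The selection "value > 11" now yields existence probabilities,
# not booleans -- the result distribution of the operator.
result = [(m, s, prob_greater(m, s, 11.0)) for m, s in stream]
```

Because the closed form needs only one `erf` evaluation per tuple, this style of propagation is cheap enough to run at stream speed.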
Multi-route query processing and optimization
A modern query optimizer typically picks a single query plan for all data based on overall data statistics. However, many have observed that real-life datasets tend to have non-uniform distributions. Selecting a single query plan may result in ineffective query execution for possibly large portions of the actual data. In addition, most stream query processing systems, given the volume of data, cannot precisely model the system state, much less account for uncertainty due to continuous variations. Such systems select a single query plan based upon imprecise statistics. In this paper, we present "Query Mesh" (or QM), a practical alternative to state-of-the-art data stream processing approaches. The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of the data with distinct statistical properties; we use the terms "plans" and "routes" interchangeably in our work. A classifier model is induced and used to assign the best route to process incoming tuples based upon their data characteristics. We formulate the QM search space and analyze its complexity. Due to the substantial search space, we propose several cost-based query optimization heuristics designed to effectively find nearly optimal QMs. We propose the Self-Routing Fabric (SRF) infrastructure that supports query execution with multiple plans without physically constructing their topologies or using a central router such as Eddy. We also consider how to support uncertain route specification and execution in QM, which can occur when imprecise statistics lead to more than one optimal route for a subset of data. Our experimental results indicate that QM consistently provides better query execution performance and incurs negligible overhead compared to the alternative state-of-the-art data stream approaches.
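The Query Mesh idea can be sketched in a few lines: a classifier assigns each incoming tuple to one of several pre-computed routes (operator orderings), each suited to a different statistical subset of the data. Both routes below compute the same result and differ only in predicate order, which is the point: result-equivalent, cost-different. The routes, attribute names, and decision rule are hypothetical.

```python
def route_selective_first(t):
    # Plan for tuples where predicate A is highly selective: apply A first.
    return t if (t["a"] < 5 and t["b"] > 100) else None

def route_cheap_first(t):
    # Plan for tuples where predicate B filters most rows: apply B first.
    return t if (t["b"] > 100 and t["a"] < 5) else None

def classify(t):
    """Stand-in for the induced classifier: pick a route by data traits."""
    return route_selective_first if t["region"] == "dense" else route_cheap_first

def process(stream):
    out = []
    for t in stream:
        r = classify(t)(t)   # route the tuple through its assigned plan
        if r is not None:
            out.append(r)
    return out
```

In the real system the classifier is learned from data statistics and the routes are produced by the cost-based search; here both are hard-coded to show the control flow only.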
Exploiting correlated attributes in acquisitional query processing
Sensor networks and other distributed information systems (such as the Web) must frequently access data that has a high per-attribute acquisition cost, in terms of energy, latency, or computational resources. When executing queries that contain several predicates over such expensive attributes, we observe that it can be beneficial to use correlations to automatically introduce low-cost attributes whose observation will allow the query processor to better estimate the selectivity of these expensive predicates. In particular, we show how to build conditional plans that branch into one or more sub-plans, each with a different ordering for the expensive query predicates, based on the runtime observation of low-cost attributes. We frame the problem of constructing the optimal conditional plan for a given user query and set of candidate low-cost attributes as an optimization problem. We describe an exponential-time algorithm for finding such optimal plans, and a polynomial-time heuristic for identifying conditional plans that perform well in practice. We also show how to compactly model the conditional probability distributions needed to identify correlations and build these plans. We evaluate our algorithms against several real-world sensor-network data sets, showing several-fold performance increases for a variety of queries versus traditional optimization techniques.
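A hedged sketch of a conditional plan: observe a cheap attribute first, then branch into the sub-plan whose expensive-predicate ordering matches it. The result is identical on either branch; only the expected acquisition cost differs. The costs, attributes, and the correlation (voltage predicting temperature selectivity) are invented for illustration.

```python
def expensive_temp(node):      # e.g. powering up a temperature sensor
    return node["temp"] > 30

def expensive_light(node):     # e.g. sampling a light sensor
    return node["light"] > 500

def conditional_plan(node):
    # Branch on a low-cost attribute already in hand (battery voltage).
    if node["voltage"] < 2.5:
        # Low voltage correlates with high temperature in this sketch, so
        # the temperature predicate is likely selective: evaluate it first
        # and short-circuit the light acquisition when it fails.
        return expensive_temp(node) and expensive_light(node)
    else:
        # Otherwise the light predicate filters more rows: evaluate it first.
        return expensive_light(node) and expensive_temp(node)
```

Python's short-circuiting `and` plays the role of the plan ordering: whichever predicate comes first can save the second, expensive acquisition.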
Statistical Models for Querying and Managing Time-Series Data
In recent years we have been experiencing a dramatic increase in the amount of available time-series data. Primary sources of time-series data are sensor networks, medical monitoring, financial applications, news feeds and social networking applications. The availability of large amounts of time-series data calls for scalable data management techniques that enable efficient querying and analysis of such data in real-time and archival settings. Often the time-series data generated from sensors (environmental, RFID, GPS, etc.) are imprecise and uncertain in nature. Thus, it is necessary to characterize this uncertainty for producing clean answers. In this thesis we propose methods that address these important issues pertaining to time-series data. In particular, this thesis is centered around the following three topics.

Computing Statistical Measures on Large Time-Series Datasets. Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [31, 81, 97, 111, 132, 137]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In Chapter 3, we introduce the Affinity framework for efficient computation of statistical measures by exploiting the concept of affine relationships [113, 114]. Affine relationships can be used to infer a large number of statistical measures for time series from other related time series, instead of computing them directly, thus reducing the overall computational cost significantly. Moreover, the Affinity framework proposes a unified approach for computing several statistical measures at once.

Creating Probabilistic Databases from Imprecise Data. A large amount of time-series data produced in the real world has an inherent element of uncertainty, arising from the various sources of imprecision affecting its sources (such as sensor data, GPS trajectories, and environmental monitoring data). The primary sources of imprecision in such data are imprecise sensors, limited communication bandwidth, sensor failures, etc. Recently there has been an exponential rise in the number of such imprecise sensors, which has led to an explosion of imprecise data. Standard database techniques cannot be used to provide clean and consistent answers in such scenarios. Therefore, probabilistic databases that factor in the inherent uncertainty and produce clean answers are required. An important assumption while using probabilistic databases is that each data point has a probability distribution associated with it. This is not true in practice — the distributions are absent. As a solution to this fundamental limitation, in Chapter 4 we propose methods for inferring such probability distributions and using them for efficiently creating probabilistic databases [116].

Managing Participatory Sensing Data. Community-driven participatory sensing is a rapidly evolving paradigm in mobile geo-sensor networks. Here, sensors of various sorts (e.g., multi-sensor units monitoring air quality, cell phones, thermal watches, thermometers in vehicles, etc.) are carried by the community (public vehicles, private vehicles, or individuals) during their daily activities, collecting various types of data about their surroundings. Data generated by these devices is large in quantity, and geographically and temporally skewed. Therefore, it is important that systems designed for managing such data be aware of these unique data characteristics. In Chapter 5, we propose the ConDense (Community-driven Sensing of the Environment) framework for managing and querying community-sensed data [5, 19, 115]. ConDense exploits the spatial smoothness of environmental parameters (like ambient pollution [5] or radiation [2]) to construct statistical models of the data. Since the number of constructed models is significantly smaller than the original data, we show that using our approach leads to a dramatic increase in query processing efficiency [19, 115] and significantly reduces memory usage.
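The affine-relationship idea behind the Affinity framework can be illustrated in a few lines: if series y is affine in series x, y = a*x + b, then statistics of y follow from those of x without rescanning y's raw data. The series and coefficients below are made up; the real framework discovers such relationships in the data rather than assuming them.

```python
import statistics

x = [20.0, 21.5, 19.0, 22.0, 20.5]   # base time series
a, b = 1.8, 32.0                     # affine link, e.g. a unit conversion
y = [a * v + b for v in x]           # related series (never rescanned below)

# Statistical measures of y, inferred from x alone:
mean_y = a * statistics.mean(x) + b              # E[aX + b] = aE[X] + b
std_y = abs(a) * statistics.pstdev(x)            # sd(aX + b) = |a| sd(X)
```

With many series linked to a few base series this way, a whole family of measures comes almost for free, which is where the claimed cost reduction comes from.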
Hybrid intelligent decision support system for distributed detection based on ad hoc integrated WSN & RFID
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. The real-time monitoring of context-aware environment activities, based on distributed detection, is becoming a standard in public safety and service delivery across a wide range of domains (child and elderly care and supervision, logistics, circulation, and others). The safety of people, goods and premises depends on a prompt reaction to potential hazards identified in real time, at an early stage, so that appropriate control actions can be engaged. Effective emergency response can be supported only by available acquired expertise or elaborated collaborative knowledge in the domain of distributed detection, which includes indoor sensing, tracking and localizing. This research proposes a hybrid conceptual multi-agent framework for the acquisition of collaborative knowledge in dynamic, complex, context-aware environments for distributed detection. This framework has been applied to the design and development of a hybrid intelligent multi-agent decision support system (HIDSS) that supports a decentralized active sensing, tracking and localizing strategy, and the deployment and configuration of smart detection devices associated with active sensor nodes wirelessly connected in a network topology to configure, deploy and control ad hoc wireless sensor networks (WSNs).
This system, which is based on the interactive use of data, models and a knowledge base, has been implemented to support fire detection and access control fusion functions aimed at elaborating: an integrated data model grouping the building information data and the WSN-RFID database, composed of the network configuration and captured data; a virtual layout configuration of the controlled premises, based on a building information model; knowledge-based support for the design of generic detection devices; a multi-criteria decision-making model for generic detection device distribution and for ad hoc WSN configuration, clustering and deployment; and predictive data models for evacuation planning and for fire and evacuation simulation. An evaluation of the system prototype has been carried out to enrich information and knowledge fusion requirements and to show the scope of the concepts used in data and process modelling. It has shown the practicability of hybrid solutions grouping generic homogeneous smart detection devices, enhanced by heterogeneous support devices in their deployment, forming ad hoc networks that integrate WSNs and radio frequency identification (RFID) technology. The novelty of this work is the web-based support system architecture proposed in this framework, which is based on the use of intelligent agent modelling and multi-agent systems, and on the decoupling of the processes supporting multi-sensor data fusion from those supporting different context applications. Although this decoupling is essential to appropriately distribute the different fusion functions, the integration of several dimensions of policy settings for the modelling of knowledge processes, and of intelligent and pro-active decision-making activities, requires the organisation of interactive fusion functions deployed upstream of the safety and emergency response. Funding: Saudi government, represented by the Ministry of Interior and the General Directorate of Civil Defence.
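The decoupling of fusion functions from context applications can be sketched as follows: one function combines WSN sensor readings with RFID presence data, and separate context applications (here, fire response) consume its output without knowing how it was produced. All thresholds, field names, and rules are hypothetical, purely to show the separation of concerns.

```python
def fuse(wsn_readings, rfid_tags):
    """Fusion layer: combine smoke/temperature evidence with occupancy.

    `wsn_readings` is a list of per-node dicts; `rfid_tags` lists the
    RFID tags currently detected in the monitored area.
    """
    smoke = max(r["smoke"] for r in wsn_readings)
    temp = max(r["temp"] for r in wsn_readings)
    fire_likely = smoke > 0.6 or (smoke > 0.4 and temp > 50)
    return {"fire_likely": fire_likely, "occupants": len(rfid_tags)}

def fire_response_app(fused):
    """Context application, decoupled from the fusion layer above."""
    if fused["fire_likely"] and fused["occupants"] > 0:
        return "evacuate"
    if fused["fire_likely"]:
        return "dispatch"
    return "monitor"
```

Because the fusion output is a plain record, other context applications (e.g. access control) could consume the same fused state without touching the sensor-level code, which is the decoupling the abstract argues for.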