633 research outputs found

    Continuous Spatial Query Processing:A Survey of Safe Region Based Techniques

    In the past decade, devices with built-in positioning systems, such as smartphones, have become highly prevalent. This has driven the growing popularity of location-based services in business as well as in daily applications such as navigation, targeted advertising, and location-based social networking. Continuous spatial queries serve as a building block for location-based services. As an example, an Uber driver may want to be kept aware of the nearest customers or service stations. Continuous spatial queries require the query result to be updated as the query or the data objects move. This poses challenges to query efficiency, which is crucial to the user experience of a service. A large number of approaches address this efficiency issue using the concept of a safe region. A safe region is a region within which arbitrary movement of an object leaves the query result unchanged. Such a region helps reduce the frequency of query result updates and hence improves query efficiency. As a result, safe region-based approaches have been popular for processing various types of continuous spatial queries. Safe regions have interesting theoretical properties and are worth in-depth analysis. We provide a comparative study of safe region-based approaches. We describe how safe regions are computed for different types of continuous spatial queries, showing how they improve query efficiency. We compare the different safe region-based approaches and discuss possible further improvements.
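As a hedged illustration of the safe region idea described in this abstract, the sketch below handles the simplest case: one moving object and one static circular range query. It is not taken from the survey. The class name `TrackedObject`, the helper `safe_radius`, and the choice of a circular safe region are assumptions made for this example only.

```python
import math


def safe_radius(pos, query_center, query_radius):
    """Radius of a circular safe region centred at `pos`.

    While the object stays strictly within this distance of `pos`, the
    triangle inequality guarantees that it cannot cross the boundary of
    the query circle, so its membership in the result cannot change.
    """
    return abs(query_radius - math.dist(pos, query_center))


class TrackedObject:
    """Hypothetical client that reports only when it leaves its safe region."""

    def __init__(self, pos, query_center, query_radius):
        self.query_center = query_center
        self.query_radius = query_radius
        self.updates = 0  # number of (re)evaluations sent to the server
        self._reanchor(pos)

    def _reanchor(self, pos):
        # Re-evaluate the query and compute a fresh safe region.
        d = math.dist(pos, self.query_center)
        self.anchor = pos
        self.inside = d <= self.query_radius
        self.radius = abs(self.query_radius - d)
        self.updates += 1

    def move(self, pos):
        # Inside the safe region the cached answer is still valid.
        if math.dist(pos, self.anchor) >= self.radius:
            self._reanchor(pos)
        return self.inside


tracked = TrackedObject((0.0, 0.0), (0.0, 0.0), 10.0)
for step in [(3.0, 0.0), (9.0, 0.0), (12.0, 0.0), (13.0, 0.0)]:
    print(step, tracked.move(step), tracked.updates)
```

In this toy run, four movements cause only one re-evaluation beyond the initial one, which is the kind of saving the abstract attributes to safe regions.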

    Monitoring distributed fragmented skylines

    Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is to efficiently monitor a continuous skyline query over a collection of distributed streams. All existing work relies on the assumption of a single point of reference for object attributes/dimensions: objects may be vertically or horizontally partitioned, but the accurate value of each dimension for each object is always maintained by a single site. This assumption is unrealistic for several distributed applications, where object information is fragmented over a set of distributed streams (each monitored by a different site) and needs to be aggregated (e.g., averaged) across several sites. Furthermore, it is frequently useful to define skyline dimensions through complex functions over the aggregated objects, which raises further challenges for dealing with distribution and object fragmentation. We present the first known distributed algorithms for continuous monitoring of skylines over complex functions of fragmented multi-dimensional objects. Our algorithms rely on decomposition of the skyline monitoring problem to a select set of distributed threshold-crossing queries, which can be monitored locally at each site. We propose several optimizations, including: (a) a technique for adaptively determining the most efficient monitoring strategy for each object, (b) an approximate monitoring technique, and (c) a strategy that reduces communication overhead by grouping together threshold-crossing queries. Furthermore, we discuss how our proposed algorithms can be used to address other continuous query types. A thorough experimental study with synthetic and real-life data sets verifies the effectiveness of our schemes and demonstrates order-of-magnitude improvements in communication costs compared to the only alternative centralized solution.
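The general idea of monitoring a global threshold through site-local checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the paper above: the global condition is assumed to be "the sum of the sites' values stays at or below a threshold", the threshold is split evenly, and the function names are hypothetical.

```python
def split_threshold(global_threshold, num_sites):
    """Assumed policy: give every site an equal share of the global threshold."""
    return [global_threshold / num_sites] * num_sites


def local_violations(values, local_thresholds):
    """Indices of sites whose local value exceeds their local threshold."""
    return [i for i, (v, t) in enumerate(zip(values, local_thresholds)) if v > t]


def monitor(values, global_threshold):
    """Return (global_condition_holds, messages_sent) for one monitoring round.

    If no site crosses its local threshold, the sum cannot exceed the
    global threshold, so no communication is needed. Otherwise the
    coordinator polls every site and checks the condition exactly.
    """
    thresholds = split_threshold(global_threshold, len(values))
    if not local_violations(values, thresholds):
        return True, 0
    return sum(values) <= global_threshold, len(values)


print(monitor([1, 2, 3], 12))   # all sites within their local share
print(monitor([5, 1, 1], 12))   # local crossing, global condition still holds
print(monitor([10, 5, 1], 12))  # local crossing, global condition violated
```

The second call shows why local crossings are only a trigger: a site can exceed its share while the global condition still holds, so the coordinator has to resolve the alarm.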

    Distributed Query Monitoring through Convex Analysis: Towards Composable Safe Zones

    Continuous tracking of complex data analytics queries over high-speed distributed streams is becoming increasingly important. Query tracking can be reduced to continuous monitoring of a condition over the global stream. Communication-efficient monitoring relies on locally processing stream data at the sites where it is generated, by deriving site-local conditions which collectively guarantee the global condition. Recently proposed geometric techniques offer a generic approach for splitting an arbitrary global condition into local geometric monitoring constraints (known as "Safe Zones"); still, their application to various problem domains has so far been based on heuristics and has lacked a principled, compositional methodology. In this paper, we present the first known formal results on the difficult problem of effective Safe Zone (SZ) design for complex query monitoring over distributed streams. Exploiting tools from convex analysis, our approach relies on an algebraic representation of SZs which allows us to: (1) formally define the notion of a "good" SZ for distributed monitoring problems; and, most importantly, (2) tackle and solve the important problem of systematically composing SZs for monitored conditions expressed as Boolean formulas over simpler conditions (for which SZs are known); furthermore, we prove that, under broad assumptions, the composed SZ is good if the component SZs are good. Our results are, therefore, a first step towards a principled compositional solution to SZ design for distributed query monitoring. Finally, we discuss a number of important applications for our SZ design algorithms, also demonstrating how earlier geometric techniques can be seen as special cases of our framework.
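The role of convexity in Safe Zone monitoring can be illustrated with a small sketch. It covers only the simplest case and is not the algebraic framework of the paper: the global state is assumed to be the average of the sites' local vectors, each safe zone is a convex set given as a membership predicate, and a conjunction of conditions is handled by intersecting zones (an intersection of convex sets is again convex). All function names are hypothetical.

```python
import math


def in_ball(radius):
    """Convex zone {v : ||v|| <= radius}, returned as a membership predicate."""
    return lambda v: math.hypot(*v) <= radius


def in_halfspace(normal, offset):
    """Convex zone {v : normal . v <= offset}."""
    return lambda v: sum(n * x for n, x in zip(normal, v)) <= offset


def conjunction(*zones):
    """Zone for an AND of conditions: the intersection of the component zones."""
    return lambda v: all(zone(v) for zone in zones)


def average(vectors):
    """Component-wise mean of the sites' local vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


zone = conjunction(in_ball(5.0), in_halfspace((1.0, 0.0), 3.0))
sites = [(1.0, 2.0), (3.0, -1.0), (-2.0, 4.0)]

# Every site checks its own vector without communicating.
print(all(zone(v) for v in sites))
# Because the zone is convex, the average of in-zone vectors is in the zone.
print(zone(average(sites)))
```

The converse does not hold: a site may leave the zone while the average stays inside it, which is why a local violation calls for a synchronization step rather than an immediate alarm.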

    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.

    Similarity-aware query refinement for data exploration

    Data centric storage framework for an intelligent wireless sensor network

    In the last decade, research into Wireless Sensor Networks (WSNs) has triggered extensive growth in flexible and previously difficult-to-achieve scientific activities carried out in the most demanding and often remote areas of the world. This success has provoked research into new WSN-related challenges, including techniques for data management and analysis and for gathering information from large, diverse, distributed and heterogeneous data sets. The shift in focus to research into scalable, accessible and sustainable intelligent sensor networks reflects the ongoing improvements made in the design, development, deployment and operation of WSNs. However, a key prerequisite of an intelligent network is the ability to store and process data in-network, which is referred to as Data Centric Storage (DCS). This research project has proposed, developed and implemented a comprehensive DCS framework for WSNs. Range query mechanisms, similarity search, load balancing, multi-dimensional data search, and limited and constrained resources have driven the research focus. The architecture of the deployed network, referred to as Disk Based Data Centric Storage (DBDCS), was inspired by the magnetic disk storage platter consisting of tracks and sectors. The core contributions made in this research can be summarized as: a) an optimally synchronized routing algorithm, referred to as Sector Based Distance (SBD) routing, for the DBDCS architecture; b) DCS Metric based Similarity Searching (DCSMSS), with the realization of three exemplar queries: range query, k-nearest neighbor (KNN) query and skyline query; and c) a Decentralized Distributed Erasure Coding (DDEC) algorithm that achieves a similar level of reliability with less redundancy. SBD achieves high power efficiency whilst reducing update and query traffic, end-to-end delay, and collisions. In order to guarantee reliability and minimize end-to-end latency, a simple Grid Coloring Algorithm (GCA) is used to derive the time division multiple access (TDMA) schedules. The GCA uses a slot reuse concept to minimize the TDMA frame length. A performance evaluation was conducted, with simulation results showing that SBD achieves a throughput enhancement by a factor of two, an extension of network lifetime by 30%, and reduced end-to-end latency. DCSMSS takes advantage of a vector distance index, called iDistance, transforming the issue of similarity searching into the problem of an interval search in one dimension. DCSMSS balances the load across the network and provides efficient similarity searching for three types of queries: range query, KNN query and skyline query. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries. DDEC encodes the acquired information into n fragments and disseminates them across n nodes inside a sector so that the original source packets can be recovered from any k surviving nodes. A lost fragment can also be regenerated from any d helper nodes. DDEC was evaluated against 3-Way Replication using different performance metrics. The results highlight that the use of erasure encoding in network storage can provide the desired level of data availability at a smaller memory overhead when compared to replication.
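The iDistance transformation mentioned in this abstract, which turns a similarity search into one-dimensional interval searches, can be sketched as below. This is a minimal in-memory illustration, not the DCSMSS scheme of the thesis: each point is assigned to its nearest reference point with index i and stored under the key i * c + dist(point, reference), and a range query becomes at most one interval lookup per reference point. The class name `IDistanceIndex`, the reference points, and the constant `c` (assumed larger than any point-to-reference distance) are choices made for this example.

```python
import bisect
import math


class IDistanceIndex:
    """Toy iDistance-style index backed by a sorted list of 1-D keys."""

    def __init__(self, points, refs, c):
        self.refs = refs
        self.c = c  # assumed to exceed every point-to-reference distance
        self.max_r = [0.0] * len(refs)  # radius of each partition
        entries = []
        for p in points:
            # Assign the point to its nearest reference point.
            i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
            d = math.dist(p, refs[i])
            self.max_r[i] = max(self.max_r[i], d)
            entries.append((i * c + d, p))
        entries.sort()
        self.keys = [key for key, _ in entries]
        self.points = [p for _, p in entries]

    def range_query(self, q, r):
        """All indexed points within distance r of q."""
        result = []
        for i, ref in enumerate(self.refs):
            dq = math.dist(q, ref)
            if dq - r > self.max_r[i]:
                continue  # the query ball cannot reach this partition
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.max_r[i], dq + r)
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            # The interval yields candidates; filter by the true distance.
            result.extend(p for p in self.points[start:end] if math.dist(p, q) <= r)
        return result


points = [(1, 1), (2, 2), (8, 8), (9, 9), (5, 5), (0, 3)]
index = IDistanceIndex(points, refs=[(0, 0), (10, 10)], c=100.0)
print(index.range_query((1.5, 1.5), 1.0))
```

The interval bounds follow from the triangle inequality, so the one-dimensional lookup never misses a true result. It can return false candidates, which the final distance check removes.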