
    Distributed top-k aggregation queries at large

    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. They address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
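
For orientation, the following is a minimal sketch of the three-phase threshold scheme commonly associated with the prior TPUT method that the abstract names as a baseline. It is not the paper's optimized algorithm: it has no operator trees, no adaptive scan depths, and no sampling. The function names `tput_top_k` and `kth_largest`, the in-memory representation of each node as a dictionary, and the assumptions of non-negative scores, sum aggregation, and at least one node are all illustrative choices, not taken from the source.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0.0


def tput_top_k(nodes, k):
    """Top-k items by summed score over `nodes` (list of dicts item -> score >= 0).

    Hypothetical in-memory stand-in for a coordinator talking to remote nodes.
    """
    m = len(nodes)  # assumed to be at least 1

    # Phase 1: every node reports its local top-k items.
    reported = [
        dict(heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]))
        for scores in nodes
    ]
    partial = defaultdict(float)
    for rep in reported:
        for item, score in rep.items():
            partial[item] += score
    # The k-th largest partial sum is a lower bound on the true k-th total.
    threshold = kth_largest(partial.values(), k) / m

    # Phase 2: every node additionally reports all items scoring >= threshold.
    for scores, rep in zip(nodes, reported):
        rep.update({i: s for i, s in scores.items() if s >= threshold})
    items = set().union(*reported)
    partial = defaultdict(float)
    upper = defaultdict(float)
    for item in items:
        for rep in reported:
            if item in rep:
                partial[item] += rep[item]
                upper[item] += rep[item]
            else:
                # An unreported local score is below the threshold.
                upper[item] += threshold
    tau2 = kth_largest(partial.values(), k)
    candidates = [i for i in items if upper[i] >= tau2]

    # Phase 3: fetch exact scores for the surviving candidates only.
    exact = {i: sum(n.get(i, 0) for n in nodes) for i in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

The pruning is safe under the stated assumptions because any item whose true total reaches the phase-1 lower bound must score at least `threshold` on some node, so it is guaranteed to be reported in phase 2.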

    Adaptive network protocols to support queries in dynamic networks

    Recent technological advancements have led to the popularity of mobile devices, which can dynamically form wireless networks. To discover and obtain distributed information, applications in opportunistically formed mobile networks widely use queries. Given the popularity of this approach, application developers can choose from a number of implementations of query processing protocols to support the distributed execution of a query over the network. However, different inquiry strategies (i.e., the query processing protocol and the associated parameters used to execute a query) offer different tradeoffs between the quality of the query's result and the cost of execution under different operating conditions. The application developer's choice of inquiry strategy is important for meeting the application's needs while respecting the limited resources of the mobile devices that form the network. We propose adaptive approaches to choosing the most appropriate inquiry strategy in dynamic mobile environments. We introduce an architecture for adaptive queries that employs knowledge about the current state of the dynamic mobile network and the history of previous query results to learn the inquiry strategy that best balances quality and cost tradeoffs in a given setting, and uses this information to dynamically adapt the continuous query's execution.

    An adaptive approach to P2P resource discovery in distributed scientific research communities

    Resource discovery in a distributed environment is always a challenging issue. It is even more difficult to provide an efficient query routing mechanism while still being able to support complex query processing in a decentralised P2P environment. This paper presents an adaptive approach to P2P resource discovery. It separates the routing of queries from the query matching mechanism so that an effective combination of the two can be explored. Three properties of scientific research communities provide the grounding for the method: the existence of common interest groups, the willingness to share resources of common interest, and the transitive relationship in the sharing behaviour. By exploiting these properties, search queries can be efficiently forwarded to those peers that are more likely to have the answers, which improves the quality of search results and reduces network traffic. Experimental results provide some evidence confirming the efficiency of this adaptive approach.

    Cracking KD-Tree: The first multidimensional adaptive indexing

    Workload-aware physical data access structures are crucial for achieving short response times in (exploratory) data analysis tasks, as commonly required for Big Data and Data Science applications. Recently proposed techniques such as automatic index advisers (for a priori known static workloads) and query-driven adaptive incremental indexing (for a priori unknown dynamic workloads) form the state of the art for building single-dimensional indexes for single-attribute query predicates. However, similar techniques for more demanding multi-attribute query predicates, which are vital for any data analysis task, have not yet been proposed. In this paper, we present our ongoing work on a new set of workload-adaptive indexing techniques that focus on creating multidimensional indexes. We present our proof of concept, the Cracking KD-Tree, an adaptive indexing approach that generates a KD-Tree based on multidimensional range query predicates. It works by incrementally creating partial multidimensional indexes as a by-product of query processing. The indexes are produced only on those parts of the data that are accessed, and their creation cost is effectively distributed across a stream of queries. Experimental results show that the Cracking KD-Tree is three times faster than creating a full KD-Tree, one order of magnitude faster than executing full scans, and two orders of magnitude faster than using uni-dimensional full or adaptive indexes on multiple columns.
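
The idea of building a partial KD-Tree as a by-product of range queries can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' implementation: the class names `CrackingKDTree` and `Node`, the `min_piece` parameter, the half-open query box, and the choice to use each query bound as a pivot in dimension order are all hypothetical. Only pieces of the data that a query actually touches get partitioned.

```python
class Node:
    """A piece of the point array, covering positions [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.dim = None      # None means the piece is still an unindexed leaf
        self.pivot = None
        self.left = None     # points with coordinate < pivot
        self.right = None    # points with coordinate >= pivot


class CrackingKDTree:
    def __init__(self, points, min_piece=2):
        self.pts = list(points)
        self.root = Node(0, len(self.pts))
        self.min_piece = min_piece  # pieces this small are scanned, not cracked

    def _crack(self, node, dim, pivot):
        # Reorder the piece so that points below the pivot come first.
        seg = self.pts[node.lo:node.hi]
        below = [p for p in seg if p[dim] < pivot]
        above = [p for p in seg if p[dim] >= pivot]
        self.pts[node.lo:node.hi] = below + above
        mid = node.lo + len(below)
        node.dim, node.pivot = dim, pivot
        node.left, node.right = Node(node.lo, mid), Node(mid, node.hi)

    def _query(self, node, low, high, pending, out):
        # Crack a large leaf on the next unused query bound, if any remain.
        if node.dim is None and pending and node.hi - node.lo > self.min_piece:
            (dim, pivot), pending = pending[0], pending[1:]
            self._crack(node, dim, pivot)
        if node.dim is None:
            # Small or fully cracked piece: filter it with a plain scan.
            out.extend(
                p for p in self.pts[node.lo:node.hi]
                if all(l <= x < h for l, x, h in zip(low, p, high))
            )
            return
        if low[node.dim] < node.pivot:
            self._query(node.left, low, high, pending, out)
        if high[node.dim] > node.pivot:
            self._query(node.right, low, high, pending, out)

    def range_query(self, low, high):
        """Return points p with low[d] <= p[d] < high[d] in every dimension d."""
        pending = [(d, b) for d in range(len(low)) for b in (low[d], high[d])]
        out = []
        self._query(self.root, low, high, pending, out)
        return out
```

Repeating a query in this sketch reuses the splits created the first time, so subsequent scans cover only the small pieces that overlap the query box; regions that are never queried stay unindexed.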

    Network-Aware Stream Query Processing in Mobile Ad-Hoc Networks


    Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

    The widespread use of GPS-enabled smartphones, along with the popularity of micro-blogging and social networking applications such as Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads for millions of users. The number of users is typically very high, the users are continuously moving, and the ads change frequently as well. Hence, sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units that fairly distribute the workload and co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado adapts to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms non-spatio-textually aware approaches by up to two orders of magnitude in terms of overall system throughput.