
    SQPR: Stream Query Planning with Reuse

    When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows, SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.

    Resource allocation for query processing in grid systems: A survey

    Grid systems are very useful platforms for distributed databases, especially in situations where the scale of data sources and user requests is very high. However, the main characteristics of grid systems, such as dynamicity, large size and heterogeneity, bring new problems to the query processing domain, such as resource discovery and resource allocation. In this paper, we provide a survey of resource allocation methods for query processing in data grid systems. We provide a classification of existing studies according to their approaches to the resource allocation problem. We provide a synthesis of the studies and propose evaluations and comparisons for the different classes of studies. ©2012 CRL Publishing Ltd

    SAP HANA distributed in-memory database system: Transaction, session, and metadata management

    One of the core principles of the SAP HANA database system is comprehensive support for a distributed query facility. Supporting scale-out scenarios was one of the major design principles of the system from the very beginning. Within this paper, we first give an overview of the overall functionality with respect to data allocation, metadata caching and query routing. We then dive into some level of detail for specific topics and explain features and methods not common in traditional disk-based database systems. In summary, the paper provides a comprehensive overview of distributed query processing in the SAP HANA database, which achieves the scalability needed to handle large databases and heterogeneous types of workloads.

    The data cyclotron query processing scheme

    Distributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture, which exploits all resources to manage the hot data set, minimize query response time, and maximize throughput without global co-ordination. In this paper, we introduce the Data Cyclotron architecture, which addresses these challenges using turbulent data movement through a storage ring built from distributed main memory, capitalizing on modern remote-DMA facilities. Queries assigned to individual nodes interact with the Data Cyclotron by picking up data fragments continuously flowing around, i.e., the hot set. Each data fragment carries a level of interest (LOI) metric, which represents the cumulative query interest as the fragment passes around the ring multiple times. A fragment with a LOI below a given threshold, inversely proportional to the ring load, is pulled o

    Just-in-time Data Distribution for Analytical Query Processing

    Distributed processing commonly requires data spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database, and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy, and is reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing the data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.

    RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

    The Reverse k-Nearest Neighbor (RkNN) problem, i.e. finding all objects in a dataset that have a given query point among their corresponding k-nearest neighbors, has received increasing attention in the past years. RkNN queries are of particular interest in a wide range of applications such as decision support systems, resource allocation, profile-based marketing, location-based services, etc. With the current increasing volume of spatial data, it is difficult to perform RkNN queries efficiently in spatial data-intensive applications, because of limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms using shared-nothing spatial cloud infrastructures such as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform spatial queries efficiently. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with big real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal in both distributed spatial data management systems, showing the performance advantages of LocationSpark.
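
The RkNN definition quoted in the abstract above can be illustrated with a minimal brute-force sketch. This is not the distributed algorithm from the paper, and it uses neither SpatialHadoop nor LocationSpark; the function name `rknn_bruteforce`, the Euclidean metric, and the tie-handling rule (only strictly closer points count against the query) are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def rknn_bruteforce(points, query, k):
    """Return the points that have `query` among their k nearest neighbours.

    Hypothetical helper for illustration only: a quadratic-time scan,
    not the distributed algorithm evaluated in the paper.
    """
    result = []
    for i, p in enumerate(points):
        d_query = dist(p, query)
        # Count the other dataset points that lie strictly closer to p
        # than the query point does.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_query
        )
        # The query is one of p's k nearest neighbours if fewer than k
        # dataset points beat it.
        if closer < k:
            result.append(p)
    return result


points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(rknn_bruteforce(points, (2, 0), k=1))
```

With k=1, only the point (1, 0) has the query (2, 0) as its nearest neighbour; every other point has a dataset point closer to it than the query.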

    An optimized cost-based data allocation model for heterogeneous distributed computing systems

    Continuous attempts have been made to improve the flexibility and effectiveness of distributed computing systems. Extensive effort in the fields of connectivity technologies, network programs, high-performance processing components, and storage has helped to improve results. However, concerns such as slow response, long execution time, and long completion time have been identified as stumbling blocks that hinder performance and require additional attention. These defects increase the total system cost and make the data allocation procedure for a geographically dispersed setup difficult. The load-based architectural model has been strengthened to improve data allocation performance. To do this, an abstract job model is employed, and a data query file containing input data is processed on a directed acyclic graph. The jobs are executed on the processing engine with the lowest execution cost, and the system's total cost is calculated by summing the costs of communication, computation, and network. This total cost is then reduced using a swarm intelligence algorithm. In heterogeneous distributed computing systems, the suggested approach attempts to reduce the system's total cost and improve data distribution. According to simulation results, the technique efficiently lowers total system cost and optimizes partitioned data allocation.

    GeoLoc: Robust Resource Allocation Method for Query Optimization in Data Grid Systems

    Resource allocation (RA) is one of the key stages of distributed query processing in the Data Grid environment. In the last decade, a number of works have been published in the field that deal with different aspects of the problem. We believe that in those studies the authors paid less attention to such important aspects as the definition of the allocation space and the criterion for determining the degree of parallelism. In this paper we propose an RA method that extends existing solutions on those two points and resolves the problem in the specific conditions of the large-scale heterogeneous environment of Data Grids. Firstly, we propose to use the geographical proximity of nodes to data sources to define the Allocation Space (AS). Secondly, we present the principle of execution time parity between scan and join (build and probe) operations for determining the degree of parallelism and for generating load-balanced query execution plans. We conducted an experiment that showed the superiority of our GeoLoc method in terms of response time over the RA method that we chose for the comparison. The present study also provides a brief description of existing methods and their qualitative comparison with the proposed method.

    Partout: A Distributed Engine for Efficient RDF Processing

    The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications with already more than a trillion triples in some cases. Confronted with such huge amounts of data and the future growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine for efficient RDF processing in a cluster of machines. We propose an effective approach for fragmenting RDF data sets based on a query log, allocating the fragments to nodes in a cluster, and finding the optimal configuration. Partout can efficiently handle updates and its query optimizer produces efficient query execution plans for ad-hoc SPARQL queries. Our experiments show the superiority of our approach to state-of-the-art approaches for partitioning and distributed SPARQL query processing