2,724 research outputs found

    Multi-Resource Parallel Query Scheduling and Optimization

    Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and communicates with remote sites by message-passing. Earlier work on parallel query scheduling employs either (a) one-dimensional models of parallel task scheduling, effectively ignoring the potential benefits of resource sharing, or (b) models of globally accessible resource units, which are appropriate only for shared-memory architectures, since they cannot capture the affinity of system resources to sites. In this paper, we develop a general approach capturing the full complexity of scheduling distributed, multi-dimensional resource units for all forms of parallelism within and across queries and operators. We present a level-based list scheduling heuristic algorithm for independent query tasks (i.e., physical operator pipelines) that is provably near-optimal for given degrees of partitioned parallelism (with a worst-case performance ratio that depends on the number of time-shared and space-shared resources per site and the granularity of the clones). We also propose extensions to handle blocking constraints in logical operator (e.g., hash-join) pipelines and bushy query plans, as well as on-line task arrivals (e.g., in a dynamic or multi-query execution environment). Experiments with our scheduling algorithms, implemented on top of a detailed simulation model, verify their effectiveness compared to existing approaches in a realistic setting. Based on our analytical and experimental results, we revisit the open problem of designing efficient cost models for parallel query optimization and propose a solution that captures all the important parameters of parallel execution.
    Comment: 50 pages; the conference version of the paper appeared in the Proceedings of the 23rd International Conference on Very Large Databases (VLDB'1997), Athens, Greece, August 1997
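
    A minimal sketch of the flavor of scheduling discussed above (not the paper's level-based algorithm): a greedy list scheduler places task clones on sites, accumulating load on a time-shared resource (CPU work) while respecting a space-shared capacity (memory). The clone names and resource figures are hypothetical.

        # Hedged sketch: greedy list scheduling of task clones onto sites with one
        # time-shared resource (CPU work accumulates) and one space-shared resource
        # (memory capacity is consumed). Task figures are made up for illustration.

        def schedule(clones, num_sites, mem_capacity):
            """clones: list of (name, cpu_work, memory) tuples."""
            cpu_load = [0.0] * num_sites           # time-shared: work adds up
            mem_free = [mem_capacity] * num_sites  # space-shared: capacity is consumed
            placement = {}
            # List-scheduling order: largest CPU work first.
            for name, work, mem in sorted(clones, key=lambda c: -c[1]):
                # Candidate sites that can still hold the clone's memory demand.
                feasible = [s for s in range(num_sites) if mem_free[s] >= mem]
                if not feasible:
                    raise ValueError(f"no site can hold clone {name}")
                s = min(feasible, key=lambda s: cpu_load[s])  # least-loaded feasible site
                cpu_load[s] += work
                mem_free[s] -= mem
                placement[name] = s
            return placement, max(cpu_load)  # makespan estimate = heaviest site

        if __name__ == "__main__":
            clones = [("scan.0", 8.0, 2), ("scan.1", 8.0, 2),
                      ("join.0", 5.0, 4), ("join.1", 5.0, 4), ("agg", 3.0, 1)]
            plan, makespan = schedule(clones, num_sites=2, mem_capacity=8)
            print(plan, makespan)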

    Optimization of Imperative Programs in a Relational Database

    For decades, RDBMSs have supported declarative SQL as well as imperative functions and procedures as ways for users to express data processing tasks. While the evaluation of declarative SQL has received a lot of attention, resulting in highly sophisticated techniques, the evaluation of imperative programs has remained naive and highly inefficient. Imperative programs offer several benefits over SQL and hence are often preferred and widely used. Unfortunately, their abysmal performance discourages, and in many situations even prohibits, their use. We address this important problem, which has hitherto received little attention. We present Froid, an extensible framework for optimizing imperative programs in relational databases. Froid's novel approach automatically transforms entire User Defined Functions (UDFs) into relational algebraic expressions and embeds them into the calling SQL query. This form is now amenable to cost-based optimization and results in efficient, set-oriented, parallel plans, as opposed to inefficient, iterative, serial execution of UDFs. Froid's approach additionally brings the benefits of many compiler optimizations to UDFs with no additional implementation effort. We describe the design of Froid and present our experimental evaluation, which demonstrates performance improvements of up to multiple orders of magnitude on real workloads.
    Comment: Extended version of the paper titled "FROID: Optimization of Imperative Programs in a Relational Database" in PVLDB 11(4), 2017. DOI: 10.1145/3164135.316414
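
    A hedged before/after illustration of the UDF-inlining idea, using Python's sqlite3 module rather than Froid itself; the table, column, and discount rule are hypothetical. The first query calls a row-at-a-time UDF; the second embeds the UDF body as an ordinary relational (CASE) expression that the optimizer can treat as plain SQL.

        # Hedged illustration only; this is not Froid, just the shape of the transformation.
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders(id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES(?, ?)",
                         [(1, 50.0), (2, 150.0), (3, 900.0)])

        # Imperative UDF: evaluated row by row inside the query (the slow path).
        def discount(amount):
            if amount > 500:
                return amount * 0.10
            elif amount > 100:
                return amount * 0.05
            return 0.0

        conn.create_function("discount", 1, discount)
        slow = conn.execute("SELECT id, discount(amount) FROM orders").fetchall()

        # Froid-style rewrite: the UDF body becomes one relational (CASE) expression
        # embedded in the calling query, so it is visible to the query optimizer.
        fast = conn.execute("""
            SELECT id,
                   CASE WHEN amount > 500 THEN amount * 0.10
                        WHEN amount > 100 THEN amount * 0.05
                        ELSE 0.0 END
            FROM orders
        """).fetchall()

        assert slow == fast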

    Parallel Weighted Random Sampling

    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps, both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups both for construction and queries.
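
    For context, a sketch of sequential alias-table construction (Vose's method) and the resulting O(1) weighted sampling with replacement; the parallel variants in the paper partition this construction across threads, which is not shown here, and the example weights are hypothetical.

        import random

        def build_alias(weights):
            """Build an alias table for O(1) weighted sampling (Vose's method)."""
            n = len(weights)
            total = sum(weights)
            prob = [w * n / total for w in weights]   # scaled so the average is 1
            alias = [0] * n
            small = [i for i, p in enumerate(prob) if p < 1.0]
            large = [i for i, p in enumerate(prob) if p >= 1.0]
            while small and large:
                s, l = small.pop(), large.pop()
                alias[s] = l                      # overflow partner of bucket s
                prob[l] -= 1.0 - prob[s]          # move the deficit onto the large item
                (small if prob[l] < 1.0 else large).append(l)
            for i in small + large:               # leftovers keep their whole bucket
                prob[i] = 1.0
            return prob, alias

        def sample(prob, alias, rng=random):
            i = rng.randrange(len(prob))          # pick a bucket uniformly
            return i if rng.random() < prob[i] else alias[i]

        if __name__ == "__main__":
            prob, alias = build_alias([0.1, 0.2, 0.3, 0.4])
            counts = [0, 0, 0, 0]
            for _ in range(100_000):
                counts[sample(prob, alias)] += 1
            print(counts)  # roughly proportional to the weights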

    A Delta Debugger for ILP Query Execution

    Because query execution is the most crucial part of Inductive Logic Programming (ILP) algorithms, a lot of effort is invested in developing faster execution mechanisms. These execution mechanisms typically have a low-level implementation, making them hard to debug. Moreover, other factors, such as the complexity of the problems handled by ILP algorithms and the size of the code base of ILP data mining systems, make debugging at this level a very difficult job. In this work, we present the trace-based debugging approach currently used in the development of new execution mechanisms in hipP, the engine underlying the ACE Data Mining system. This debugger uses the delta debugging algorithm to automatically reduce the total time needed to expose bugs in ILP execution, thus making the manual debugging step much lighter.
    Comment: Paper presented at the 16th Workshop on Logic-based Methods in Programming Environments (WLPE2006)
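
    A simplified sketch of the classic delta debugging (ddmin) loop referred to above: it shrinks a failure-inducing input to a smaller one that still triggers the bug. The 'failing' predicate is a hypothetical stand-in for re-running an ILP query under the execution mechanism being debugged; this is not the hipP debugger itself.

        def ddmin(items, failing):
            """Shrink items to a small subset that still satisfies failing()."""
            n = 2
            while len(items) >= 2:
                chunk = max(1, len(items) // n)
                subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
                reduced = False
                for sub in subsets:
                    complement = [x for x in items if x not in sub]
                    if failing(complement):          # bug persists without this chunk
                        items, n = complement, max(n - 1, 2)
                        reduced = True
                        break
                if not reduced:
                    if n >= len(items):
                        break
                    n = min(len(items), n * 2)       # retry at a finer granularity
            return items

        if __name__ == "__main__":
            # Toy bug: the failure appears whenever steps 3 and 7 are both present.
            failing = lambda steps: 3 in steps and 7 in steps
            print(ddmin(list(range(10)), failing))   # -> a small set containing 3 and 7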

    Runtime Optimizations for Prediction with Tree-Based Models

    Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, given an already-trained model. Although tree-based models are exceedingly simple conceptually, most implementations do not efficiently utilize modern superscalar processor architectures. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures and significantly improve the speed of tree-based models over hard-coded if-else blocks. Our work contributes to the exploration of architecture-conscious runtime implementations of machine learning algorithms.
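
    A hedged NumPy sketch of the two ideas named above: the tree is stored in flat arrays (a cache-conscious layout), and a whole batch of rows advances through it with a branch-free select per level (the spirit of predication plus vectorization). The toy tree and data are hypothetical and nothing here reproduces the paper's C++ implementation.

        import numpy as np

        # Flat arrays for a depth-2 tree: nodes 0-2 are internal, 3-6 are leaves.
        feature   = np.array([0, 1, 1, -1, -1, -1, -1])         # -1 marks a leaf
        threshold = np.array([0.5, 0.3, 0.7, 0, 0, 0, 0], dtype=float)
        left      = np.array([1, 3, 5, -1, -1, -1, -1])
        right     = np.array([2, 4, 6, -1, -1, -1, -1])
        value     = np.array([0, 0, 0, 10.0, 20.0, 30.0, 40.0])

        def predict(X, depth=2):
            node = np.zeros(len(X), dtype=int)          # every row starts at the root
            for _ in range(depth):
                go_right = X[np.arange(len(X)), feature[node]] > threshold[node]
                # Branch-free, data-parallel select of the next node for all rows.
                node = np.where(go_right, right[node], left[node])
            return value[node]

        if __name__ == "__main__":
            X = np.array([[0.2, 0.1], [0.2, 0.9], [0.9, 0.6], [0.9, 0.8]])
            print(predict(X))    # -> [10. 20. 30. 40.]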

    Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

    We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $|\mathcal{S}|$ and a finite action space $|\mathcal{A}|$. We show that any randomized algorithm needs running time at least $\Omega(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $\epsilon$-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $\Omega\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon}\right)$. These results reveal a surprising observation that the computational complexity of the MDP depends on the data structure of the input.
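
    A small sketch of the 'array of cumulative probabilities' input format mentioned above: with cumulative rows, the next state of a simulated transition can be drawn with one uniform draw and a binary search in $O(\log |\mathcal{S}|)$ time, which is what makes the cheaper regime plausible. The two-state, two-action MDP below is hypothetical.

        import bisect
        import random

        # P_cum[(s, a)] is the cumulative distribution over next states for (s, a).
        P_cum = {
            (0, 0): [0.7, 1.0],      # from state 0, action 0: 70% stay, 30% go to 1
            (0, 1): [0.2, 1.0],
            (1, 0): [0.5, 1.0],
            (1, 1): [0.9, 1.0],
        }

        def sample_next_state(s, a, rng=random):
            """Draw s' ~ P(. | s, a) via binary search over the cumulative row."""
            u = rng.random()
            return bisect.bisect_left(P_cum[(s, a)], u)

        if __name__ == "__main__":
            counts = [0, 0]
            for _ in range(100_000):
                counts[sample_next_state(0, 0)] += 1
            print(counts)   # roughly 70,000 / 30,000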

    LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves

    The recently proposed learned indexes have attracted much attention, as they can adapt to the actual data and query distributions to attain better search efficiency. Based on this technique, several existing works build indexes for multi-dimensional data and achieve improved query performance. A common paradigm of these works is to (i) map multi-dimensional data points to a one-dimensional space using a fixed space-filling curve (SFC) or its variant and (ii) then apply learned indexing techniques. We notice that the first step typically uses a fixed SFC method, such as row-major order or z-order. This limits the potential of learned multi-dimensional indexes to adapt to varying data distributions and query workloads. In this paper, we propose the novel idea of learning a space-filling curve that is carefully designed and actively optimized for efficient query processing. We also identify innovative offline and online optimization opportunities common to SFC-based learned indexes and offer optimal and/or heuristic solutions. Experimental results demonstrate that our proposed method, LMSFC, outperforms state-of-the-art non-learned and learned methods across three commonly used real-world datasets and diverse experimental settings.
    Comment: Extended Version. Accepted by VLDB 202
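
    For reference, a sketch of the fixed z-order (Morton) baseline mentioned above: the bits of the coordinates are interleaved into a single code, giving the one-dimensional order over which a learned index can then be built. LMSFC instead learns the curve's parameters; that learning step is not shown, and the sample points are hypothetical.

        def z_order(coords, bits=16):
            """Interleave the bits of the integer coordinates into a Morton code."""
            code = 0
            dims = len(coords)
            for bit in range(bits):
                for d, c in enumerate(coords):
                    code |= ((c >> bit) & 1) << (bit * dims + d)
            return code

        if __name__ == "__main__":
            points = [(3, 5), (4, 4), (3, 6), (10, 1)]
            # Sorting by Morton code gives the one-dimensional order an index would use.
            for p in sorted(points, key=z_order):
                print(p, format(z_order(p), "b"))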

    Spherical Indexing for Neighborhood Queries

    This is an algorithm for finding neighbors when the objects can freely move and have no predefined position. A query consists of finding the neighbors within a given radius of a center location. Space is discretized into cubic cells. The algorithm introduces a direct spherical indexing that gives the list of all cells making up the query sphere, for any radius and any center location. It can additionally take into account both cyclic and non-cyclic regions of interest. Finding only the K nearest neighbors naturally benefits from the spherical indexing: the sphere is traversed minimally from center to edge, and the maximum distance is reduced as soon as K neighbors have been found.
    Comment: 9 pages, 10 figures. The source code is available at http://nicolas.brodu.free.fr/en/programmation/neighand/index.htm
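
    A hedged sketch of the core idea, not the paper's implementation: precompute, for a given radius, the set of cell offsets whose cells may intersect the query sphere, so a query only visits those cells around the center cell. The cell size and point set are hypothetical, and wrapping for cyclic regions is not shown.

        import math
        from collections import defaultdict

        CELL = 1.0   # edge length of a cubic cell

        def sphere_offsets(radius):
            """Integer (dx, dy, dz) offsets of cells that may intersect the sphere."""
            r_cells = int(math.ceil(radius / CELL))
            offsets = []
            for dx in range(-r_cells, r_cells + 1):
                for dy in range(-r_cells, r_cells + 1):
                    for dz in range(-r_cells, r_cells + 1):
                        # Conservative nearest distance from the center cell to this cell.
                        near = math.sqrt(sum(max(abs(d) - 1, 0) ** 2
                                             for d in (dx, dy, dz))) * CELL
                        if near <= radius:
                            offsets.append((dx, dy, dz))
            return offsets

        def cell_of(p):
            return tuple(int(math.floor(c / CELL)) for c in p)

        def neighbors(points_by_cell, center, radius):
            cx, cy, cz = cell_of(center)
            result = []
            for dx, dy, dz in sphere_offsets(radius):
                for p in points_by_cell.get((cx + dx, cy + dy, cz + dz), ()):
                    if math.dist(p, center) <= radius:
                        result.append(p)
            return result

        if __name__ == "__main__":
            pts = [(0.2, 0.1, 0.0), (1.4, 0.2, 0.3), (3.0, 3.0, 3.0)]
            grid = defaultdict(list)
            for p in pts:
                grid[cell_of(p)].append(p)
            print(neighbors(grid, (0.5, 0.5, 0.5), 1.5))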

    A Simple and Practical Concurrent Non-blocking Unbounded Graph with Reachability Queries

    Graph algorithms used in many applications, including social networks, communication networks, VLSI design, graphics, and several others, require dynamic modifications to the graph -- the addition and removal of vertices and/or edges. This paper presents a novel concurrent non-blocking algorithm to implement a dynamic unbounded directed graph in a shared-memory machine. The addition and removal operations on vertices and edges are lock-free. For a finite-sized graph, the lookup operations are wait-free. The most significant component of the presented algorithm is the reachability query on a concurrent graph. The reachability queries in our algorithm are obstruction-free and thus impose minimal additional synchronization cost over the other operations. We prove that each of the data structure operations is linearizable. We extensively evaluate a sample C/C++ implementation of the algorithm through a number of micro-benchmarks. The experimental results show that the proposed algorithm scales well with the number of threads and on average provides a 5x to 7x performance improvement over a concurrent graph implementation using coarse-grained locking.
    Comment: 10 pages, 5 figures, submitted to ICDCN-201
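
    A sequential sketch of the operations' semantics only: vertex/edge addition and removal plus a BFS-based reachability query on a directed graph. The paper's actual contribution -- non-blocking, CAS-based synchronization giving lock-free updates and obstruction-free reachability -- is not represented by this Python sketch, and the sample edges are hypothetical.

        from collections import deque

        class DirectedGraph:
            def __init__(self):
                self.adj = {}                      # vertex -> set of out-neighbors

            def add_vertex(self, v):
                self.adj.setdefault(v, set())

            def remove_vertex(self, v):
                self.adj.pop(v, None)
                for nbrs in self.adj.values():
                    nbrs.discard(v)

            def add_edge(self, u, v):
                self.add_vertex(u)
                self.add_vertex(v)
                self.adj[u].add(v)

            def remove_edge(self, u, v):
                if u in self.adj:
                    self.adj[u].discard(v)

            def reachable(self, src, dst):
                """Breadth-first search from src; True if dst can be reached."""
                seen, queue = {src}, deque([src])
                while queue:
                    u = queue.popleft()
                    if u == dst:
                        return True
                    for w in self.adj.get(u, ()):
                        if w not in seen:
                            seen.add(w)
                            queue.append(w)
                return False

        if __name__ == "__main__":
            g = DirectedGraph()
            g.add_edge(1, 2); g.add_edge(2, 3); g.add_edge(4, 1)
            print(g.reachable(1, 3), g.reachable(3, 1))   # True False
            g.remove_edge(2, 3)
            print(g.reachable(1, 3))                      # False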

    Issues in providing a reliable multicast facility

    Issues involved in point-to-multipoint communication are presented, and the literature on proposed solutions and approaches is surveyed. Particular attention is focused on ideas and implementations that align with the requirements of the environment of interest. The attributes of multicast receiver groups that might lead to useful classifications are examined, along with the functionality a management scheme should provide and how the group management module can be implemented. The services that multicasting facilities can offer are presented, followed by mechanisms within the communications protocol that implement these services. The metrics of interest when evaluating a reliable multicast facility are identified and applied to four transport-layer protocols that incorporate reliable multicast.