Multi-Resource Parallel Query Scheduling and Optimization
Scheduling query execution plans is a particularly complex problem in
shared-nothing parallel systems, where each site consists of a collection of
local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory)
resources and communicates with remote sites by message-passing. Earlier work
on parallel query scheduling employs either (a) one-dimensional models of
parallel task scheduling, effectively ignoring the potential benefits of
resource sharing, or (b) models of globally accessible resource units, which
are appropriate only for shared-memory architectures, since they cannot capture
the affinity of system resources to sites. In this paper, we develop a general
approach capturing the full complexity of scheduling distributed,
multi-dimensional resource units for all forms of parallelism within and across
queries and operators. We present a level-based list scheduling heuristic
algorithm for independent query tasks (i.e., physical operator pipelines) that
is provably near-optimal for given degrees of partitioned parallelism (with a
worst-case performance ratio that depends on the number of time-shared and
space-shared resources per site and the granularity of the clones). We also
propose extensions to handle blocking constraints in logical operator (e.g.,
hash-join) pipelines and bushy query plans as well as on-line task arrivals
(e.g., in a dynamic or multi-query execution environment). Experiments with our
scheduling algorithms implemented on top of a detailed simulation model verify
their effectiveness compared to existing approaches in a realistic setting.
Based on our analytical and experimental results, we revisit the open problem
of designing efficient cost models for parallel query optimization and propose
a solution that captures all the important parameters of parallel execution.
Comment: 50 pages; Conference version of the paper has appeared in the
Proceedings of the 23rd International Conference on Very Large Databases
(VLDB'1997), Athens, Greece, August 199
Optimization of Imperative Programs in a Relational Database
For decades, RDBMSs have supported declarative SQL as well as imperative
functions and procedures as ways for users to express data processing tasks.
While the evaluation of declarative SQL has received a lot of attention
resulting in highly sophisticated techniques, the evaluation of imperative
programs has remained naive and highly inefficient. Imperative programs offer
several benefits over SQL and hence are often preferred and widely used. But
unfortunately, their abysmal performance discourages, and even prohibits, their
use in many situations. We address this important problem that has hitherto
received little attention.
We present Froid, an extensible framework for optimizing imperative programs
in relational databases. Froid's novel approach automatically transforms entire
User Defined Functions (UDFs) into relational algebraic expressions, and embeds
them into the calling SQL query. This form is now amenable to cost-based
optimization and results in efficient, set-oriented, parallel plans as opposed
to inefficient, iterative, serial execution of UDFs. Froid's approach
additionally brings the benefits of many compiler optimizations to UDFs with no
additional implementation effort. We describe the design of Froid and present
our experimental evaluation that demonstrates performance improvements of up to
multiple orders of magnitude on real workloads.
Comment: Extended version of the paper titled "FROID: Optimization of
Imperative Programs in a Relational Database" in PVLDB 11(4), 2017. DOI:
10.1145/3164135.316414
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups both for construction and queries.
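The alias table mentioned in this abstract can be illustrated with a minimal sequential sketch. This is the classic textbook construction (often attributed to Vose), not the improved or parallel algorithms the paper proposes; the function names `build_alias_table` and `sample` are chosen here for illustration only.

```python
import random

def build_alias_table(weights):
    """Build (prob, alias) arrays so that sampling takes O(1) time per draw."""
    n = len(weights)
    total = sum(weights)
    # Scale weights so that the average bucket holds exactly 1.0.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bucket s keeps its own mass and borrows the rest from item l.
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover buckets are full up to rounding error.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one item index with probability proportional to its weight."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Construction takes linear time and each draw needs one random bucket and one random threshold, which is why the structure is a natural target for the faster sequential and parallel construction the abstract describes.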
A Delta Debugger for ILP Query Execution
Because query execution is the most crucial part of Inductive Logic
Programming (ILP) algorithms, a lot of effort is invested in developing faster
execution mechanisms. These execution mechanisms typically have a low-level
implementation, making them hard to debug. Moreover, other factors such as the
complexity of the problems handled by ILP algorithms and size of the code base
of ILP data mining systems make debugging at this level a very difficult job.
In this work, we present the trace-based debugging approach currently used in
the development of new execution mechanisms in hipP, the engine underlying the
ACE Data Mining system. This debugger uses the delta debugging algorithm to
automatically reduce the total time needed to expose bugs in ILP execution,
thus making the manual debugging step much lighter.
Comment: Paper presented at the 16th Workshop on Logic-based Methods in
Programming Environments (WLPE2006
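The delta debugging algorithm referred to above can be sketched as follows. This is a simplified version of the general `ddmin` minimization idea, not the hipP debugger itself; the list `items` stands in for whatever is being reduced (for example, the steps of an execution trace), and `fails` is a hypothetical predicate that reruns the test and reports whether the bug still shows.

```python
def ddmin(items, fails):
    """Shrink `items` to a sublist that still fails and cannot lose any single element."""
    assert fails(items), "the full input must reproduce the failure"
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):
                # One chunk alone reproduces the bug: restart on it.
                items, n, reduced = subset, 2, True
                break
            if fails(complement):
                # Removing one chunk keeps the bug: continue without it.
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(items):
                break  # already at single-element granularity
            n = min(len(items), n * 2)  # try finer chunks
    return items
```

For example, with `fails = lambda s: 3 in s and 6 in s`, calling `ddmin(list(range(8)), fails)` narrows eight elements down to the two that jointly trigger the failure.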
Runtime Optimizations for Prediction with Tree-Based Models
Tree-based models have proven to be an effective solution for web ranking as
well as other problems in diverse domains. This paper focuses on optimizing the
runtime performance of applying such models to make predictions, given an
already-trained model. Although exceedingly simple conceptually, most
implementations of tree-based models do not efficiently utilize modern
superscalar processor architectures. By laying out data structures in memory in
a more cache-conscious fashion, removing branches from the execution flow using
a technique called predication, and micro-batching predictions using a
technique called vectorization, we are able to better exploit modern processor
architectures and significantly improve the speed of tree-based models over
hard-coded if-else blocks. Our work contributes to the exploration of
architecture-conscious runtime implementations of machine learning algorithms.
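The predication idea described above can be illustrated with a small sketch. It assumes a complete binary tree of fixed depth stored in flat arrays (an assumption made here for simplicity, not necessarily the paper's layout), and it only shows the control-flow idea: the comparison result is used as an index offset instead of steering an if-else branch. Python itself will not show the speedup that a compiled implementation would.

```python
def predict(features, feature_ids, thresholds, leaf_values, depth):
    """Walk a complete binary tree stored in arrays without if-else on the data.

    Internal node i has children 2*i + 1 (left) and 2*i + 2 (right).
    """
    node = 0
    for _ in range(depth):
        # The comparison yields 0 or 1, which selects the child arithmetically.
        go_right = features[feature_ids[node]] > thresholds[node]
        node = 2 * node + 1 + int(go_right)
    # Leaves are numbered after the internal nodes.
    return leaf_values[node - len(thresholds)]
```

A depth-2 tree has three internal nodes and four leaves, so `predict([0.3, 0.9], [0, 1, 1], [0.5, 0.2, 0.8], [10, 20, 30, 40], 2)` goes left at the root and right at the next level.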
Lower Bound On the Computational Complexity of Discounted Markov Decision Problems
We study the computational complexity of the infinite-horizon
discounted-reward Markov Decision Problem (MDP) with a finite state space
and a finite action space . We show that any
randomized algorithm needs a running time at least
to compute an -optimal policy
with high probability. We consider two variants of the MDP where the input is
given in specific data structures, including arrays of cumulative probabilities
and binary trees of transition probabilities. For these cases, we show that the
complexity lower bound reduces to . These results reveal a surprising
observation that the computational complexity of the MDP depends on the data
structure of the input.
LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
The recently proposed learned indexes have attracted much attention as they
can adapt to the actual data and query distributions to attain better search
efficiency. Based on this technique, several existing works build up indexes
for multi-dimensional data and achieve improved query performance. A common
paradigm of these works is to (i) map multi-dimensional data points to a
one-dimensional space using a fixed space-filling curve (SFC) or its variant
and (ii) then apply the learned indexing techniques. We notice that the first
step typically uses a fixed SFC method, such as row-major order or z-order. This
limits the potential of learned multi-dimensional indexes to adapt to varying
data distributions and query workloads. In this paper, we
propose a novel idea of learning a space-filling curve that is carefully
designed and actively optimized for efficient query processing. We also
identify innovative offline and online optimization opportunities common to
SFC-based learned indexes and offer optimal and/or heuristic solutions.
Experimental results demonstrate that our proposed method, LMSFC, outperforms
state-of-the-art non-learned or learned methods across three commonly used
real-world datasets and diverse experimental settings.
Comment: Extended Version. Accepted by VLDB 202
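For reference, the two fixed curves named in the abstract, row-major order and z-order, can be sketched for two-dimensional integer points as below. This shows only the fixed baselines that LMSFC aims to improve on, not the learned curve itself; the function names and the `bits` and `width` parameters are illustrative choices.

```python
def row_major(x, y, width):
    """Map a 2-D grid point to 1-D by scanning row after row."""
    return y * width + x

def z_order(x, y, bits=16):
    """Map a 2-D grid point to 1-D by interleaving the bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code
```

Both mappings are monotonic in the sense that a point that is no smaller in every coordinate never receives a smaller key, so the keys of a rectangular query's lower and upper corners bound the keys of every point inside it.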
Spherical Indexing for Neighborhood Queries
This is an algorithm for finding neighbors when the objects can freely move
and have no predefined position. The query consists in finding neighbors for a
center location and a given radius. Space is discretized in cubic cells. This
algorithm introduces a direct spherical indexing that gives the list of all
cells making up the query sphere, for any radius and any center location. It
can additionally take into account both cyclic and non-cyclic regions of
interest. Finding only the K nearest neighbors naturally benefits from the
spherical indexing by minimally running through the sphere from center to edge,
and reducing the maximum distance when K neighbors have been found.
Comment: 9 pages, 10 figures. The source code is available at
http://nicolas.brodu.free.fr/en/programmation/neighand/index.htm
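The query described above, listing the cubic cells that make up a sphere and visiting them from center to edge, can be illustrated with a brute-force sketch. This is not the direct spherical indexing of the paper, which avoids this kind of scan; it only shows what the resulting cell list looks like. The function name `cells_in_sphere` and the `cell_size` parameter are assumptions made for the example, and cyclic regions are not handled.

```python
import math

def cells_in_sphere(center, radius, cell_size=1.0):
    """Return integer cell coordinates touching the sphere, nearest cells first."""
    lo = [math.floor((c - radius) / cell_size) for c in center]
    hi = [math.floor((c + radius) / cell_size) for c in center]
    found = []
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                # Squared distance from the center to the closest point of the cell.
                d2 = 0.0
                for idx, c in zip((i, j, k), center):
                    low = idx * cell_size
                    nearest = min(max(c, low), low + cell_size)
                    d2 += (c - nearest) ** 2
                if d2 <= radius * radius:
                    found.append((d2, (i, j, k)))
    found.sort()  # center-to-edge order, useful when stopping after K neighbors
    return [cell for _, cell in found]
```

Visiting cells in this order lets a K-nearest-neighbor search stop early: once K neighbors are known, any cell whose minimum distance exceeds the current K-th distance can be skipped.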
A Simple and Practical Concurrent Non-blocking Unbounded Graph with Reachability Queries
Graph algorithms applied in many applications, including social networks,
communication networks, VLSI design, graphics, and several others, require
dynamic modifications -- addition and removal of vertices and/or edges -- in
the graph. This paper presents a novel concurrent non-blocking algorithm to
implement a dynamic unbounded directed graph in a shared-memory machine. The
addition and removal operations of vertices and edges are lock-free. For a
finite-sized graph, the lookup operations are wait-free. The most significant
component of the presented algorithm is the reachability query in a concurrent
graph. The reachability queries in our algorithm are obstruction-free and thus
impose minimal additional synchronization cost over other operations. We prove
that each of the data structure operations is linearizable. We extensively
evaluate a sample C/C++ implementation of the algorithm through a number of
micro-benchmarks. The experimental results show that the proposed algorithm
scales well with the number of threads and on average provides a 5 to 7x
performance improvement over a concurrent graph implementation using
coarse-grained locking.
Comment: 10 pages, 5 figs, submitted to ICDCN-201
Issues in providing a reliable multicast facility
Issues involved in point-to-multipoint communication are presented and the literature for proposed solutions and approaches is surveyed. Particular attention is focused on the ideas and implementations that align with the requirements of the environment of interest. The attributes of multicast receiver groups that might lead to useful classifications, what the functionality of a management scheme should be, and how the group management module can be implemented are examined. The services that multicasting facilities can offer are presented, followed by mechanisms within the communications protocol that implement these services. The metrics of interest when evaluating a reliable multicast facility are identified and applied to four transport layer protocols that incorporate reliable multicast.