Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.
Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
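
To make the left-deep versus right-deep distinction in the abstract above concrete, here is a minimal Python sketch of two hash-based equi-joins over toy in-memory relations. It is an illustration only, not the paper's cost model or implementation. The relation names, the column names, and the helpers `build`, `probe`, `left_deep_plan`, and `right_deep_plan` are all hypothetical. It also assumes one common convention: in a left-deep plan each intermediate result becomes the build side of the next join, while in a right-deep plan hash tables are built on base relations and one relation is probed through all of them.

```python
from collections import defaultdict


def build(rows, key):
    """Build phase: hash every row of the build input on its join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(table, rows, key):
    """Probe phase: look up each probe row and emit the merged matches."""
    return [{**match, **row} for row in rows for match in table.get(row[key], [])]


def left_deep_plan(orders, customers, nations):
    # Assumed convention: the intermediate result is materialized and
    # becomes the build side of the next join.
    step1 = probe(build(orders, "cust_id"), customers, "cust_id")
    return probe(build(step1, "nation_id"), nations, "nation_id")


def right_deep_plan(orders, customers, nations):
    # Assumed convention: hash tables are built on base relations up front,
    # then one relation is pipelined through both probes.
    cust_table = build(customers, "cust_id")
    nation_table = build(nations, "nation_id")
    out = []
    for o in orders:
        for c in cust_table.get(o["cust_id"], []):
            for n in nation_table.get(c["nation_id"], []):
                out.append({**o, **c, **n})
    return out


def canonical(rows):
    """Order-independent form of a result, for comparing the two plans."""
    return sorted(tuple(sorted(r.items())) for r in rows)


# Hypothetical toy data.
orders = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 11},
    {"order_id": 3, "cust_id": 10},
]
customers = [{"cust_id": 10, "nation_id": 100}, {"cust_id": 11, "nation_id": 101}]
nations = [{"nation_id": 100, "name": "A"}, {"nation_id": 101, "name": "B"}]

left_result = left_deep_plan(orders, customers, nations)
right_result = right_deep_plan(orders, customers, nations)
```

Both plans return the same rows. What differs is which inputs are hashed and how large the hash tables become, and that difference is the kind of memory-access cost the abstract says the proposed model is meant to capture.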
Algorithms for Fast Aggregated Convergecast in Sensor Networks
Fast and periodic collection of aggregated data
is of considerable interest for mission-critical and continuous
monitoring applications in sensor networks. In the many-to-one
communication paradigm, referred to as convergecast, we focus
on applications wherein data packets are aggregated at each hop
en-route to the sink along a tree-based routing topology, and
address the problem of minimizing the convergecast schedule
length by utilizing multiple frequency channels. The primary
hindrance in minimizing the schedule length is the presence of
interfering links. We prove that it is NP-complete to determine
whether all the interfering links in an arbitrary network can
be removed using at most a constant number of frequencies.
We give a sufficient condition on the number of frequencies for
which all the interfering links can be removed, and propose a
polynomial time algorithm that minimizes the schedule length
in this case. We also prove that minimizing the schedule length
for a given number of frequencies on an arbitrary network is
NP-complete, and describe a greedy scheme that gives a constant
factor approximation on unit disk graphs. When the routing tree
is not given as an input to the problem, we prove that a constant
factor approximation is still achievable for degree-bounded trees.
Finally, we evaluate our algorithms through simulations and
compare their performance under different network parameters ...
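
As a rough illustration of what a convergecast schedule on a routing tree looks like, the following Python sketch assigns time slots for aggregated convergecast. It is not the algorithm from the abstract above. It assumes that all interfering links have already been removed (for example, by using enough frequencies), so only two constraints remain: a node transmits once, after it has received from all of its children, and a parent receives from at most one child per slot. The function name `schedule_convergecast` and the toy tree are hypothetical.

```python
def schedule_convergecast(parent, sink):
    """Assign a transmit slot (1-based) to every non-sink node.

    parent maps each non-sink node to its parent in the routing tree.
    Returns (slot, length), where length is the number of slots used.
    Assumes interference between different parent-child links is already
    eliminated, so only per-parent receive conflicts matter.
    """
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    slot = {}

    def ready_time(node):
        # Last slot in which `node` receives data; 0 for a leaf.
        ready = {kid: ready_time(kid) for kid in children.get(node, [])}
        last = 0
        # Greedy: serve the children that are ready earliest first.
        for kid in sorted(ready, key=lambda k: (ready[k], k)):
            # A child sends after it is ready and after the previous sibling.
            last = max(last, ready[kid]) + 1
            slot[kid] = last
        return last

    length = ready_time(sink)
    return slot, length


# Hypothetical tree: sink 0 has children 1 and 2; node 1 has children 3 and 4.
parent = {1: 0, 2: 0, 3: 1, 4: 1}
slot, length = schedule_convergecast(parent, sink=0)
```

In this toy tree, nodes 3 and 4 share parent 1 and must use different slots, node 2 can send in the first slot, and node 1 sends only after both of its children have reported.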