    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.
    Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

    Algorithms for Fast Aggregated Convergecast in Sensor Networks

    Fast and periodic collection of aggregated data is of considerable interest for mission-critical and continuous monitoring applications in sensor networks. In the many-to-one communication paradigm, referred to as convergecast, we focus on applications wherein data packets are aggregated at each hop en route to the sink along a tree-based routing topology, and address the problem of minimizing the convergecast schedule length by utilizing multiple frequency channels. The primary hindrance in minimizing the schedule length is the presence of interfering links. We prove that it is NP-complete to determine whether all the interfering links in an arbitrary network can be removed using at most a constant number of frequencies. We give a sufficient condition on the number of frequencies for which all the interfering links can be removed, and propose a polynomial-time algorithm that minimizes the schedule length in this case. We also prove that minimizing the schedule length for a given number of frequencies on an arbitrary network is NP-complete, and describe a greedy scheme that gives a constant-factor approximation on unit disk graphs. When the routing tree is not given as an input to the problem, we prove that a constant-factor approximation is still achievable for degree-bounded trees. Finally, we evaluate our algorithms through simulations and compare their performance under different network parameters.
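
To make the interference-free case in the abstract above concrete, here is a minimal sketch of slot assignment for aggregated convergecast on a routing tree. It is not the paper's algorithm. It assumes that enough frequencies are available so that no interfering links remain, that nodes are half-duplex, that a parent receives at most one packet per slot, and that a node transmits only after it has aggregated the packets of all its children. The function name `aggregated_convergecast_schedule` and the example tree are hypothetical.

```python
from collections import defaultdict


def aggregated_convergecast_schedule(parent):
    """Assign a transmit slot to every non-sink node of a routing tree.

    parent maps each node to its parent; the sink maps to None.
    Assumption: interference is already eliminated, so the only
    constraints are tree precedence and one reception per slot.
    Returns (slot_of, schedule_length), with slots numbered from 1.
    """
    children = defaultdict(list)
    sink = None
    for node, par in parent.items():
        if par is None:
            sink = node
        else:
            children[par].append(node)

    slot_of = {}

    def last_receive_slot(node):
        # Each child is ready once it has received from all its own children.
        ready = sorted(
            ((last_receive_slot(c), c) for c in children[node]),
            key=lambda pair: pair[0],
        )
        slot = 0
        for child_ready, child in ready:
            # Earliest free slot after the child is ready and after the
            # previous sibling's transmission to the same parent.
            slot = max(slot, child_ready) + 1
            slot_of[child] = slot
        return slot  # 0 for a leaf, which has nothing to wait for

    schedule_length = last_receive_slot(sink)
    return slot_of, schedule_length


# Hypothetical six-node tree rooted at sink "s".
tree = {"s": None, "a": "s", "b": "s", "c": "a", "d": "a", "e": "b"}
slot_of, length = aggregated_convergecast_schedule(tree)
print(slot_of, length)
```

Serving the children of each parent in order of readiness keeps every parent's last reception as early as possible under these assumptions. Once interfering links are reintroduced, this simple rule no longer suffices, which is where the hardness results in the abstract apply.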