
    Optimal column layout for hybrid workloads

    Data-intensive analytical applications need to support both efficient reads and writes. However, a data layout that works well for an update-heavy workload is usually ill-suited for a read-mostly one, and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates. We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach navigates the design space of the physical layout: it organizes each column’s data by determining the number of partitions, their corresponding sizes and ranges, and the amount of buffer space and how it is allocated. We frame these design decisions as an optimization problem that, given workload knowledge and performance requirements, provides an optimal physical layout for the workload at hand. To evaluate this work, we build an in-memory storage engine, Casper, and show that it outperforms state-of-the-art data layouts of analytical systems for hybrid workloads. Casper delivers up to 2.32x higher throughput for update-intensive workloads and up to 2.14x higher throughput for hybrid workloads. We further show how to make data layout decisions robust to workload variation by carefully selecting the input of the optimization. http://www.vldb.org/pvldb/vol12/p2393-athanassoulis.pdf
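    To make the optimization framing concrete, the sketch below searches candidate partition counts and buffer allocations against a toy cost model; the model, constants, and every name in it are assumptions for illustration, not Casper's actual design algorithm.

```python
# A minimal sketch of a cost-based column-layout search, assuming a toy
# cost model; the constants, class names, and the model itself are
# illustrative guesses, not Casper's actual optimization.
from dataclasses import dataclass

@dataclass
class Workload:
    point_reads: float  # fraction of point queries
    range_scans: float  # fraction of range scans
    updates: float      # fraction of updates/inserts

def layout_cost(partitions: int, buffer_slots: int,
                n_rows: int, w: Workload) -> float:
    """Toy trade-off: small partitions cheapen updates and point reads
    but add per-partition overhead to range scans."""
    part_size = n_rows / partitions
    point_cost = w.point_reads * part_size             # scan one partition
    scan_cost = w.range_scans * partitions * 50        # per-partition seek
    update_cost = w.updates * part_size / (1 + buffer_slots)
    return point_cost + scan_cost + update_cost

def best_layout(n_rows: int, w: Workload, buffer_budget: int):
    """Enumerate power-of-two partition counts; keep the cheapest layout."""
    candidates = [(p, buffer_budget // p) for p in (2 ** k for k in range(1, 16))]
    return min(candidates, key=lambda c: layout_cost(*c, n_rows, w))

parts, slots = best_layout(10_000_000, Workload(0.2, 0.3, 0.5), 65_536)
print(f"{parts} partitions with {slots} buffer slots each")
```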

    High-Dimensional Spatio-Temporal Indexing

    There exist numerous indexing methods that handle either spatio-temporal or high-dimensional data well. However, the indexing methods that handle spatio-temporal data well have certain drawbacks when confronted with high-dimensional data. As the most efficient spatio-temporal indexing methods are based on the R-tree and its variants, they face the well-known problems of high-dimensional space. Furthermore, most high-dimensional indexing methods try to reduce the number of dimensions in the indexed data, compressing the information given by all dimensions into a few, but are not able to store now-relative data. One of the most efficient high-dimensional indexing methods, the Pyramid Technique, can handle high-dimensional point data only. We take this technique and extend it so that it can handle spatio-temporal data as well, and we introduce a technique for answering spatio-temporal queries over this structure. We compare our technique, the Spatio-Temporal Pyramid Adapter (STPA), to the RST-tree for in-memory and on-disk applications, and show that for high dimensions, the extra query cost for reducing the dimensionality in the Pyramid Technique is clearly exceeded by the rising query cost in the RST-tree. We conclude by discussing the main drawbacks and advantages of our technique.
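    For orientation, the following sketch shows the core of the standard Pyramid Technique point mapping that the abstract extends, under the simplifying assumption of points normalized to the unit hypercube; the spatio-temporal extension (STPA) itself is not reproduced.

```python
# The standard Pyramid Technique point mapping the abstract builds on:
# a point in [0,1]^d is reduced to one scalar (pyramid number + height),
# which can then be indexed by an ordinary B+-tree. This is a simplified
# textbook sketch, not the STPA extension.

def pyramid_value(point):
    """Map a d-dimensional point in [0,1]^d to its 1-D pyramid value."""
    d = len(point)
    # The dimension whose coordinate deviates most from the centre
    # decides which of the 2d pyramids the point falls into.
    j_max = max(range(d), key=lambda j: abs(point[j] - 0.5))
    # Pyramids 0..d-1 point towards 0, pyramids d..2d-1 towards 1.
    i = j_max if point[j_max] < 0.5 else j_max + d
    height = abs(point[j_max] - 0.5)
    return i + height

print(pyramid_value([0.1, 0.7, 0.5]))  # pyramid 0, height 0.4 -> 0.4
```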

    End-to-end deep reinforcement learning in computer systems

    The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have appealed to systems developers since their inception: they promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high-profile successes when utilising large neural networks (deep reinforcement learning), and newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications. In this dissertation, I first argue that designing and executing deep RL algorithms efficiently requires novel software abstractions which can accommodate the distinct computational patterns of these communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics, and I present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level dataflow graph through the combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping. Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and by missing shared standards for evaluating model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation, which helps practitioners incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide the implementation of novel RL applications.
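    As a loose illustration of the decoupling the abstract describes, the toy classes below separate logical component composition from a staged, backend-specific build; the names and API are hypothetical and do not mirror RLgraph's real interfaces.

```python
# A rough illustration of decoupling logical composition from execution:
# components form a backend-agnostic graph, and only a later staged build
# binds them to a framework. Class names and the API are invented for
# exposition and do not mirror RLgraph's real interfaces.

class Component:
    """A logical building block that owns sub-components but no backend."""
    def __init__(self, name: str, *children: "Component"):
        self.name = name
        self.children = list(children)

    def build(self, backend: str) -> None:
        """Staged build: bind the logical graph to concrete execution."""
        for child in self.children:
            child.build(backend)
        print(f"built {self.name} for backend={backend}")

# Compose an agent from logical parts, independent of any framework.
agent = Component(
    "dqn_agent",
    Component("preprocessor"),
    Component("q_network"),
    Component("replay_buffer"),
)
agent.build("tensorflow")  # map the dataflow graph to execution semantics
```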