
    Compression-Aware In-Memory Query Processing: Vision, System Design and Beyond

    In-memory database systems have to keep base data as well as intermediate results generated during query processing in main memory. Moreover, the effort to access intermediate results is equivalent to the effort to access the base data. Optimizing intermediate results is therefore worthwhile and has a high impact on query execution performance. For this domain, we propose the continuous use of lightweight compression methods for intermediate results, with the aim of developing a balanced query processing approach based on compressed intermediates. To minimize overall query execution time, a balance must be found between the reduced transfer times and the increased computational effort. This paper provides an overview and presents a system design for our vision. Our system design addresses the challenge of integrating a large and evolving corpus of lightweight data compression algorithms into an in-memory column store. In detail, we present our model-driven approach and describe ongoing research topics toward realizing our compression-aware query processing vision.
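
    As a hedged illustration of the trade-off described above (not the authors' system; the names and the choice of run-length encoding are illustrative), the sketch below compresses an intermediate column and aggregates directly on the compressed form, trading a little extra compute for a smaller intermediate:

        # Minimal sketch: run-length encoding (RLE) as a lightweight compression
        # scheme for an intermediate result, with an operator that consumes the
        # compressed form directly instead of materialized values.
        def rle_compress(column):
            """Encode a column as [value, run_length] pairs."""
            runs = []
            for v in column:
                if runs and runs[-1][0] == v:
                    runs[-1][1] += 1
                else:
                    runs.append([v, 1])
            return runs

        def sum_compressed(runs):
            """Aggregate directly on the compressed intermediate, no decompression."""
            return sum(v * n for v, n in runs)

        intermediate = [7, 7, 7, 3, 3, 9, 9, 9, 9]
        runs = rle_compress(intermediate)            # [[7, 3], [3, 2], [9, 4]]
        assert sum_compressed(runs) == sum(intermediate)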

    Doctor of Philosophy

    The internet-based information infrastructure that has powered the growth of modern personal/mobile computing is composed of powerful, warehouse-scale computers, or datacenters. These heavily subscribed datacenters perform data-processing jobs under intense quality-of-service guarantees. Further, high-performance compute platforms are being used to model and analyze increasingly complex scientific problems and natural phenomena. To ensure that the high-performance needs of these machines are met, it is necessary to increase the efficiency of the memory system that supplies data to the processing cores. Many of the microarchitectural innovations designed to scale the memory wall (e.g., out-of-order instruction execution, on-chip caches) are being rendered less effective by several emerging trends (e.g., increased emphasis on energy consumption, limited access locality). This motivates optimizing the main memory system itself. The key to an efficient main memory system is the memory controller; in particular, the scheduling algorithm in the memory controller greatly influences its performance. This dissertation explores this hypothesis in several contexts. It develops tools to better understand memory scheduling and devises scheduling innovations for CPUs and GPUs. We propose novel memory scheduling techniques that are strongly aware of the access patterns of the clients as well as the microarchitecture of the memory device. Based on these, we present (i) a Dynamic Random Access Memory (DRAM) chip microarchitecture optimized to reduce write-induced slowdown, (ii) a memory scheduling algorithm that exploits these features, (iii) several memory scheduling algorithms that reduce the memory-related stalls experienced by irregular General Purpose Graphics Processing Unit (GPGPU) applications, and (iv) the Utah Simulated Memory Module (USIMM), a detailed, validated simulator for DRAM main memory that we use to analyze and propose scheduling algorithms.
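
    To make the role of the controller's scheduling policy concrete, here is a toy sketch (not USIMM and not the dissertation's algorithms; a generic FR-FCFS-style rule is used purely for illustration) of a decision that prefers requests hitting an already-open DRAM row:

        # Toy illustration of why the scheduling policy matters: a first-ready,
        # first-come-first-served (FR-FCFS) style picker prefers requests that
        # hit the currently open row (cheap) over older requests needing a new
        # row activation (expensive).
        from collections import namedtuple

        Request = namedtuple("Request", "arrival bank row")

        def pick_fr_fcfs(queue, open_rows):
            """Serve the oldest row-buffer hit if any, otherwise the oldest request."""
            hits = [r for r in queue if open_rows.get(r.bank) == r.row]
            pool = hits if hits else queue
            return min(pool, key=lambda r: r.arrival)

        queue = [Request(arrival=0, bank=0, row=5), Request(arrival=1, bank=0, row=9)]
        open_rows = {0: 9}                               # bank 0 has row 9 open
        assert pick_fr_fcfs(queue, open_rows).row == 9   # younger hit beats older miss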

    BUZZARD: A NUMA-Aware In-Memory Indexing System

    With the availability of large main-memory capacities, in-memory index structures have become an important component of modern data management platforms. Current research even suggests index-based query processing as an alternative or supplement to traditional tuple-at-a-time processing models. However, while simple sequential scan operations can fully exploit the high bandwidth provided by main memory, indexes are mainly latency-bound and spend most of their time waiting for memory accesses. Considering current hardware trends, the problem of high memory latency is further exacerbated as modern shared-memory multiprocessors with non-uniform memory access (NUMA) become increasingly common. On such NUMA platforms, the execution time of index operations is dominated by memory access latency, which increases dramatically when accessing memory on remote sockets. Therefore, good index performance can only be achieved through careful optimization of the index structure for the given topology. BUZZARD is a NUMA-aware in-memory indexing system. Using adaptive data partitioning techniques, BUZZARD distributes a prefix-tree-based index across the NUMA system and hands incoming requests off to worker threads located on each partition's respective NUMA node. This approach reduces remote memory accesses to a minimum and improves cache utilization. In addition, every index inside BUZZARD is accessed only by its owning thread, eliminating the need for synchronization primitives like compare-and-swap.
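
    A schematic sketch of the partition-per-owner idea follows (hedged: Python threads cannot actually be pinned to NUMA nodes, and plain dictionaries stand in for BUZZARD's prefix trees; a real system would use something like libnuma for placement):

        # Each partition has exactly one owner thread draining a request queue,
        # so the index structure itself never needs locks or compare-and-swap.
        import queue, threading

        NUM_PARTITIONS = 4
        partitions = [dict() for _ in range(NUM_PARTITIONS)]   # stand-ins for prefix trees
        inboxes = [queue.Queue() for _ in range(NUM_PARTITIONS)]

        def owner(pid):
            # Single writer per partition: no synchronization on the index.
            while True:
                op, key, value = inboxes[pid].get()
                if op == "stop":
                    break
                partitions[pid][key] = value

        workers = [threading.Thread(target=owner, args=(p,)) for p in range(NUM_PARTITIONS)]
        for w in workers:
            w.start()

        def insert(key, value):
            # Hand the request off to the partition's owner instead of touching it here.
            inboxes[hash(key) % NUM_PARTITIONS].put(("put", key, value))

        insert("alpha", 1)
        insert("beta", 2)
        for q in inboxes:
            q.put(("stop", None, None))
        for w in workers:
            w.join()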

    Query Processing on Prefix Trees Live

    Modern database systems have to process huge amounts of data and, at the same time, should provide results with low latency. To achieve this, data is nowadays typically held completely in main memory, to benefit from its high bandwidth and low access latency, which could never be reached with disks. Current in-memory databases are usually column stores that exchange columns or vectors between operators and suffer from a high tuple-reconstruction overhead. In this demonstration proposal, we present DexterDB, which implements our novel prefix-tree-based processing model that makes indexes first-class citizens of the database system. The core idea is that each operator takes a set of indexes as input and builds a new index as output that is indexed on the attribute requested by the successive operator. With that, we are able to build composed operators, like the multi-way-select-join-group. Such operators speed up the processing of complex OLAP queries, so that DexterDB outperforms state-of-the-art in-memory databases. Our demonstration focuses on the different optimization options for such query plans. Hence, we built an interactive GUI that connects to a DexterDB instance and allows the manipulation of query optimization parameters. The generated query plans and important execution statistics are visualized to help visitors understand our processing model.
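
    The index-in/index-out processing model can be sketched as follows (a hedged illustration only: plain dictionaries stand in for DexterDB's prefix trees, and the function and attribute names are made up):

        # A select operator consumes an index on one attribute and emits its
        # output already indexed on the attribute the next operator needs,
        # avoiding per-tuple reconstruction between operators.
        def select_reindex(index_on_a, predicate, next_attr):
            """Filter via an index on attribute 'a'; emit output indexed on next_attr."""
            out = {}
            for a_value, rows in index_on_a.items():
                if predicate(a_value):
                    for row in rows:
                        out.setdefault(row[next_attr], []).append(row)
            return out

        index_on_a = {
            1: [{"a": 1, "b": "x"}],
            5: [{"a": 5, "b": "y"}, {"a": 5, "b": "x"}],
        }
        by_b = select_reindex(index_on_a, lambda a: a > 2, next_attr="b")
        # by_b is keyed on "b", ready for a join or group-by on that attribute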

    Evaluation at the FH Bund

    The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user-friendly programmability. It exploits both task-level and data-level parallelism through multiple on-chip SIMD coprocessors. For embedded DSP applications with predictable computation patterns, this architecture can be further optimized for performance, implementation cost, and power consumption. The optimization can be achieved by improving SIMD processing efficiency and by reducing redundant memory accesses and data-shuffle operations. This paper introduces one effective approach: a software-programmable multi-bank memory system for SIMD processors. Both the hardware architecture and the software programming model are described, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data-access flexibility by using lookup-table-based address generators and by applying data permutations at both the DMA controller interface and the SIMD data access. The evaluation results show that a SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves implementation cost on SIMD local registers; in our system, each SIMD core has only 8 128-bit vector registers.
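
    A simplified, hedged model of a lookup-table-based address generator over a multi-bank memory is sketched below (the bank count, sizes, and access patterns are illustrative, not the paper's design):

        # The LUT stores a precomputed offset pattern; a pattern whose addresses
        # fall into distinct banks lets all SIMD lanes load in one conflict-free
        # cycle, while colliding patterns must be serialized.
        NUM_BANKS = 8
        memory = [[0] * 64 for _ in range(NUM_BANKS)]   # 8 banks x 64 words

        def gather(base, lut):
            """Fetch one word per LUT entry and flag bank conflicts among the lanes."""
            addrs = [base + off for off in lut]
            banks = [a % NUM_BANKS for a in addrs]
            conflict_free = len(set(banks)) == len(banks)
            words = [memory[a % NUM_BANKS][a // NUM_BANKS] for a in addrs]
            return words, conflict_free

        _, ok = gather(0, [0, 1, 2, 3, 4, 5, 6, 7])   # one lane per bank: single cycle
        assert ok
        _, ok = gather(0, [0, 8, 16, 24])             # all lanes hit bank 0: serialized
        assert not ok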

    GraphStep: A System Architecture for Sparse-Graph Algorithms

    Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this “memory wall,” we introduce a concurrent system architecture for sparse-graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes, interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order-of-magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.
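
    A software sketch of this execution model follows (hedged: the real system maps nodes to FPGA Block RAMs and a hardware network; the toy graph, its partitioning, and the decay factor below are illustrative):

        # Nodes live in small per-PE memories; one spreading-activation step
        # exchanges messages along edges so each PE touches only local memory.
        NUM_PES = 2
        # node -> (owning PE, neighbor list)
        graph = {"a": (0, ["b", "c"]), "b": (1, ["c"]), "c": (1, [])}
        local_act = [{"a": 1.0}, {"b": 0.0, "c": 0.0}]   # per-PE activation memories

        def step(decay=0.5):
            outboxes = [[] for _ in range(NUM_PES)]
            for pe in range(NUM_PES):                 # each PE scans only local nodes
                for node, act in local_act[pe].items():
                    for nbr in graph[node][1]:
                        outboxes[graph[nbr][0]].append((nbr, act * decay))
            for pe, msgs in enumerate(outboxes):      # deliver and apply locally
                for node, delta in msgs:
                    local_act[pe][node] = local_act[pe].get(node, 0.0) + delta

        step()
        print(local_act)   # activation has spread from "a" to "b" and "c"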

    Memory-aware platform description and framework for source-level embedded MPSoC software optimization

    Developing optimizing source-level transformations consists of numerous non-trivial subtasks. Besides identifying actual optimization goals within a particular target-platform and compiler setup, the actual implementation is tedious, error-prone, and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description that can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenge. Another dimension, also tackled in this work, deals with providing an infrastructure for optimization-step representation, interaction, and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other and have been bundled into the MACCv2 framework, a full-fledged approach to implementing and integrating optimization techniques. Together, they significantly reduce the effort required to implement source-level memory-aware optimization techniques for Multi-Processor Systems-on-Chip (MPSoCs).

    The system-modeling approach presented in this dissertation is located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantic description: it couples a locally-scoped, structural modeling approach, as preferred by system designers, with a fast, database-like interface best suited for optimization-technique developers. It supports model refinement and requires only limited effort for an initial abstract system model. The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only the definition of locally-scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature that makes this approach superior to state-of-the-art table-lookup-based or full-system-simulation-based approaches. Combining such component-local properties into system-global target-platform data is performed via aspect handlers. These handlers define computational rules that are applied to correlated locally-scoped data along access paths in the memory-subsystem hierarchy. This approach calculates system-global values at a rate similar to plain table lookups while maintaining a precision close to full-system-simulation-based estimations, as shown for both the energy-consumption and the access-latency values of the MPARM platform.

    The MACCv2 framework provides a set of fundamental services to the optimization-technique developer. On top of these services, a system model and a source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces. The framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows for encapsulation of tasks at various granularity levels; for simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties, as well as inter-processing-step communication, is achieved via framework-provided services. The system-modeling approach and the framework show the right set of properties needed to support the development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities, and PhD theses have successfully built on the approaches and the framework proposed in this dissertation.
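
    The request-based, aspect-handler style of property retrieval can be sketched as follows (a hedged illustration only: the component names and values are invented, and a real handler would support more combination rules than this simple additive one):

        # System-global properties are computed on request by folding
        # component-local annotations along an access path in the memory
        # hierarchy -- no table of precomputed globals, no full-system simulation.
        platform = {
            "cpu0": {"latency_cycles": 1,  "energy_pj": 2},
            "bus":  {"latency_cycles": 4,  "energy_pj": 5},
            "sram": {"latency_cycles": 6,  "energy_pj": 20},
            "dram": {"latency_cycles": 60, "energy_pj": 150},
        }

        def aspect_sum(path, prop):
            """Handler rule for additive aspects: fold local values along an access path."""
            return sum(platform[c][prop] for c in path)

        # Database-like queries for system-global values:
        print(aspect_sum(["cpu0", "bus", "dram"], "latency_cycles"))   # 65
        print(aspect_sum(["cpu0", "bus", "sram"], "energy_pj"))        # 27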

    Adaptive Merging on Phase Change Memory

    Indexing is a well-known database technique used to facilitate data access and speed up query processing. Nevertheless, the construction and modification of indexes are very expensive. In traditional approaches, all records in a database table are equally covered by the index. This is not effective, since some records may be queried very often and some never. To avoid this problem, adaptive merging has been introduced. The key idea is to create the index adaptively and incrementally as a side-product of query processing. As a result, the database table is indexed partially, depending on the query workload. This paper addresses the problem of adaptive merging for phase change memory (PCM). The most important features of this memory type are limited write endurance and high write latency. As a consequence, adaptive merging has to be rethought from scratch. We solve this problem in two steps. First, we apply several PCM optimization techniques to the traditional adaptive merging approach and show that the resulting method (eAM) outperforms the traditional approach by 60%. We then introduce a framework for adaptive merging (PAM) together with a new PCM-optimized index, which further improves system performance by 20% for databases where search queries interleave with data modifications.
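
    The core loop of adaptive merging can be sketched as follows (hedged: the PCM-specific optimizations of eAM and PAM are omitted, and the data and helper names are illustrative):

        # Each range query pulls its matching keys out of unsorted runs and
        # merges them into a growing sorted index, so frequently queried
        # ranges become indexed as a side-product of the queries themselves.
        import bisect

        runs = [[9, 2, 14], [7, 4, 11]]   # initial unsorted partitions of the table
        index = []                        # sorted final index, built adaptively

        def range_query(lo, hi):
            # Merge step: move qualifying keys from the runs into the index.
            for run in runs:
                for key in [k for k in run if lo <= k <= hi]:
                    run.remove(key)
                    bisect.insort(index, key)
            # Answer from the index (keys merged by this and earlier queries).
            i, j = bisect.bisect_left(index, lo), bisect.bisect_right(index, hi)
            return index[i:j]

        print(range_query(3, 8))   # [4, 7] -- this range is now indexed
        print(range_query(3, 8))   # [4, 7] -- served from the index alone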