968 research outputs found

    On the Impact of Memory Allocation on High-Performance Query Processing

    Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators strongly influence performance, scalability, memory efficiency, and memory fairness to other processes. In this work, we provide the first comprehensive experimental analysis of the impact of memory allocation on high-performance query engines. We test five state-of-the-art dynamic memory allocators and discuss their strengths and weaknesses within our DBMS. The right allocator can increase the performance of TPC-DS (SF 100) by 2.7x on a 4-socket Intel Xeon server.
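    A quick way to reproduce this kind of comparison is to drive an allocation-heavy workload under several allocators. The sketch below is a minimal, hypothetical C microbenchmark, not the paper's DBMS workload; on Linux it can be rerun under a different allocator without recompiling, e.g. by pointing LD_PRELOAD at a jemalloc or tcmalloc shared library (the exact library path varies by system).

        /* Minimal allocation-churn microbenchmark (a sketch, not the
         * paper's workload). Run it under different allocators, e.g.:
         *   LD_PRELOAD=/path/to/libjemalloc.so ./bench
         */
        #include <stdio.h>
        #include <stdlib.h>

        #define ROUNDS 1000000L
        #define SLOTS  1024

        int main(void) {
            static void *slot[SLOTS];
            unsigned seed = 42;

            for (long i = 0; i < ROUNDS; i++) {
                /* Random slot and size, mimicking the mixed short-lived
                 * allocations of query operators. */
                seed = seed * 1103515245u + 12345u;
                size_t idx  = seed % SLOTS;
                size_t size = 16 + seed % 4096;

                free(slot[idx]);            /* free(NULL) is a no-op */
                slot[idx] = malloc(size);
            }
            for (size_t i = 0; i < SLOTS; i++)
                free(slot[i]);
            puts("done");
            return 0;
        }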

    A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

    Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale well to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
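    The lock-based baseline this abstract argues against fits in a few lines: every thread serializes on one spin-lock guarding the allocator's shared metadata. A minimal sketch, with the single-threaded buddy routines stubbed out by malloc/free so it compiles; none of this is the paper's code.

        #include <pthread.h>
        #include <stdlib.h>

        static pthread_spinlock_t buddy_lock;

        /* Stand-ins for real single-threaded buddy routines (delegating
         * to malloc/free just so the sketch links and runs). */
        static void *buddy_alloc_unlocked(size_t n) { return malloc(n); }
        static void  buddy_free_unlocked(void *p)   { free(p); }

        void *buddy_alloc(size_t n) {
            pthread_spin_lock(&buddy_lock);   /* all threads contend here */
            void *p = buddy_alloc_unlocked(n);
            pthread_spin_unlock(&buddy_lock);
            return p;
        }

        void buddy_free(void *p) {
            pthread_spin_lock(&buddy_lock);
            buddy_free_unlocked(p);
            pthread_spin_unlock(&buddy_lock);
        }

        int main(void) {
            pthread_spin_init(&buddy_lock, PTHREAD_PROCESS_PRIVATE);
            void *p = buddy_alloc(64);
            buddy_free(p);
            pthread_spin_destroy(&buddy_lock);
            return 0;
        }

    Under high thread counts, cycles spent spinning on buddy_lock grow with contention even when the requested blocks would never conflict, which is the degradation the non-blocking design removes.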

    NBBS: A Non-blocking Buddy System for Multi-core Machines

    Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spinlocks. This approach does not scale well to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system in which threads performing concurrent allocations/releases do not undergo any spinlock-based synchronization. Our solution allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Beyond improving scalability and performance, our solution avoids wasting clock cycles on spin-lock operations by threads that could in principle carry out their memory allocations/releases in full concurrency. It is thus resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
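    The commit-or-retry idea can be illustrated with a deliberately reduced example: a single-level pool of fixed-size blocks whose occupancy bitmap is updated with RMW instructions. NBBS itself maintains a full multi-level buddy tree; the sketch below (my own simplification: 64 blocks, one 64-bit metadata word) only shows conflicts being detected and retried via compare-and-swap.

        #include <stdatomic.h>
        #include <stdint.h>
        #include <stdio.h>

        #define NBLOCKS 64
        #define BLKSIZE 4096

        static _Atomic uint64_t occupancy;   /* bit i set => block i in use */
        static char pool[NBLOCKS][BLKSIZE];

        void *nb_alloc(void) {
            uint64_t old = atomic_load(&occupancy);
            for (;;) {
                if (old == UINT64_MAX)
                    return NULL;                      /* pool exhausted */
                int i = __builtin_ctzll(~old);        /* first free block */
                uint64_t claimed = old | (1ULL << i);
                /* Commit the claim; on a conflict the CAS fails,
                 * refreshes 'old', and the thread retries, never blocking. */
                if (atomic_compare_exchange_weak(&occupancy, &old, claimed))
                    return pool[i];
            }
        }

        void nb_free(void *p) {
            int i = (int)(((char (*)[BLKSIZE])p) - pool);
            atomic_fetch_and(&occupancy, ~(1ULL << i));  /* one RMW release */
        }

        int main(void) {
            void *a = nb_alloc(), *b = nb_alloc();
            printf("a=%p b=%p\n", a, b);
            nb_free(a);
            nb_free(b);
            return 0;
        }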

    PRADA: Predictable Allocations by Deferred Actions

    Modern hard real-time systems still employ static memory management. However, dynamic storage allocation (DSA) can improve the flexibility and readability of programs as well as drastically shorten their development times. But allocators introduce unpredictability that makes deriving tight bounds on an application's worst-case execution time even more challenging. In particular, their statically unpredictable influence on the cache, paired with zero knowledge about the cache-set mapping of dynamically allocated objects, leads to prohibitively large overestimations of execution times when dynamic memory allocation is employed. Recently, a cache-aware memory allocator, called CAMA, was proposed that gives strong guarantees about its cache influence and the cache-set mapping of allocated objects. CAMA itself is rather complex due to its cache-aware implementations of split and merge operations. This paper proposes PRADA, a lighter but less general dynamic memory allocator with equally strong guarantees about its influence on the cache. We compare the memory consumption of PRADA and CAMA for a small set of real-time applications as well as synthetic (de)allocation sequences to investigate whether a simpler approach to cache awareness is still sufficient for the current generation of real-time applications.
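    The guarantee both allocators give rests on simple address arithmetic: the cache set an object maps to is fixed by bits of its address, so an allocator can choose block addresses that land in a caller-requested set. The sketch below shows only that arithmetic, with an assumed geometry (64-byte lines, 128 sets); it is not CAMA's or PRADA's actual mechanism.

        #include <stdint.h>
        #include <stdio.h>

        #define LINE_BITS 6     /* 64-byte cache lines (assumed)  */
        #define NUM_SETS  128   /* number of cache sets (assumed) */

        /* The set index is carried by the address bits just above the
         * line-offset bits. */
        static unsigned cache_set_of(uintptr_t addr) {
            return (addr >> LINE_BITS) & (NUM_SETS - 1);
        }

        int main(void) {
            static char heap[1 << 16] __attribute__((aligned(64)));
            unsigned wanted = 5;   /* set requested by the caller */

            /* A cache-aware allocator would hand out only blocks that
             * fall into the requested set. */
            for (uintptr_t a = (uintptr_t)heap;
                 a < (uintptr_t)heap + sizeof heap; a += 1u << LINE_BITS) {
                if (cache_set_of(a) == wanted) {
                    printf("block at %p maps to set %u\n", (void *)a, wanted);
                    break;
                }
            }
            return 0;
        }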

    Pre- and post-scheduling memory allocation strategies on MPSoCs

    This paper introduces and assesses a new method to allocate memory for applications implemented on a shared-memory Multiprocessor System-on-Chip (MPSoC). The method first derives, from a Synchronous Dataflow (SDF) description of the algorithm, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: prior to the scheduling process, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each of these three alternatives offers a distinct trade-off between the amount of allocated memory and the flexibility of the application's multicore execution. Tested use cases are based on descriptions of real applications and a set of random SDF graphs generated with the SDF For Free (SDF3) tool. Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule of the application has been decided offers a reduced memory footprint as well as a flexible multicore execution.
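    The role of the MEG can be made concrete with a toy example: each vertex is a memory object with a size, each edge marks a pair of objects that can be live simultaneously and therefore must not overlap, and allocation assigns offsets so that only excluded pairs are kept apart; unconnected objects may legally share storage. The first-fit sketch below uses made-up sizes and edges and is not one of the heuristics evaluated in the paper.

        #include <stdio.h>

        #define N 4

        /* Object sizes (bytes) and the exclusion relation: excl[i][j] = 1
         * means i and j can be live at the same time and must not
         * overlap. All values here are invented for the example. */
        static const int size[N] = {100, 200, 100, 50};
        static const int excl[N][N] = {
            {0, 1, 1, 0},
            {1, 0, 1, 1},
            {1, 1, 0, 0},
            {0, 1, 0, 0},
        };

        static int overlaps(int a0, int a1, int b0, int b1) {
            return a0 < b1 && b0 < a1;
        }

        int main(void) {
            int offset[N];
            for (int i = 0; i < N; i++) {
                int off = 0, again = 1;
                /* First fit: bump 'off' past every already-placed object
                 * that excludes i and overlaps the candidate range. */
                while (again) {
                    again = 0;
                    for (int j = 0; j < i; j++) {
                        if (excl[i][j] &&
                            overlaps(off, off + size[i],
                                     offset[j], offset[j] + size[j])) {
                            off = offset[j] + size[j];
                            again = 1;     /* moved: rescan from scratch */
                        }
                    }
                }
                offset[i] = off;
                printf("object %d -> [%d, %d)\n", i, off, off + size[i]);
            }
            return 0;
        }

    In this run, objects 0 and 3 both land at offset 0 because no exclusion edge connects them; that reuse of memory between objects that are never simultaneously live is exactly what the MEG makes safe.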

    Storage Coalescing

    Typically, when a program executes, it creates objects dynamically and requests storage for its objects from the underlying storage allocator. The patterns of such requests can potentially lead to internal fragmentation as well as external fragmentation. Internal fragmentation occurs when the storage allocator allocates a contiguous block of storage to a program, but the program uses only a fraction of that block to satisfy a request. The unused portion of that block is wasted, since the allocator cannot use it to satisfy a subsequent allocation request. External fragmentation, on the other hand, concerns chunks of memory that reside between allocated blocks. External fragmentation becomes problematic when these chunks are not large enough to satisfy an allocation request individually; such chunks then exist as useless holes in the memory system. In this thesis, we present necessary and sufficient storage conditions for satisfying allocation and deallocation sequences for programs that run on systems that use a binary-buddy allocator. We show that these sequences can be serviced without the need for defragmentation. We also explore the effects of buddy-coalescing on defragmentation and on overall program performance when using a defragmentation algorithm that implements buddy-system policies. Our approach involves experimenting with Sun's Java Virtual Machine and a buddy-system simulator that embodies our defragmentation algorithm. We examine our algorithm in the presence of two approximate collection strategies, namely Reference Counting and Contaminated Garbage Collection, and one complete collection strategy, Mark-and-Sweep Garbage Collection. We analyze the effectiveness of these approaches with regard to how well they manage storage when we alter the coalescing strategy of our simulator. Our analysis indicates that prompt coalescing minimizes defragmentation and delayed coalescing minimizes the number of coalescing operations across the three collection approaches.
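    The coalescing being varied here relies on a standard binary-buddy property: a free block of size 2^k at offset off has its buddy at off XOR 2^k, and whenever both are free they merge into one block of size 2^(k+1), recursively. A minimal sketch with a toy free-bitmap encoding of my own, not the thesis's simulator:

        #include <stdio.h>

        #define MAX_ORDER 10   /* orders 0..10, heap of 2^10 bytes */

        /* free_at[k][off >> k] = 1 iff the order-k block at 'off' is
         * free. A toy encoding; real buddy systems keep per-order
         * free lists. */
        static unsigned char free_at[MAX_ORDER + 1][1 << MAX_ORDER];

        static void free_and_coalesce(unsigned off, unsigned order) {
            while (order < MAX_ORDER) {
                unsigned buddy = off ^ (1u << order);  /* the XOR trick */
                if (!free_at[order][buddy >> order])
                    break;                        /* buddy in use: stop */
                free_at[order][buddy >> order] = 0;   /* absorb buddy */
                off &= ~(1u << order);            /* merged block offset */
                order++;
            }
            free_at[order][off >> order] = 1;     /* record (merged) block */
        }

        int main(void) {
            /* Two order-4 (16-byte) buddies at offsets 0 and 16 merge
             * into one order-5 block at offset 0. */
            free_and_coalesce(0, 4);
            free_and_coalesce(16, 4);
            printf("order-5 block at 0 free: %d\n", free_at[5][0]);
            return 0;
        }

    Prompt coalescing runs this merge at every release; delayed coalescing batches releases and merges later, which is the trade-off the thesis measures across the three collection strategies.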