968 research outputs found

    On the Impact of Memory Allocation on High-Performance Query Processing

    Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators strongly influence performance, scalability, memory efficiency, and memory fairness to other processes. In this work, we provide the first comprehensive experimental analysis of the impact of memory allocation on high-performance query engines. We test five state-of-the-art dynamic memory allocators and discuss their strengths and weaknesses within our DBMS. The right allocator can increase the performance of TPC-DS (SF 100) by 2.7x on a 4-socket Intel Xeon server.
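    A quick way to reproduce this kind of comparison is to drive an allocation-heavy workload under several allocators. The sketch below is a minimal, hypothetical C microbenchmark, not the paper's DBMS workload; on Linux it can be rerun under a different allocator without recompiling, e.g. by pointing LD_PRELOAD at a jemalloc or tcmalloc shared library (the exact library path varies by system).

        /* Minimal allocation-churn microbenchmark (a sketch, not the
         * paper's workload). Run it under different allocators, e.g.:
         *   LD_PRELOAD=/path/to/libjemalloc.so ./bench
         */
        #include <stdio.h>
        #include <stdlib.h>

        #define ROUNDS 1000000L
        #define SLOTS  1024

        int main(void) {
            static void *slot[SLOTS];
            unsigned seed = 42;

            for (long i = 0; i < ROUNDS; i++) {
                /* Random slot and size, mimicking the mixed short-lived
                 * allocations of query operators. */
                seed = seed * 1103515245u + 12345u;
                size_t idx  = seed % SLOTS;
                size_t size = 16 + seed % 4096;

                free(slot[idx]);            /* free(NULL) is a no-op */
                slot[idx] = malloc(size);
            }
            for (size_t i = 0; i < SLOTS; i++)
                free(slot[i]);
            puts("done");
            return 0;
        }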

    A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

    Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale well to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
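    The lock-based baseline this abstract argues against fits in a few lines: every thread serializes on one spin-lock guarding the allocator's shared metadata. A minimal sketch, with the single-threaded buddy routines stubbed out by malloc/free so it compiles; none of this is the paper's code.

        #include <pthread.h>
        #include <stdlib.h>

        static pthread_spinlock_t buddy_lock;

        /* Stand-ins for real single-threaded buddy routines (delegating
         * to malloc/free just so the sketch links and runs). */
        static void *buddy_alloc_unlocked(size_t n) { return malloc(n); }
        static void  buddy_free_unlocked(void *p)   { free(p); }

        void *buddy_alloc(size_t n) {
            pthread_spin_lock(&buddy_lock);   /* all threads contend here */
            void *p = buddy_alloc_unlocked(n);
            pthread_spin_unlock(&buddy_lock);
            return p;
        }

        void buddy_free(void *p) {
            pthread_spin_lock(&buddy_lock);
            buddy_free_unlocked(p);
            pthread_spin_unlock(&buddy_lock);
        }

        int main(void) {
            pthread_spin_init(&buddy_lock, PTHREAD_PROCESS_PRIVATE);
            void *p = buddy_alloc(64);
            buddy_free(p);
            pthread_spin_destroy(&buddy_lock);
            return 0;
        }

    Under high thread counts, cycles spent spinning on buddy_lock grow with contention even when the requested blocks would never conflict, which is the degradation the non-blocking design removes.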

    NBBS: A Non-blocking Buddy System for Multi-core Machines

    Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spinlocks. This approach does not scale well to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system in which threads performing concurrent allocations/releases do not undergo any spinlock-based synchronization. Our solution allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Beyond improving scalability and performance, our solution avoids wasting clock cycles on spin-lock operations by threads that could in principle carry out their memory allocations/releases in full concurrency. It is thus resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
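    The commit-or-retry idea can be illustrated with a deliberately reduced example: a single-level pool of fixed-size blocks whose occupancy bitmap is updated with RMW instructions. NBBS itself maintains a full multi-level buddy tree; the sketch below (my own simplification: 64 blocks, one 64-bit metadata word) only shows conflicts being detected and retried via compare-and-swap.

        #include <stdatomic.h>
        #include <stdint.h>
        #include <stdio.h>

        #define NBLOCKS 64
        #define BLKSIZE 4096

        static _Atomic uint64_t occupancy;   /* bit i set => block i in use */
        static char pool[NBLOCKS][BLKSIZE];

        void *nb_alloc(void) {
            uint64_t old = atomic_load(&occupancy);
            for (;;) {
                if (old == UINT64_MAX)
                    return NULL;                      /* pool exhausted */
                int i = __builtin_ctzll(~old);        /* first free block */
                uint64_t claimed = old | (1ULL << i);
                /* Commit the claim; on a conflict the CAS fails,
                 * refreshes 'old', and the thread retries, never blocking. */
                if (atomic_compare_exchange_weak(&occupancy, &old, claimed))
                    return pool[i];
            }
        }

        void nb_free(void *p) {
            int i = (int)(((char (*)[BLKSIZE])p) - pool);
            atomic_fetch_and(&occupancy, ~(1ULL << i));  /* one RMW release */
        }

        int main(void) {
            void *a = nb_alloc(), *b = nb_alloc();
            printf("a=%p b=%p\n", a, b);
            nb_free(a);
            nb_free(b);
            return 0;
        }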

    PRADA: Predictable Allocations by Deferred Actions

    Modern hard real-time systems still employ static memory management. However, dynamic storage allocation (DSA) can improve the flexibility and readability of programs as well as drastically shorten their development times. But allocators introduce unpredictability that makes deriving tight bounds on an application's worst-case execution time even more challenging. In particular, their statically unpredictable influence on the cache, paired with zero knowledge about the cache-set mapping of dynamically allocated objects, leads to prohibitively large overestimations of execution times when dynamic memory allocation is employed. Recently, a cache-aware memory allocator, called CAMA, was proposed that gives strong guarantees about its cache influence and the cache-set mapping of allocated objects. CAMA itself is rather complex due to its cache-aware implementations of split and merge operations. This paper proposes PRADA, a lighter but less general dynamic memory allocator with equally strong guarantees about its influence on the cache. We compare the memory consumption of PRADA and CAMA for a small set of real-time applications as well as synthetic (de)allocation sequences to investigate whether a simpler approach to cache awareness is still sufficient for the current generation of real-time applications.
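    The guarantee both allocators give rests on simple address arithmetic: the cache set an object maps to is fixed by bits of its address, so an allocator can choose block addresses that land in a caller-requested set. The sketch below shows only that arithmetic, with an assumed geometry (64-byte lines, 128 sets); it is not CAMA's or PRADA's actual mechanism.

        #include <stdint.h>
        #include <stdio.h>

        #define LINE_BITS 6     /* 64-byte cache lines (assumed)  */
        #define NUM_SETS  128   /* number of cache sets (assumed) */

        /* The set index is carried by the address bits just above the
         * line-offset bits. */
        static unsigned cache_set_of(uintptr_t addr) {
            return (addr >> LINE_BITS) & (NUM_SETS - 1);
        }

        int main(void) {
            static char heap[1 << 16] __attribute__((aligned(64)));
            unsigned wanted = 5;   /* set requested by the caller */

            /* A cache-aware allocator would hand out only blocks that
             * fall into the requested set. */
            for (uintptr_t a = (uintptr_t)heap;
                 a < (uintptr_t)heap + sizeof heap; a += 1u << LINE_BITS) {
                if (cache_set_of(a) == wanted) {
                    printf("block at %p maps to set %u\n", (void *)a, wanted);
                    break;
                }
            }
            return 0;
        }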

    Pre- and post-scheduling memory allocation strategies on MPSoCs

    This paper introduces and assesses a new method to allocate memory for applications implemented on a shared-memory Multiprocessor System-on-Chip (MPSoC). The method first derives, from a Synchronous Dataflow (SDF) description of the algorithm, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: prior to the scheduling process, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each of these three alternatives offers a distinct trade-off between the amount of allocated memory and the flexibility of the application's multicore execution. Tested use cases are based on descriptions of real applications and a set of random SDF graphs generated with the SDF For Free (SDF3) tool. Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule of the application has been decided offers a reduced memory footprint as well as a flexible multicore execution.
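    The role of the MEG can be made concrete with a toy example: each vertex is a memory object with a size, each edge marks a pair of objects that can be live simultaneously and therefore must not overlap, and allocation assigns offsets so that only excluded pairs are kept apart; unconnected objects may legally share storage. The first-fit sketch below uses made-up sizes and edges and is not one of the heuristics evaluated in the paper.

        #include <stdio.h>

        #define N 4

        /* Object sizes (bytes) and the exclusion relation: excl[i][j] = 1
         * means i and j can be live at the same time and must not
         * overlap. All values here are invented for the example. */
        static const int size[N] = {100, 200, 100, 50};
        static const int excl[N][N] = {
            {0, 1, 1, 0},
            {1, 0, 1, 1},
            {1, 1, 0, 0},
            {0, 1, 0, 0},
        };

        static int overlaps(int a0, int a1, int b0, int b1) {
            return a0 < b1 && b0 < a1;
        }

        int main(void) {
            int offset[N];
            for (int i = 0; i < N; i++) {
                int off = 0, again = 1;
                /* First fit: bump 'off' past every already-placed object
                 * that excludes i and overlaps the candidate range. */
                while (again) {
                    again = 0;
                    for (int j = 0; j < i; j++) {
                        if (excl[i][j] &&
                            overlaps(off, off + size[i],
                                     offset[j], offset[j] + size[j])) {
                            off = offset[j] + size[j];
                            again = 1;     /* moved: rescan from scratch */
                        }
                    }
                }
                offset[i] = off;
                printf("object %d -> [%d, %d)\n", i, off, off + size[i]);
            }
            return 0;
        }

    In this run, objects 0 and 3 both land at offset 0 because no exclusion edge connects them; that reuse of memory between objects that are never simultaneously live is exactly what the MEG makes safe.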

    Storage Coalescing

    Typically, when a program executes, it creates objects dynamically and requests storage for its objects from the underlying storage allocator. The patterns of such requests can potentially lead to internal fragmentation as well as external fragmentation. Internal fragmentation occurs when the storage allocator allocates a contiguous block of storage to a program, but the program uses only a fraction of that block to satisfy a request. The unused portion of that block is wasted, since the allocator cannot use it to satisfy a subsequent allocation request. External fragmentation, on the other hand, concerns chunks of memory that reside between allocated blocks. External fragmentation becomes problematic when these chunks are not large enough to satisfy an allocation request individually; such chunks then exist as useless holes in the memory system. In this thesis, we present necessary and sufficient storage conditions for satisfying allocation and deallocation sequences for programs that run on systems that use a binary-buddy allocator. We show that these sequences can be serviced without the need for defragmentation. We also explore the effects of buddy-coalescing on defragmentation and on overall program performance when using a defragmentation algorithm that implements buddy-system policies. Our approach involves experimenting with Sun's Java Virtual Machine and a buddy-system simulator that embodies our defragmentation algorithm. We examine our algorithm in the presence of two approximate collection strategies, namely Reference Counting and Contaminated Garbage Collection, and one complete collection strategy, Mark-and-Sweep Garbage Collection. We analyze the effectiveness of these approaches with regard to how well they manage storage when we alter the coalescing strategy of our simulator. Our analysis indicates that prompt coalescing minimizes defragmentation and delayed coalescing minimizes the number of coalescing operations across the three collection approaches.
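    The coalescing being varied here relies on a standard binary-buddy property: a free block of size 2^k at offset off has its buddy at off XOR 2^k, and whenever both are free they merge into one block of size 2^(k+1), recursively. A minimal sketch with a toy free-bitmap encoding of my own, not the thesis's simulator:

        #include <stdio.h>

        #define MAX_ORDER 10   /* orders 0..10, heap of 2^10 bytes */

        /* free_at[k][off >> k] = 1 iff the order-k block at 'off' is
         * free. A toy encoding; real buddy systems keep per-order
         * free lists. */
        static unsigned char free_at[MAX_ORDER + 1][1 << MAX_ORDER];

        static void free_and_coalesce(unsigned off, unsigned order) {
            while (order < MAX_ORDER) {
                unsigned buddy = off ^ (1u << order);  /* the XOR trick */
                if (!free_at[order][buddy >> order])
                    break;                        /* buddy in use: stop */
                free_at[order][buddy >> order] = 0;   /* absorb buddy */
                off &= ~(1u << order);            /* merged block offset */
                order++;
            }
            free_at[order][off >> order] = 1;     /* record (merged) block */
        }

        int main(void) {
            /* Two order-4 (16-byte) buddies at offsets 0 and 16 merge
             * into one order-5 block at offset 0. */
            free_and_coalesce(0, 4);
            free_and_coalesce(16, 4);
            printf("order-5 block at 0 free: %d\n", free_at[5][0]);
            return 0;
        }

    Prompt coalescing runs this merge at every release; delayed coalescing batches releases and merges later, which is the trade-off the thesis measures across the three collection strategies.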