Search CORE

831 research outputs found

A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

Author: Ianni Mauro
Marotta Romolo
Pellegrini Alessandro
Quaglia Francesco
Scarselli Andrea
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is not prone to scale with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators - the bottom most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to scalability of memory allocation/release, which can be still combined with those literature proposals. We present a fully non-blocking buddy-system, that allows threads to proceed in parallel, and commit their allocations/releases unless a conflict is materialized while handling its metadata. Beyond improving scalability and performance it is resilient to performance degradation in face of concurrent accesses independently of the current level of fragmentation of the handled memory blocks

Crossref

Archivio della ricerca- Università di Roma La Sapienza

NBBS: A Non-blocking Buddy System for Multi-core Machines

Author: Ianni Mauro
Marotta Romolo
Pellegrini Alessandro
Quaglia Francesco
Scarselli Andrea
Publication venue
Publication date: 01/01/2019
Field of study

Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spinlocks. This approach is not prone to scale with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators—the bottom most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to scalability of memory allocation/release, which can be still combined with those literature proposals. We present a fully non-blocking buddy-system, where threads performing concurrent allocations/releases do not undergo any spinlock based synchronization. Our solution allows threads to proceed in parallel, and commit their allocations/releases unless a conflict is materialized while handling its metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Beyond improving scalability and performance, our solution can also avoid wasting clock cycles for spin-lock operations by threads that could in principle carry out their memory allocation/release in full concurrency. Thus, it is resilient to performance degradation—in face of concurrent accesses—independently of the current level of fragmentation of the handled memory blocks

Archivio della ricerca- Università di Roma La Sapienza

Fine-Grain Checkpointing with In-Cache-Line Logging

Author: Aksun David T.
Avni Hillel
Cohen Nachshon
Larus James R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/02/2019
Field of study

Non-Volatile Memory offers the possibility of implementing high-performance, durable data structures. However, achieving performance comparable to well-designed data structures in non-persistent (transient) memory is difficult, primarily because of the cost of ensuring the order in which memory writes reach NVM. Often, this requires flushing data to NVM and waiting a full memory round-trip time. In this paper, we introduce two new techniques: Fine-Grained Checkpointing, which ensures a consistent, quickly recoverable data structure in NVM after a system failure, and In-Cache-Line Logging, an undo-logging technique that enables recovery of earlier state without requiring cache-line flushes in the normal case. We implemented these techniques in the Masstree data structure, making it persistent and demonstrating the ease of applying them to a highly optimized system and their low (5.9-15.4\%) runtime overhead cost.Comment: In 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS 19), April 13, 2019, Providence, RI, US

arXiv.org e-Print Archive

Crossref

Optimizing Memory Usage in L4-Based Microkernel

Author: Carabaş Mihai
Deaconescu Răzvan
Eftime Petre
Gheorghe Laura
Mogoşanu Lucian
Voiculescu Valentin Gabriel
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 29/11/2017
Field of study

Memory allocation is a critical aspect of any modern operating system kernel because it must run continuously for long periods of time, therefore memory leaks and inefficiency must be eliminated. This paper presents different memory management algorithms and their aplicability to an L4-based microkernel. We aim to reduce memory usage and increase the performance of allocation and deallocation of memory

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

Author: Ianni Mauro
Marotta Romolo
Pellegrini Alessandro
Quaglia Francesco
Scarselli Andrea
Publication venue
Publication date: 01/01/2018
Field of study

Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is clearly not prone to scale with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators-the bottom most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to scalability of memory allocation/release, which can be still combined with those literature proposals. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Furthermore, beyond improving scalability and performance, it can also avoid wasting clock cycles for spin-lock operations by threads that could in principle carry out their memory allocation/release in full concurrency. Thus, it is resilient to performance degradation---in face of concurrent accesses---independently of the current level of fragmentation of the handled memory blocks

arXiv.org e-Print Archive

Crossref

ART

Archivio della ricerca- Università di Roma La Sapienza

EbbRT: a framework for building per-application library operating systems

Author: Appavoo Jonathan
Cadden James
Dong Han
Krieger Orran
Schatzberg Dan
Publication venue: Computer Science Department, Boston University
Publication date: 23/02/2016
Field of study

Efficient use of high speed hardware requires operating system components be customized to the application work- load. Our general purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high-performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates an 14% performance improvement of a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to it run on Linux

Boston University Institutional Repository (OpenBU)