A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines
Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale well with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, i.e. the bottommost ones within the layered architecture. Both solutions tend to reduce the pressure of actual concurrent accesses on each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those proposals from the literature. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
NBBS: A Non-blocking Buddy System for Multi-core Machines
Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spinlocks. This approach does not scale well with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or by replicating the core allocators, i.e. the bottommost ones within the layered architecture. Both solutions tend to reduce the pressure of actual concurrent accesses on each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those proposals from the literature. We present a fully non-blocking buddy system, where threads performing concurrent allocations/releases do not undergo any spinlock-based synchronization. Our solution allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Beyond improving scalability and performance, our solution can also avoid wasting clock cycles on spinlock operations by threads that could in principle carry out their memory allocation/release in full concurrency. Thus, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
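The RMW-based conflict detection mentioned in the abstract above can be illustrated with a minimal C11 sketch. It is not the NBBS algorithm: it covers only a single level of equal-size blocks and leaves out the splitting and coalescing that a real buddy system performs across tree levels. The names NUM_BLOCKS, block_state, nb_alloc and nb_free are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_BLOCKS 8   /* hypothetical: equal-size blocks at one level */

enum { BLOCK_FREE = 0, BLOCK_BUSY = 1 };

/* One status word per block. Static storage is zero-initialized,
 * so every block starts as BLOCK_FREE. */
static atomic_int block_state[NUM_BLOCKS];

/* Try to claim one block without taking any lock.
 * Returns the block index, or -1 if no block could be claimed. */
static int nb_alloc(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        int expected = BLOCK_FREE;
        /* RMW step: the compare-and-swap succeeds only if block i is
         * still free, i.e. no other thread claimed it first. */
        if (atomic_compare_exchange_strong(&block_state[i],
                                           &expected, BLOCK_BUSY))
            return i;
        /* CAS failed: block i is busy or was just taken by another
         * thread. Move on to the next candidate instead of spinning. */
    }
    return -1;
}

/* Release a block previously returned by nb_alloc(). */
static void nb_free(int i)
{
    assert(i >= 0 && i < NUM_BLOCKS);
    atomic_store(&block_state[i], BLOCK_FREE);
}
```

A failed compare-and-swap is the only signal of a conflict in this sketch. A thread that loses the race on one block moves on to the next candidate and never waits on a lock held by another thread.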
Fine-Grain Checkpointing with In-Cache-Line Logging
Non-Volatile Memory offers the possibility of implementing high-performance,
durable data structures. However, achieving performance comparable to
well-designed data structures in non-persistent (transient) memory is
difficult, primarily because of the cost of ensuring the order in which memory
writes reach NVM. Often, this requires flushing data to NVM and waiting a full
memory round-trip time.
In this paper, we introduce two new techniques: Fine-Grained Checkpointing,
which ensures a consistent, quickly recoverable data structure in NVM after a
system failure, and In-Cache-Line Logging, an undo-logging technique that
enables recovery of earlier state without requiring cache-line flushes in the
normal case. We implemented these techniques in the Masstree data structure,
making it persistent and demonstrating the ease of applying them to a highly
optimized system and their low (5.9-15.4%) runtime overhead cost.
Comment: In 2019 Architectural Support for Programming Languages and Operating
Systems (ASPLOS 19), April 13, 2019, Providence, RI, US
Optimizing Memory Usage in L4-Based Microkernel
Memory allocation is a critical aspect of any modern operating system kernel: because the kernel must run continuously for long periods of time, memory leaks and inefficiency must be eliminated. This paper presents different memory management algorithms and their applicability to an L4-based microkernel. We aim to reduce memory usage and increase the performance of memory allocation and deallocation.
A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines
Common implementations of core memory allocation components, like the Linux
buddy system, handle concurrent allocation/release requests by synchronizing
threads via spin-locks. This approach does not scale well with large
thread counts, a problem that has been addressed in the literature by
introducing layered allocation services or replicating the core allocators, i.e.
the bottommost ones within the layered architecture. Both these solutions tend to
reduce the pressure of actual concurrent accesses to each individual core
allocator. In this article we explore an alternative approach to the scalability
of memory allocation/release, which can still be combined with those proposals
from the literature. Conflict detection relies on conventional atomic machine
instructions in the Read-Modify-Write (RMW) class. Furthermore, beyond
improving scalability and performance, it can also avoid wasting clock cycles
for spin-lock operations by threads that could in principle carry out their
memory allocation/release in full concurrency. Thus, it is resilient to
performance degradation in the face of concurrent accesses, independently of the
current level of fragmentation of the handled memory blocks.
EbbRT: a framework for building per-application library operating systems
Efficient use of high-speed hardware requires that operating system components be customized to the application workload. Our general-purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates that memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates a 14% performance improvement on a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to the same server run on Linux.