5 research outputs found
A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
We present a multi-word atomic (1,N) register for multi-core machines
exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and
the readers in a wait-free manner. Our proposal, called Anonymous Readers
Counting (ARC), enables large-scale data sharing by admitting up to 2^32 - 2
concurrent readers on off-the-shelf 64-bit machines, as opposed to the most
advanced RMW-based approach, which is limited to 58 readers. Further, ARC
avoids taking multiple copies of the register content upon access, an issue
that affects classical register algorithms based on atomic read/write
operations on single words; it thus allows for higher scalability with
respect to the register size. Moreover, ARC improves performance by limiting
the use of RMW instructions during read operations, and by supporting
constant time for read operations and amortized constant time for write
operations. A proof of correctness of our register algorithm is also
provided, together with experimental data for a comparison with literature
proposals. Beyond assessing ARC on physical platforms, we also carry out
experiments on virtualized infrastructures, which show the resilience of the
wait-free synchronization provided by ARC to CPU-steal times, typical of
modern paradigms such as cloud computing.
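As a rough illustration of the kind of RMW-based coordination described above (not the ARC construction itself, whose details are in the paper), the C11 sketch below packs the index of the current buffer and a reader-arrival count into one 64-bit control word: a single fetch-and-add lets a reader locate the current buffer and announce its presence atomically, while the single writer publishes a new buffer with an exchange and recycles a retired buffer once its readers have drained. The slot-pool size, the field layout and the names reg_init/reg_read/reg_write are assumptions made for the example; the writer shown here may wait, whereas ARC is wait-free for both readers and the writer.

/* Illustrative single-writer / multi-reader register in C11. This is NOT the
 * ARC algorithm: the slot pool size, the control-word layout and the
 * reclamation policy are simplifying assumptions. It only shows how one RMW
 * (fetch-and-add) can let a reader observe the current slot and announce
 * itself atomically. */
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS   4          /* assumed pool of buffers for the register   */
#define REG_SIZE 64         /* assumed register payload size in bytes     */

typedef struct {
    unsigned char data[REG_SIZE];
    atomic_uint   departures;  /* readers that finished with this slot    */
    uint32_t      expected;    /* arrivals recorded when slot was retired */
    int           retired;     /* non-zero: waiting for readers to drain  */
} slot_t;

static slot_t slots[NSLOTS];
/* control word: upper 32 bits = index of the current slot,
 * lower 32 bits = readers arrived on it (hence a ~2^32 reader bound).    */
static _Atomic uint64_t ctrl;

static inline uint64_t pack(uint32_t idx, uint32_t arrivals)
{
    return ((uint64_t)idx << 32) | arrivals;
}

void reg_init(void)
{
    for (int i = 1; i < NSLOTS; i++)
        slots[i].retired = 1;                 /* free, no readers inside  */
    atomic_store(&ctrl, pack(0, 0));          /* slot 0 is current        */
}

/* Reader: one fetch-and-add reveals the current slot and registers the
 * arrival on it; a second RMW signals the departure. Wait-free.          */
void reg_read(unsigned char out[REG_SIZE])
{
    uint64_t w   = atomic_fetch_add(&ctrl, 1);
    uint32_t idx = (uint32_t)(w >> 32);
    memcpy(out, slots[idx].data, REG_SIZE);
    atomic_fetch_add(&slots[idx].departures, 1);
}

/* Single writer: fill a drained slot, publish it, retire the old one.    */
void reg_write(const unsigned char in[REG_SIZE])
{
    /* Simplification: scan until some retired slot has fully drained.
     * The real ARC writer never waits on readers.                        */
    int next;
    for (;;) {
        for (next = 0; next < NSLOTS; next++)
            if (slots[next].retired &&
                atomic_load(&slots[next].departures) == slots[next].expected)
                goto found;
    }
found:
    slots[next].retired = 0;
    atomic_store(&slots[next].departures, 0);
    memcpy(slots[next].data, in, REG_SIZE);
    uint64_t old = atomic_exchange(&ctrl, pack((uint32_t)next, 0));
    uint32_t prev = (uint32_t)(old >> 32);
    slots[prev].expected = (uint32_t)old;     /* arrivals left to drain   */
    slots[prev].retired  = 1;
}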
A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines
Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators, i.e., the bottommost ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and to commit their allocations/releases unless a conflict materializes while handling the allocator's metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
NBBS: A Non-blocking Buddy System for Multi-core Machines
Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale with large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators, i.e., the bottommost ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system, where threads performing concurrent allocations/releases do not undergo any spin-lock-based synchronization. Our solution allows threads to proceed in parallel and to commit their allocations/releases unless a conflict materializes while handling the allocator's metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Beyond improving scalability and performance, our solution can also avoid wasting clock cycles on spin-lock operations by threads that could in principle carry out their memory allocation/release in full concurrency. Thus, it is resilient to performance degradation, in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
A Wait-Free Multi-word Atomic (1,N) Register for Large-Scale Data Sharing on Multi-core Machines
We present a multi-word atomic (1,N) register for multi-core machines exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and the readers in a wait-free manner. Our proposal, called Anonymous Readers Counting (ARC), enables large-scale data sharing by admitting up to 2^32 - 2 concurrent readers on off-the-shelf 64-bit machines, as opposed to the most advanced RMW-based approach, which is limited to 58 readers. Further, ARC avoids taking multiple copies of the register content while accessing it, an issue that affects classical register algorithms based on atomic read/write operations on single words. Thus, ARC allows for higher scalability with respect to the register size.
A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines
Common implementations of core memory allocation components, like the Linux
buddy system, handle concurrent allocation/release requests by synchronizing
threads via spin-locks. This approach clearly does not scale with large thread
counts, a problem that has been addressed in the literature by introducing
layered allocation services or replicating the core allocators, i.e., the
bottommost ones within the layered architecture. Both these solutions tend to
reduce the pressure of actual concurrent accesses to each individual core
allocator. In this article we explore an alternative approach to scalability of
memory allocation/release, which can still be combined with those literature
proposals. We present a fully non-blocking buddy system, where threads
performing concurrent allocations/releases do not undergo any spin-lock-based
synchronization. Our solution allows threads to proceed in parallel and to
commit their allocations/releases unless a conflict materializes while
handling the allocator's metadata. Conflict detection relies on conventional
atomic machine instructions in the Read-Modify-Write (RMW) class.
Furthermore, beyond improving scalability and performance, our solution can
also avoid wasting clock cycles on spin-lock operations by threads that could
in principle carry out their memory allocation/release in full concurrency.
Thus, it is resilient to performance degradation, in the face of concurrent
accesses, independently of the current level of fragmentation of the handled
memory blocks.
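To make the conflict-detection idea concrete, the sketch below shows how allocator metadata can be updated with plain RMW instructions so that concurrent allocations either commit or detect a conflict and move on, without any spin-lock. It is a deliberately simplified illustration, not the design presented in the papers above: it manages a single level of 64 equally sized blocks through one atomic occupancy bitmap, omits the splitting and coalescing of a real buddy system, and uses the GCC/Clang __builtin_ctzll intrinsic; the names block_alloc/block_free are assumptions for the example.

/* Minimal illustration of lock-free metadata updates via RMW, in the spirit
 * of the non-blocking buddy system described above. NOT the actual NBBS
 * algorithm: one level, one 64-bit occupancy bitmap, no split/coalesce.   */
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS 64

static _Atomic uint64_t occupancy;   /* bit i set => block i is allocated */

/* Try to allocate one block; returns its index, or -1 if none is free.
 * The fetch_or both claims the block and detects conflicts: if another
 * thread set the bit first, the returned snapshot shows it and we retry. */
int block_alloc(void)
{
    for (;;) {
        uint64_t snap = atomic_load(&occupancy);
        if (snap == UINT64_MAX)
            return -1;                            /* no free block        */
        int idx = __builtin_ctzll(~snap);         /* first clear bit      */
        uint64_t bit = (uint64_t)1 << idx;
        uint64_t prev = atomic_fetch_or(&occupancy, bit);
        if (!(prev & bit))
            return idx;                           /* commit: block is ours */
        /* conflict: another thread claimed it between load and RMW; retry */
    }
}

/* Release a block previously returned by block_alloc(). */
void block_free(int idx)
{
    atomic_fetch_and(&occupancy, ~((uint64_t)1 << idx));
}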