1,068 research outputs found
Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips
The trend in industry is towards heterogeneous multicore processors (HMCs),
including chips with CPUs and massively-threaded throughput-oriented processors
(MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the
cores with cache-coherent shared virtual memory (CCSVM), this is not the
communication paradigm used by any current HMC. In this paper, we present a
CCSVM design for a CPU/MTTOP chip, as well as an extension of the pthreads
programming model, called xthreads, for programming this HMC. Our goal is to
evaluate the potential performance benefits of tightly coupling heterogeneous
cores with CCSVM
Insights into the Fallback Path of Best-Effort Hardware Transactional Memory Systems
DOI 10.1007/978-3-319-43659-3Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation due
to high contention scenarios and different hardware and operating system limitations that abort transactions, e.g. cache overflows, hardware and software exceptions, etc. To deal with these events and to ensure forward progress, BE-HTM systems usually provide a software fallback path to execute a lock-based version of the code.
In this paper, we propose a hardware implementation of an irrevocability mechanism as an alternative to the software fallback path to gain insight into the hardware improvements that could enhance the execution of such a fallback. Our mechanism anticipates the abort that causes the transaction serialization, and stalls other transactions in the system so that transactional work loss is mini-
mized. In addition, we evaluate the main software fallback path approaches and propose the use of ticket locks that hold precise information of the number of transactions waiting to enter the fallback. Thus, the separation of transactional
and fallback execution can be achieved in a precise manner. The evaluation is carried out using the Simics/GEMS simulator and the complete range of STAMP transactional suite benchmarks. We obtain significant performance benefits of around twice the speedup and an abort reduction of 50% over the software fallback path for a number of benchmarks.Universidad de Málaga. Campus de Excelencia Internacional AndalucÃa Tech
A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
We present a multi-word atomic (1,N) register for multi-core machines
exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and
the readers in a wait-free manner. Our proposal, called Anonymous Readers
Counting (ARC), enables large-scale data sharing by admitting up to
concurrent readers on off-the-shelf 64-bits machines, as opposed to the most
advanced RMW-based approach which is limited to 58 readers. Further, ARC avoids
multiple copies of the register content when accessing it---this affects
classical register's algorithms based on atomic read/write operations on single
words. Thus it allows for higher scalability with respect to the register size.
Moreover, ARC explicitly reduces improves performance via a proper limitation
of RMW instructions in case of read operations, and by supporting constant time
for read operations and amortized constant time for write operations. A proof
of correctness of our register algorithm is also provided, together with
experimental data for a comparison with literature proposals. Beyond assessing
ARC on physical platforms, we carry out as well an experimentation on
virtualized infrastructures, which shows the resilience of wait-free
synchronization as provided by ARC with respect to CPU-steal times, proper of
more modern paradigms such as cloud computing.Comment: non
Lagarto I RISC-V Multi-core: Research Challenges to Build and Integrate a Network-on-Chip
Current compute-intensive applications largely exceed the resources of single-core processors. To face this problem, multi-core processors along with parallel computing techniques have become a solution to increase the computational performance. Likewise, multi-processors are fundamental to support new technologies and new science applications challenges. A specific objective of the Lagarto project developed at the National Polytechnic Institute of Mexico is to generate an ecosystem of high-performance processors for the industry and HPC in Mexico, supporting new technologies and scientific applications. This work presents the first approach of the Lagarto project to the design of multi-core processors and the research challenges to build an infrastructure that allows the flagship core of the Lagarto project to scale to multi- and many-cores. Using the OpenPiton platform with the Ariane RISC-V core, a functional tile has been built, integrating a Lagarto I core with memory coherence that executes atomic instructions, and a NoC that allows scaling the project to many-core versions. This work represents the initial state of the design of mexican multi-and many-cores processors
- …