Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

Authors: Georg Hager, Gerhard Wellein, Markus Wittmann
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/12/2009

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, which lessens the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. For clusters of shared-memory nodes, we demonstrate how temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment.

Comment: 9 pages, 6 figures

arXiv.org e-Print Archive

Percolation scheduling for non-VLIW machines

Authors: Carrie J. Brownhill, Alexandru Nicolau
Publication venue: eScholarship, University of California
Publication date: 15/01/1990

Percolation Scheduling, a technique for compile-time code parallelization, has proven very successful for exploiting fine-grain irregular parallelism in ordinary programs. Currently, this technology is targeted only at VLIW (Very Long Instruction Word) machines, which have the advantages of 'free' synchronization and communication. Shared-memory multiprocessors can simulate the execution characteristics of VLIW machines with the use of static barriers. Preliminary results show that Percolation Scheduling can be used with good results on this type of architecture by increasing the granularity from operation level to source statement level, removing any redundant synchronization, and providing an efficient implementation of multi-way jumps.

eScholarship - University of California

Shared-Semaphored Cache Implementation for Parallel Program Execution in Multi-Core Systems

Authors: Adam Milik, Michał Walichiewicz
Publication venue: Electronics and Telecommunications Committee
Publication date: 18/05/2023

The paper brings forward the idea of multi-threaded computation synchronization based on the shared semaphored cache in multi-core CPUs. It is dedicated to the implementation of multi-core PLC control, embedded solutions, or parallel computation of models described using hardware description languages. The shared semaphored cache is implemented as guarded memory cells within a dedicated section of the cache memory that is shared by multiple cores. This enables the cores to speed up data exchange and seamlessly synchronize the computation. The idea has been verified by creating a multi-core system model using Verilog HDL. The simulation of task synchronization methods allows for proving the benefits of shared semaphored memory cells over standard synchronization methods. The proposed idea enhances the computation in algorithms that consist of relatively short tasks that can be processed in parallel and require fast synchronization mechanisms to avoid data race conditions.