
    Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

    New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. For clusters of shared-memory nodes we demonstrate how temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment. Comment: 9 pages, 6 figures
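To illustrate what temporal blocking means in general, here is a minimal sketch for a 1D three-point Jacobi stencil with fixed boundary values. It uses overlapped tiling (each tile is widened by a halo and some work is redone), which is a simpler variant chosen for clarity and not the pipelined shared-cache approach the abstract describes. The function names, the tile size, and the time-block depth are all assumptions made for this example.

```python
import numpy as np


def jacobi_naive(u, steps):
    """Reference: sweep the whole array once per time step."""
    u = u.copy()
    for _ in range(steps):
        new = u.copy()
        new[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
        u = new
    return u


def jacobi_temporal_blocked(u, steps, tile=64, tblock=4):
    """Advance each spatial tile by up to `tblock` steps before moving on.

    Each tile is loaded with a halo of `t` cells per side. Holding the
    halo edge fixed corrupts one more cell per step, so after `t` steps
    only the halo is wrong and the tile interior can be written back.
    """
    u = u.copy()
    n = len(u)
    done = 0
    while done < steps:
        t = min(tblock, steps - done)
        out = u.copy()
        for s in range(0, n, tile):
            e = min(s + tile, n)
            lo = max(s - t, 0)   # clipped halo: index 0 is a true boundary
            hi = min(e + t, n)   # clipped halo: index n-1 is a true boundary
            local = u[lo:hi].copy()  # small working set, reused t times
            for _ in range(t):
                new = local.copy()
                new[1:-1] = (local[:-2] + local[1:-1] + local[2:]) / 3.0
                local = new
            out[s:e] = local[s - lo:e - lo]
        u = out
        done += t
    return u
```

The blocked version reads each tile from main memory once per `tblock` time steps instead of once per step, at the cost of redundant halo updates. A pipelined scheme such as the one in the abstract aims to avoid that redundancy.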

    Shared-Semaphored Cache Implementation for Parallel Program Execution in Multi-Core Systems

    The paper proposes synchronizing multi-threaded computation through a shared semaphored cache in multi-core CPUs. It targets the implementation of multi-core PLC control, embedded solutions, and parallel computation of models described in hardware description languages. The shared semaphored cache is implemented as guarded memory cells within a dedicated section of the cache memory that is shared by multiple cores. This enables the cores to speed up data exchange and seamlessly synchronize the computation. The idea has been verified by creating a multi-core system model in Verilog HDL. Simulation of task synchronization methods demonstrates the benefits of shared semaphored memory cells over standard synchronization methods. The proposed idea enhances computation in algorithms that consist of relatively short tasks that can be processed in parallel and require fast synchronization mechanisms to avoid data race conditions.
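As a rough software analogy for a guarded memory cell, the sketch below pairs one shared slot with two semaphores so that a writer and a reader hand values over in strict alternation. The paper implements this in cache hardware and models it in Verilog HDL; the class `GuardedCell` and the function `run_pipeline` are hypothetical names invented here and say nothing about the authors' actual design.

```python
import threading


class GuardedCell:
    """One shared slot guarded by an empty/full semaphore pair (illustrative)."""

    def __init__(self):
        self._value = None
        self._empty = threading.Semaphore(1)  # slot starts out writable
        self._full = threading.Semaphore(0)   # nothing to read yet

    def write(self, value):
        self._empty.acquire()   # wait until the previous value was consumed
        self._value = value
        self._full.release()    # signal that a value is available

    def read(self):
        self._full.acquire()    # wait until a value has been written
        value = self._value
        self._empty.release()   # allow the next write
        return value


def run_pipeline(items):
    """Square each item in one thread and collect results in another."""
    cell = GuardedCell()
    results = []

    def producer():
        for x in items:
            cell.write(x * x)

    def consumer():
        for _ in items:
            results.append(cell.read())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every write waits for the preceding read and vice versa, the two short tasks never race on the slot, which is the property the abstract attributes to semaphored cells.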