5 research outputs found
MEDEA: A Hybrid Shared-memory/Message-passing Multiprocessor NoC-based Architecture
The shared-memory model has been adopted, both for data exchange as well as synchronization using semaphores in almost every on-chip multiprocessor implementation, ranging from general purpose chip multiprocessors (CMPs) to domain specific multi-core graphics processing units (GPUs). Low-latency synchronization is desirable but is hard to achieve in practice due to the memory hierarchy. On the contrary, an explicit exchange of synchronization tokens among the processing elements through dedicated on-chip links would be beneficial for the overall system performance. In this paper we propose the Medea NoC-based framework, a hybrid shared-memory/message-passing approach. Medea has been modeled with a fast, cycle-accurate SystemC implementation enabling a fast system exploration varying several parameters like number and types of cores, cache size and policy and NoC features. In addition, every SystemC block has its RTL counterpart for physical implementation on FPGAs and ASICs. A parallel version of the Jacobi algorithm has been used as a test application to validate the metodology. Results confirm expectations about performance and effectiveness of system exploration and desig
MEDEA: A Hybrid Shared-memory/Message-passing Multiprocessor NoC-based Architecture
The shared-memory model has been adopted, both for data exchange as well as synchronization using semaphores in almost every on-chip multiprocessor implementation, ranging from general purpose chip multiprocessors (CMPs) to domain specific multi-core graphics processing units (GPUs). Low-latency synchronization is desirable but is hard to achieve in practice due to the memory hierarchy. On the contrary, an explicit exchange of synchronization tokens among the processing elements through dedicated on-chip links would be beneficial for the overall system performance. In this paper we propose the Medea NoC-based framework, a hybrid shared-memory/message-passing approach. Medea has been modeled with a fast, cycle-accurate SystemC implementation enabling a fast system exploration varying several parameters like number and types of cores, cache size and policy and NoC features. In addition, every SystemC block has its RTL counterpart for physical implementation on FPGAs and ASICs. A parallel version of the Jacobi algorithm has been used as a test application to validate the metodology. Results confirm expectations about performance and effectiveness of system exploration and design
Implementation Analysis of NoC: A MPSoC Trace-Driven Approach
This paper proposes to tackle Networks-on-chip design for MPSoC on the assumption that area and power overhead control is the primary goal. Analysis of topologies and routing strategies is performed by comparing two approaches, the wormhole and hot potato, both theoretically and using real synthesized data in 0.13ĂŽÂĽm technology. It is shown that the hot potato solution is competitive and possibly better for both occupation and dissipation, while its performance, measured by simulation on real multiprocessor traces, is not worse than the wormhole case
The NoCRay Graphic Accelerator: a Case-study for MP-SoC Network-on-Chip Design Methodology
The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability of next-generation on-chip multiproces- sor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this work a complete design methodology is proposed, tackling at once the aspects of hardware architecture, programming model and design automation. The proposed design flow has been used in the implementation of a multiprocessor Network-on-Chip based system, the NoCRay graphic accelerator. The system uses 8 Tensilica LX processors and has been physi- cally implemented on a Xilinx Virtex-4 LX-160 FPGA reporting a 17.3M equivalent gate-count. Performance are compared with a commercial general purpose processor and show good results considering the low frequency of the prototyp