6 research outputs found

    Power/Performance Hardware Optimization for Synchronization Intensive Applications in MPSoCs

    This paper explores optimization techniques for the synchronization mechanisms of MPSoCs based on a complex interconnect (Network-on-Chip), targeted at future power-efficient systems. The proposed solution is based on the idea of locally performing synchronization operations that require continuous polling of a shared variable and therefore feature high contention (e.g. spin locks). We introduce a HW module, the Synchronization-operation Buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 30% energy saving with respect to synchronization based on a directory-based coherence protocol.

    Programming models for many-core architectures: a co-design approach

    Common many-core processors contain tens of cores and distributed memory. Compared to a multicore system, which only has a few tightly coupled cores sharing a single bus and memory, several complex problems arise. Notably, many cores require many parallel tasks to fully utilize the cores, and communication happens in a distributed and decentralized way. Therefore, programming such a processor requires the application to exhibit concurrency. In contrast to a single-core application, a concurrent application has to deal with memory state changes with an observable (non-deterministic) intermediate state. The complexity introduced by these problems makes programming a many-core system with a single-core-based programming approach notoriously hard. The central concept of this thesis is that abstractions, which are related to (many-core) programming, are structured in a single platform model. A platform is a layered view of the hardware, a memory model, a concurrency model, a model of computation, and compile-time and run-time tooling. Then, a programming model is a specific view on this platform, which is used by a programmer. In this view, some details can be hidden from the programmer's perspective, some details cannot. For example, an operating system presents an infinite number of parallel virtual execution units to the application whilst it hides details regarding scheduling. On the other hand, a programmer usually has to balance the workload among threads by hand. This thesis presents modifications to different abstraction layers of a many-core architecture, in order to make the system as a whole more efficient, and to reduce the programming complexity. These modifications influence other abstractions in the platform, and especially the programming model. Therefore, this thesis applies co-design on all models. Notably, co-design of the memory model, concurrency model, and model of computation is required for a scalable implementation of lambda-calculus. Moreover, only the combination of requirements of the many-core hardware from one side and the concurrency model from the other leads to a memory model abstraction. Hence, this thesis shows that to cope with the current trends in many-core architectures from a programming perspective, it is essential and feasible to inspect and adapt all abstractions collectively.

    On The Energy-efficiency Of Software Transactional Memory

    Traditional software transactional memory designs are targeted towards performance and therefore little is known about their impact on energy consumption. We provide, in this paper, a comprehensive energy analysis of a standard STM design and propose novel scratchpad-based energy-aware STM design strategies. Experimental results collected through a state-of-the-art MPSoC simulation infrastructure show that our approach can achieve an energy improvement of up to 36% with regard to the base STM for applications characterized by short-lived transactions and relatively high abort rate. Copyright 2009 ACM. ACM SIGDA, Sociedade Brasileira de Computacao, SBC, IEEE Circuits and Systems Society, CAS, IEEE, IFIP.