
    Simulation Platform in TLM of System on Chip Using Retargetable ISS

    System-on-Chip (SoC) designs are becoming increasingly complex, and time to market is one of the major constraints. New design methods are needed, and the trend is toward integrating the software and hardware parts on the same chip. Efficient on-chip communication architectures are critical for achieving the desired performance in these systems. Modern codesign methods, together with hardware description languages (HDLs) based on C/C++ such as SystemC and SpecC, make it possible to use the same language to describe both software and hardware, which makes co-simulation easier and more effective. Such methods aim to generate an optimal solution starting from a functional specification while reducing design time and cost. One of the main objectives of this paper is therefore the development of a SystemC platform for multiprocessor architectural exploration at the transaction level (TLM), using SystemC/TLM. The platform should make it possible to partition the system into hardware and software, to validate the partition by simulation, and to move modules easily from hardware to software (or vice versa) during architectural exploration. Apart from software task priorities, which may need to be modified, moving a module only requires recompiling and simulating.
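A minimal sketch of the hardware/software move described above, written in Python rather than SystemC/C++ purely for illustration. It assumes a loosely timed, TLM-style blocking call `b_transport(payload, delay)`; because Python cannot pass the delay by reference the way SystemC's `sc_time&` argument does, the sketch returns the updated delay instead. `GenericPayload`, `ChecksumHW`, `ChecksumSW`, `run_task`, and all latency values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class GenericPayload:
    """Hypothetical stand-in for a TLM generic payload: a command plus data."""
    command: str
    data: List[int] = field(default_factory=list)
    response: int = 0


class Target(Protocol):
    def b_transport(self, payload: GenericPayload, delay_ns: int) -> int:
        """Serve the transaction and return the updated annotated delay."""
        ...


def checksum(data: List[int]) -> int:
    # The functional behavior is identical in both partitions.
    return sum(data) & 0xFF


class ChecksumHW:
    """Module mapped to hardware: fixed latency per transaction (assumed value)."""
    LATENCY_NS = 20

    def b_transport(self, payload: GenericPayload, delay_ns: int) -> int:
        payload.response = checksum(payload.data)
        return delay_ns + self.LATENCY_NS


class ChecksumSW:
    """Same module mapped to software on an ISS: latency grows with data size (assumed)."""
    CALL_OVERHEAD_NS = 50
    NS_PER_WORD = 10

    def b_transport(self, payload: GenericPayload, delay_ns: int) -> int:
        payload.response = checksum(payload.data)
        return delay_ns + self.CALL_OVERHEAD_NS + self.NS_PER_WORD * len(payload.data)


def run_task(target: Target, data: List[int]) -> tuple:
    """Initiator side: unchanged whichever partition the target belongs to."""
    payload = GenericPayload("compute", data)
    delay = target.b_transport(payload, 0)
    return payload.response, delay
```

With this structure, `run_task(ChecksumHW(), [1, 2, 3, 4])` and `run_task(ChecksumSW(), [1, 2, 3, 4])` return the same checksum (10) but different annotated delays (20 ns versus 90 ns), so the initiator code does not change when the module is remapped.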

    MPSoCBench: a framework for evaluating tools and methodologies for multiprocessor systems-on-chip

    Advisor: Rodolfo Jardim de Azevedo. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Recent design methodologies and tools aim at enhancing design productivity by providing a software development platform before the final Multiprocessor System-on-Chip (MPSoC) architecture details are defined. However, simulation can only be performed efficiently with a modeling and simulation engine that supports describing system behavior at a high abstraction level. The lack of MPSoC virtual prototyping platforms that integrate both scalable hardware and software, with which new methodologies and tools can be created and evaluated, motivated us to develop MPSoCBench: a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, together with cross-compilers, IPs, interconnections, parallel versions of 17 applications from well-known benchmarks, and power consumption estimation for the main components (processors, routers, memory, and caches). An important demand in MPSoC design is addressing energy consumption constraints as early as possible. Since processor performance comes at a high power cost, there is growing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling (DVFS) techniques adaptively scale the CPU voltage and frequency levels, allowing the CPU to deliver just enough performance to process the system workload while meeting throughput constraints, thereby reducing energy consumption.
To explore this wide design space for energy efficiency and performance, for both hardware and software components, we added DVFS features to MPSoCBench and evaluated three mechanisms based on dynamic energy estimation and CPU usage rate.
    Doctorate, Computer Science (Doctor of Computer Science).
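The abstract does not detail MPSoCBench's three DVFS mechanisms. As a generic illustration only, the sketch below pairs the common CMOS dynamic-power estimate P = C * V^2 * f with a simple utilization-driven policy that picks the slowest operating point able to absorb the current load. The operating-point table, the switched capacitance, the 80% target utilization, and all function names are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float
    voltage_v: float


# Hypothetical voltage/frequency table, sorted from slowest to fastest.
OPERATING_POINTS: List[OperatingPoint] = [
    OperatingPoint(100.0, 0.8),
    OperatingPoint(200.0, 1.0),
    OperatingPoint(400.0, 1.2),
]

SWITCHED_CAP_F = 1e-9  # hypothetical effective switched capacitance (activity folded in)


def dynamic_power_w(op: OperatingPoint) -> float:
    """CMOS dynamic power estimate P = C * V^2 * f."""
    return SWITCHED_CAP_F * op.voltage_v ** 2 * op.freq_mhz * 1e6


def energy_for_cycles_j(op: OperatingPoint, cycles: float) -> float:
    """Energy for a fixed amount of work: P * (cycles / f) = C * V^2 * cycles."""
    return dynamic_power_w(op) * cycles / (op.freq_mhz * 1e6)


def choose_point(utilization: float, current: OperatingPoint,
                 target_util: float = 0.8) -> OperatingPoint:
    """Pick the slowest point whose projected utilization stays at or below target_util."""
    demand_mhz = utilization * current.freq_mhz  # work actually being done, in MHz
    for op in OPERATING_POINTS:
        if demand_mhz <= target_util * op.freq_mhz:
            return op
    return OPERATING_POINTS[-1]  # saturated: run as fast as possible
```

Because energy per cycle in this model is C * V^2, a fixed workload costs 0.64 nJ per cycle at 100 MHz / 0.8 V versus 1.44 nJ per cycle at 400 MHz / 1.2 V; the policy raises the frequency only when utilization requires it.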

    Exploiting software transactional memory in the context of asymmetric architectures

    Advisor: Paulo Cesar Centoducatte. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: The semiconductor industry's shift towards multicore processors has started an era in which new languages, methodologies, and tools are of paramount importance for developing efficient concurrent systems that programmers of all levels can build in a timely way. One of the main obstacles programmers face in shared-memory programming is using synchronization mechanisms correctly so as to avoid race conditions that could lead the system to an inconsistent state. Synchronization has traditionally been achieved by means of locks (or variations thereof), which are widely known for their anomalies and for being hard to get right. A new mechanism, known as transactional memory (TM), has recently been the focus of much research and has the potential to simplify code synchronization while also exposing more parallelism and, therefore, better performance. This thesis presents three works focused on different aspects of software transactional memory (STM) systems. First, we present an STM implementation for asymmetric processors, focusing on the Cell/B.E. architecture. As an important result, we find that memory transactions are indeed promising for asymmetric architectures, especially due to their scalability. Second, we take a different approach to STM implementation by devising a system specifically targeted at computer games. This decision was motivated by the poor performance usually observed in current STM implementations. We also conduct a case study with a complex game that demonstrates the system's effectiveness. Finally, we present, for the first time, an energy consumption characterization of a state-of-the-art STM.
Based on this characterization, we also propose a technique aimed at reducing energy consumption in highly contended scenarios. Our results show that the technique is effective in such cases, improving energy consumption by up to 87%.
    Doctorate, Computer Systems (Doctor of Computer Science).
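To make the transactional model concrete, here is a deliberately simplified STM in Python: lazy versioning (writes are buffered until commit), per-variable version numbers, and commit-time validation under a single global commit lock, with automatic retry on conflict. It is a sketch of the general technique, not the Cell/B.E. or game-oriented systems from the thesis; `TVar`, `Transaction`, `ConflictError`, and `atomically` are hypothetical names. Production designs such as TL2 use per-location locks and a global version clock instead of one commit lock.

```python
import threading
from typing import Any, Callable, Dict


class TVar:
    """A transactional variable holding an immutable (version, value) pair."""
    def __init__(self, value: Any) -> None:
        self.state = (0, value)


class ConflictError(Exception):
    """Raised when validation fails and the transaction must be retried."""


_commit_lock = threading.Lock()


class Transaction:
    def __init__(self) -> None:
        self.reads: Dict[TVar, int] = {}   # TVar -> version first observed
        self.writes: Dict[TVar, Any] = {}  # TVar -> buffered new value

    def read(self, tv: TVar) -> Any:
        if tv in self.writes:              # read-your-own-writes
            return self.writes[tv]
        version, value = tv.state          # one attribute read: consistent pair
        seen = self.reads.setdefault(tv, version)
        if seen != version:                # changed since our first read
            raise ConflictError
        return value

    def write(self, tv: TVar, value: Any) -> None:
        self.writes[tv] = value            # lazy versioning: buffer until commit

    def commit(self) -> None:
        with _commit_lock:
            for tv, version in self.reads.items():
                if tv.state[0] != version:  # someone committed over our read set
                    raise ConflictError
            for tv, value in self.writes.items():
                tv.state = (tv.state[0] + 1, value)


def atomically(body: Callable[[Transaction], Any]) -> Any:
    """Run body as a transaction, re-executing it until it commits."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except ConflictError:
            continue
```

For example, several threads that each call `atomically` on a body moving one unit between two `TVar`s always leave the sum of both variables unchanged, because a transaction whose read set was modified by another commit raises `ConflictError` and re-executes. Every such retry is work whose energy is spent without producing a commit.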

    Efficient external observation of CPU activities on SoCs

    Comprehensive observability of systems-on-chip (SoCs) is an important prerequisite for efficient testing and debugging of embedded systems. An analysis of various use cases yields a catalog of requirements for SoC observability. One important criterion is completeness of observation, covering the activities of the CPU (executed instructions, data read and written, cache behavior, execution times), of the bus system, and of environmental conditions. Further criteria are real-time capability, continuity of observation, and the ability to perform different observation tasks simultaneously. At the same time, the observation should influence the SoC as little as possible. Other important aspects are the cost of the solution, its universality, its scalability, and the latency with which observation results become available. For many applications, especially in safety-critical domains, it must also be shown that the observation method neither causes nor masks faulty behavior of the SoC. Multiprocessor SoCs (MPSoCs) pose a particular challenge, because communication between the individual CPUs takes place inside the SoC and is therefore difficult to make visible to an external observer. The state of the art in SoC observation essentially consists of two methods. In software instrumentation, additional code that serves to observe the program is added to the functional program code. This method is simple and universally applicable, but it meets the criteria above only to a very limited extent. Its drawback is resource consumption if the instrumentation remains in the final product.
If the instrumentation is added to the code only temporarily, it must be ensured that the observation results also apply to the final code, which is difficult to achieve, especially for resource-dependent integration tests. An alternative is dedicated hardware support in the SoC ("embedded trace"). Here, state information (e.g., task switches, executed instructions, data transfers) is collected inside the SoC and sent to the observer as trace messages. The bandwidth available for outputting trace messages from the SoC is a critical bottleneck: far more information of interest to the observer is available inside the SoC than can be transferred off-chip. Both state-of-the-art observation methods therefore have a number of limitations, which show up particularly in completeness of observation, flexibility, continuity, and support for MPSoCs. This work presents a new approach that offers significant improvements over the state of the art in several areas. The trace data are obtained not directly from the SoC under observation but from an emulation running in parallel with it. The bandwidth of the data required to synchronize the emulation is in many cases considerably lower than that needed to output comprehensive trace messages with embedded-trace solutions. At the same time, a complete and highly detailed observation of the processes inside the SoC is possible. The new observation method was evaluated with several FPGA-based implementations, which also demonstrated its applicability to MPSoCs.
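The abstract does not specify which data the emulation needs for synchronization. Assuming it consists of the values of non-deterministic inputs (here, external reads), the principle can be sketched with a hypothetical five-instruction ISA: the SoC exports only the input values it consumed, and the observer replays the same program on an emulator fed with those values to reconstruct the complete instruction trace. Real MPSoCs would also need to synchronize interrupts, timing, and inter-core interleavings, which this sketch ignores; `PROGRAM`, `execute`, `soc_run`, and `reconstruct` are illustrative names only.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical tiny ISA: each instruction is a tuple (opcode, operands...).
PROGRAM: List[tuple] = [
    ("in", 0),       # 0: r0 <- next external input (non-deterministic)
    ("beqz", 0, 4),  # 1: if r0 == 0, jump to 4
    ("add", 1, 0),   # 2: r1 <- r1 + r0
    ("jmp", 0),      # 3: jump to 0
    ("halt",),       # 4: stop
]


def execute(program: List[tuple], read_input: Callable[[], int]) -> Tuple[List[int], List[int]]:
    """Run the program; return the full trace of executed PCs and the final registers."""
    regs = [0, 0]
    pc = 0
    trace: List[int] = []
    while True:
        trace.append(pc)
        op = program[pc]
        if op[0] == "halt":
            return trace, regs
        if op[0] == "in":
            regs[op[1]] = read_input()
        elif op[0] == "add":
            regs[op[1]] += regs[op[2]]
        elif op[0] == "jmp":
            pc = op[1]
            continue
        elif op[0] == "beqz" and regs[op[1]] == 0:
            pc = op[2]
            continue
        pc += 1


def soc_run(program: List[tuple], sensor: Iterator[int]) -> Tuple[List[int], List[int]]:
    """The observed SoC: exports only the consumed input values (the sync log).
    The trace is returned here only so the sketch can compare; it is not exported."""
    sync_log: List[int] = []

    def read_and_log() -> int:
        value = next(sensor)
        sync_log.append(value)
        return value

    trace, _ = execute(program, read_and_log)
    return trace, sync_log


def reconstruct(program: List[tuple], sync_log: List[int]) -> List[int]:
    """Observer side: replay the program on an emulator fed with the logged inputs."""
    replay = iter(sync_log)
    trace, _ = execute(program, lambda: next(replay))
    return trace
```

With inputs 5, 7, 0, the SoC exports three values, while the reconstructed trace contains all eleven executed instructions; that gap is the kind of bandwidth advantage the abstract describes.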

    On the Energy-Efficiency of Software Transactional Memory

    Traditional software transactional memory (STM) designs target performance, so little is known about their impact on energy consumption. In this paper, we provide a comprehensive energy analysis of a standard STM design and propose novel scratchpad-based, energy-aware STM design strategies. Experimental results collected with a state-of-the-art MPSoC simulation infrastructure show that our approach can achieve an energy improvement of up to 36% over the base STM for applications characterized by short-lived transactions and relatively high abort rates. Copyright 2009 ACM.
    Sponsors: ACM SIGDA, Sociedade Brasileira de Computação (SBC), IEEE Circuits and Systems Society (CAS), IEEE, IFIP.
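A short calculation suggests why short-lived transactions with high abort rates are a natural target for energy optimization. Under a simplified model that is an assumption here, not the paper's, each attempt of a transaction costs energy E_a and aborts independently with probability p, so the number of attempts N until commit is geometrically distributed:

```latex
% Assumed model: every attempt costs E_a and aborts independently with probability p.
\begin{align}
  \mathbb{E}[N] &= \sum_{k=1}^{\infty} k\,p^{k-1}(1-p) = \frac{1}{1-p}, \\
  \mathbb{E}[E_{\text{commit}}] &= E_a\,\mathbb{E}[N] = \frac{E_a}{1-p}, \\
  \mathbb{E}[E_{\text{wasted}}] &= E_a\bigl(\mathbb{E}[N]-1\bigr) = \frac{p\,E_a}{1-p}.
\end{align}
```

For p = 0.5, a transaction needs two attempts on average and half of its energy goes to aborted work; for p = 0.75, it needs four attempts and three quarters is wasted.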

    Using Multiple Abstraction Levels to Speedup an MPSoC Virtual Platform Simulator

    Virtual platforms are of paramount importance for design space exploration, and their use in early software development and verification is crucial. Fast and accurate simulation is especially useful, but these two features usually conflict, so tradeoffs have to be made. In this paper, we describe how we integrated TLM communication mechanisms into a state-of-the-art, cycle-accurate MPSoC simulation platform. More specifically, we show how we adapted ArchC fast functional instruction set simulators to the MPARM platform in order to achieve both fast simulation and accuracy. Our implementation led to a much faster hybrid platform, reaching speedups of up to 2.9x (2.1x on average) with negligible impact on power estimation accuracy (an average deviation of 3.26%, with a standard deviation of 2.25%). © 2011 IEEE. pp. 99-105.
    Sponsors: Institute of Electrical and Electronics Engineers (IEEE) Reliability Society, Karlsruhe Institute of Technology (KIT).
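The abstract does not describe how the ArchC simulators were connected to MPARM. The sketch below only illustrates the general hybrid idea under stated assumptions: a fast functional core executes each instruction in one step without modeling its pipeline, and only its memory operations become transactions to a platform-side target that annotates latency and accumulates an energy estimate. `TimedMemory`, `FunctionalISS`, the toy instruction tuples, and the per-access cycle and energy figures are all hypothetical; this is not the ArchC or MPARM API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AccessStats:
    cycles: int = 0
    energy_nj: float = 0.0
    accesses: int = 0


class TimedMemory:
    """Platform-side target: annotates each access with latency and energy (assumed figures)."""
    LATENCY_CYCLES = {"read": 2, "write": 3}
    ENERGY_NJ = {"read": 0.5, "write": 0.7}

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}
        self.stats = AccessStats()

    def b_transport(self, cmd: str, addr: int, value: int = 0) -> Tuple[int, int]:
        """Serve one transaction; return (data, annotated delay in cycles)."""
        self.stats.accesses += 1
        self.stats.energy_nj += self.ENERGY_NJ[cmd]
        if cmd == "write":
            self.mem[addr] = value
            data = 0
        else:
            data = self.mem.get(addr, 0)
        delay = self.LATENCY_CYCLES[cmd]
        self.stats.cycles += delay
        return data, delay


class FunctionalISS:
    """Fast functional core: one instruction per step, no pipeline model.
    Only loads and stores are forwarded to the platform as transactions."""
    def __init__(self, bus: TimedMemory) -> None:
        self.bus = bus
        self.regs = [0] * 4
        self.local_time = 0  # 1 cycle per instruction + annotated memory delays

    def step(self, instr: tuple) -> None:
        op = instr[0]
        self.local_time += 1
        if op == "li":        # ("li", rd, imm)
            self.regs[instr[1]] = instr[2]
        elif op == "add":     # ("add", rd, rs1, rs2)
            self.regs[instr[1]] = self.regs[instr[2]] + self.regs[instr[3]]
        elif op == "ld":      # ("ld", rd, addr)
            data, delay = self.bus.b_transport("read", instr[2])
            self.regs[instr[1]] = data
            self.local_time += delay
        elif op == "st":      # ("st", rs, addr)
            _, delay = self.bus.b_transport("write", instr[2], self.regs[instr[1]])
            self.local_time += delay
```

Running `li r0,5; li r1,7; add r2,r0,r1; st r2,0x100; ld r3,0x100` on this model gives 10 cycles of local time (5 instructions plus 3 + 2 annotated memory cycles) and 1.2 nJ of estimated memory energy. Keeping the core functional and charging cost only at the communication boundary is what makes such a hybrid faster than a fully cycle-accurate core model.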