9 research outputs found

    How to speedup fault-tolerant clock generation in VLSI systems-on-chip via pipelining

    Fault-tolerant clocking schemes become inevitable when it comes to highly reliable chip designs. Because of the additional hardware overhead, existing solutions are considerably slower than their non-reliable counterparts. In this paper, we demonstrate that pipelining is a viable approach to speed up the distributed fault-tolerant DARTS clock generation approach introduced in (Függer, Schmid, Fuchs, Kempf, EDCC'06), where a distributed Byzantine fault-tolerant tick generation algorithm has been used to replace the traditional quartz oscillator and highly balanced clock tree in VLSI Systems-on-Chip (SoCs). We provide a pipelined version of the original DARTS algorithm, termed pDARTS, together with a novel modeling and analysis framework for hardware-implemented asynchronous fault-tolerant distributed algorithms, which is employed for rigorously analyzing its correctness and performance. Our results, which have also been confirmed by the experimental evaluation of an FPGA prototype implementation, reveal that pipelining indeed makes it possible to entirely remove the adverse effect of large interconnect delays on the achievable clock frequency, and demonstrate again that methods and results from distributed algorithms research can successfully be applied in the VLSI context.

    A hardware/software framework for supporting transactional memory in a MPSoC environment

    Manufacturers are focusing on multiprocessor-system-on-a-chip (MPSoC) architectures in order to provide increased concurrency, rather than increased clock speed, for both large-scale and embedded systems. Traditionally, lock-based synchronization is provided to support concurrency; however, managing locks can be very difficult and error-prone. In addition, the performance and power cost of lock-based synchronization can be high. Transactional memories have been extensively investigated as an alternative to lock-based synchronization in general-purpose systems. It has been shown that transactional memory has advantages over locks in terms of ease of programming, performance and energy consumption. However, its applicability to embedded multi-core platforms has not been explored yet. In this paper, we demonstrate a complete hardware transactional memory solution for an embedded multi-core architecture, consisting of a cache-coherent ARM-based cluster, similar to ARM's MPCore. Using cycle-accurate power and performance models for the transactional memory hardware, we evaluate our architectural framework over a set of different system and application settings, and show that transactional memory is a promising solution, even for resource-constrained embedded multiprocessors.

    Exploiting software transactional memory in the context of asymmetric architectures

    Advisor: Paulo Cesar Centoducatte. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: The shift towards multicore processors taken by the semiconductor industry has initiated an era in which new languages, methodologies and tools are of paramount importance to the development of efficient concurrent systems that can be built in a timely way by all kinds of programmers. One of the main obstacles faced by programmers when dealing with shared memory programming concerns the use of synchronization mechanisms so as to avoid race conditions that could lead the system to an inconsistent state. Synchronization has traditionally been achieved by means of locks (or variations thereof), widely known for their anomalies and for being hard to get right. A new mechanism, known as transactional memory (TM), has recently been the focus of a lot of research and shows potential to simplify code synchronization as well as to deliver more parallelism and, therefore, better performance. This thesis presents three works focused on different aspects of software transactional memory (STM) systems. Firstly, we show an STM implementation for asymmetric processors, focusing on the Cell/B.E. architecture. As an important result, we find that memory transactions are indeed promising for asymmetric architectures, especially due to their scalability. Secondly, we take a different approach to STM implementation by devising a system specifically targeted at computer games. The decision was guided by the poor performance figures usually seen in current STM implementations. We also conduct a case study using a complex game that effectively shows the system's efficiency. Finally, we present the energy consumption characterization of a state-of-the-art STM for the first time. Based on the observed characterization, we also propose a technique aimed at reducing energy consumption in highly contended scenarios. Our results show that the technique is indeed effective in such cases, improving the energy consumption by up to 87%. Doctorate; Computer Systems; Doctor in Computer Science.

    Performance Optimization Strategies for Transactional Memory Applications

    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To that end, this thesis addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge in order to develop these new methods and strategies for the optimization of TM applications.

    Using Multiple Abstraction Levels To Speedup An Mpsoc Virtual Platform Simulator

    Virtual platforms are of paramount importance for design space exploration, and their usage in early software development and verification is crucial. In particular, enabling accurate and fast simulation is especially useful, but such features are usually conflicting and tradeoffs have to be made. In this paper we describe how we integrated TLM communication mechanisms into a state-of-the-art, cycle-accurate MPSoC simulation platform. More specifically, we show how we adapted ArchC fast functional instruction set simulators to the MPARM platform in order to achieve both fast simulation speed and accuracy. Our implementation led to a much faster hybrid platform, reaching speedups of up to 2.9x, and 2.1x on average, with negligible impact on power estimation accuracy (average of 3.26% and standard deviation of 2.25%). © 2011 IEEE.