7 research outputs found

    매니코어 NoC 아키텍처에 대한 고속 사이클-근사 시뮬레이션 기법

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2017. 2. 하순회.Simulation is a software technique that uses the current available architecture to prototype a future architecture. In computer architecture research, simulation techniques are one of the most important skills. Simulation techniques enable us to obtain important performance indicators of new architectures and to perform the design space exploration using these metrics. Furthermore, the simulator enables rapid software development and optimization on the architecture that does not exist. Despite various known problems, such as slow speed or coverage issue, the reliance on simulation technology in computer architecture research continues to increase. As the density of transistor increases and the performance improvement of the single core hits the ceiling, the newly constructed architectures usually consist of multi/many cores with the network-on-chip, which enables scalable communications. In addition, the implementation of the application itself has also been complicated to effectively utilize these parallel architectures. Thus, simulators for parallel architectures and parallel applications have become extremely complex, and existing sequential simulators no longer simulate these systems at a realistic time. While many of parallel simulation techniques are being developed to solve these problems, they suffer from poor simulation performance or accuracy. In this thesis, we propose and evaluate a novel many-core simulation technique that can obtain the best simulation performance at the cost of minimum simulation error. The proposed parallel many-core simulator is divided into three parts: 1) core simulator, 2) network-on-chip simulator, and 3) simulation backplane. Each core is executed by a core simulator, which communicates with the external simulation backplane via the Interprocess Communication (IPC). Each core simulation is performed individually in a separate host processor. The simulation backplane arranges messages from each core into chronological order, passes them to destination modules, and simulates hardware components other than cores. If the simulation backplane generates a request requiring NoC communication, this request is forwarded to the network simulator and is simulated at the most accurate accuracy level. In this thesis, we proposed a novel core simulation model, which combined analytical and sampled simulations. The core simulator presents 11.36 to 44.31 MIPS performance, while the simulation error is approximately 8 percent. The standalone core simulator is released as an open-source. We confirmed that NoC simulation has a great effect on the reliability of outputs generated from many-core simulation. First, existing flit-level NoC simulators were analyzed at source-code level. Based on the observations, various implementations were evaluated and various software optimizations was applied to improve the network simulation performance. The proposed NoC simulator presents more than 100KCycles/s performance unless the packet injection rate exceeds 0.00625, which is two times faster than state-of-the-arts NoC simulator at least. The speed of the simulation backplane depends greatly on the IPC overhead and SystemC scheduling overhead. To reduce the IPC overhead, the trace-driven co-simulation technique is used, faster IPC is introduced, and the segmented L1 data cache is embedded in a core simulator. In addition, to reduce SystemC scheduling overhead, it is important to reduce the number of modules that are simultaneously awakened. To this end, slave modules are redesigned to be activated only based on an event. A new scheduler parallelization technique is also studied. Although the newly developed SystemC parallel scheduler showed good performance under limited conditions, we also confirmed that no performance improvement was found in the TLM level many-core simulator developed in this thesis. While the proposed many-core simulator uses the conservative synchronization technique which is free from causality errors and performs an accurate flit-level NoC simulation, the simulation performance is still acceptable, thanks to parallelism and optimizations. Additionally, the simulator is highly scalable to add other modules because the simulation backplane is developed to be compatible with SystemC TLM 2.0 standard. Although extensive experiments on accuracy are not conducted, it will be complemented when a detailed specification of the target architecture is given. This dissertation can be a reference to the development of a many-core simulator, which will be more essential in the future.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Contribution 4 1.3 Dissertation Organization 5 Chapter 2 Background and Existing Research 6 2.1 Terminologies 6 2.1.1 Simulation Host / Simulation Target 6 2.1.2 Simulated Time / Simulation Time 2.1.3 User-level Simulation / Full-system Simulation 7 2.1.4 Execution-driven Simulation / Trace-driven Simulation 7 2.2 State-of-the-arts Many-core Simulators 8 2.2.1 Gem5 8 2.2.2 Marss 9 2.2.3 Sniper 9 2.2.4 Zsim 9 2.2.5 Manifold 10 2.2.6 Hornet 10 2.2.7 Summary 11 2.3 Host and Target Architecture 12 Chapter 3 Core Simulation 14 3.1 Overview 14 3.2 Related Works 16 3.2.1 Timing Models 16 3.2.2 Analytical Model: Interval Simulation 19 3.3 Sampling Mechanism 23 3.3.1 Sampling Configuration 24 3.3.2 Parameter Extraction 24 3.4 Trace Analyzer 27 3.4.1 Dependency Analysis 29 3.4.2 Life Cycle of An Instruction 31 3.5 Experimental Results 32 3.5.1 Time-accuracy Trade-off 34 3.5.2 Simulation Accuracy 37 3.5.3 Simulation Performance 41 3.6 Discussion 42 Chapter 4 NoC Simulation 45 4.1 Network-on-chip 45 4.2 Motivation 46 4.3 Related Works 48 4.3.1 Noxim 49 4.3.2 Booksim2 50 4.3.3 Garnet 51 4.4 Proposed Approach 51 4.4.1 Implementations 51 4.4.2 Optimizations 54 4.5 Experimental Results 56 4.5.1 Impact of Implementations and Optimizations 56 4.5.2 Comparison with Other State-Of-The-Arts 58 4.5.3 Performance Evaluation For Various Configurations 59 4.5.4 Full-System Simulation Accuracy Impact 59 4.5.5 Accuracy 61 4.6 Discussion 61 Chapter 5 Simulation Backplane 63 5.1 Overview 63 5.2 Background 65 5.2.1 SystemC 65 5.2.2 OSCI Transaction Level Modeling Standard 2.0 66 5.2.3 Synchronization Techniques 67 5.3 SystemC Models for the Target Architecture 69 5.4 Reducing the Cost of Interprocess Communications 71 5.4.1 Trace-driven Co-simulation 71 5.4.2 Better Interprocess Communication 73 5.4.3 Virtually embedding modules to core simulator 74 5.5 Reducing SystemC Scheduling Overhead 76 5.5.1 Event-based Slave Module Activation 76 5.5.2 SystemC Scheduler Parallelization 78 5.6 Evaluation 79 5.6.1 Scalability Test 79 5.6.2 Simulation Performance 79 5.6.3 Simulation Accuracy 80 Chapter 6 Simulation Backplane Parallelization 81 6.1 Background: OSCI SystemC Scheduler 81 6.2 Related Work: SystemC Parallelization Techniques 82 6.2.1 Fully-synchronous Approach 82 6.2.2 Parallel Distributed Event Scheduling (PDES) Approach 82 6.2.3 Out-of-order Execution with Dependency Analysis 83 6.2.4 Dynamic Offloading Approach 84 6.3 Proposed Technique 84 6.3.1 Basic Synchronization 85 6.3.2 Relaxed Synchronization 86 6.3.3 Modeling Restrictions 88 6.4 Experimental Results 89 6.4.1 Performance 90 6.4.2 Accuracy 92 6.5 Discussion and Limitation 93 Chapter 7 Conclusion 95 Bibliography 97 요약 107Docto

    MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip

    Get PDF
    Orientador: Rodolfo Jardim de AzevedoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Recentes metodologias e ferramentas de projetos de sistemas multiprocessados em chip (MPSoC) aumentam a produtividade por meio da utilização de plataformas baseadas em simuladores, antes de definir os últimos detalhes da arquitetura. No entanto, a simulação só é eficiente quando utiliza ferramentas de modelagem que suportem a descrição do comportamento do sistema em um elevado nível de abstração. A escassez de plataformas virtuais de MPSoCs que integrem hardware e software escaláveis nos motivou a desenvolver o MPSoCBench, que consiste de um conjunto escalável de MPSoCs incluindo quatro modelos de processadores (PowerPC, MIPS, SPARC e ARM), organizado em plataformas com 1, 2, 4, 8, 16, 32 e 64 núcleos, cross-compiladores, IPs, interconexões, 17 aplicações paralelas e estimativa de consumo de energia para os principais componentes (processadores, roteadores, memória principal e caches). Uma importante demanda em projetos MPSoC é atender às restrições de consumo de energia o mais cedo possível. Considerando que o desempenho do processador está diretamente relacionado ao consumo, há um crescente interesse em explorar o trade-off entre consumo de energia e desempenho, tendo em conta o domínio da aplicação alvo. Técnicas de escalabilidade dinâmica de freqüência e voltagem fundamentam-se em gerenciar o nível de tensão e frequência da CPU, permitindo que o sistema alcance apenas o desempenho suficiente para processar a carga de trabalho, reduzindo, consequentemente, o consumo de energia. Para explorar a eficiência energética e desempenho, foram adicionados recursos ao MPSoCBench, visando explorar escalabilidade dinâmica de voltaegem e frequência (DVFS) e foram validados três mecanismos com base na estimativa dinâmica de energia e taxa de uso de CPUAbstract: Recent design methodologies and tools aim at enhancing the design productivity by providing a software development platform before the definition of the final Multiprocessor System on Chip (MPSoC) architecture details. However, simulation can only be efficiently performed when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping integrating both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop the MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel version of software from well-known benchmarks, and power consumption estimation for main components (processors, routers, memory, and caches). An important demand in MPSoC designs is the addressing of energy consumption constraints as early as possible. Whereas processor performance comes with a high power cost, there is an increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU allowing it to reach just enough performance to process the system workload while meeting throughput constraints, and thereby, reducing the energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we provided MPSoCBench features to explore dynamic voltage and frequency scalability (DVFS) and evaluated three mechanisms based on energy estimation and CPU usage rateDoutoradoCiência da ComputaçãoDoutora em Ciência da Computaçã

    Co-simulation techniques based on virtual platforms for SoC design and verification in power electronics applications

    Get PDF
    En las últimas décadas, la inversión en el ámbito energético ha aumentado considerablemente. Actualmente, existen numerosas empresas que están desarrollando equipos como convertidores de potencia o máquinas eléctricas con sistemas de control de última generación. La tendencia actual es usar System-on-chips y Field Programmable Gate Arrays para implementar todo el sistema de control. Estos dispositivos facilitan el uso de algoritmos de control más complejos y eficientes, mejorando la eficiencia de los equipos y habilitando la integración de los sistemas renovables en la red eléctrica. Sin embargo, la complejidad de los sistemas de control también ha aumentado considerablemente y con ello la dificultad de su verificación. Los sistemas Hardware-in-the-loop (HIL) se han presentado como una solución para la verificación no destructiva de los equipos energéticos, evitando accidentes y pruebas de alto coste en bancos de ensayo. Los sistemas HIL simulan en tiempo real el comportamiento de la planta de potencia y su interfaz para realizar las pruebas con la placa de control en un entorno seguro. Esta tesis se centra en mejorar el proceso de verificación de los sistemas de control en aplicaciones de electrónica potencia. La contribución general es proporcionar una alternativa a al uso de los HIL para la verificación del hardware/software de la tarjeta de control. La alternativa se basa en la técnica de Software-in-the-loop (SIL) y trata de superar o abordar las limitaciones encontradas hasta la fecha en el SIL. Para mejorar las cualidades de SIL se ha desarrollado una herramienta software denominada COSIL que permite co-simular la implementación e integración final del sistema de control, sea software (CPU), hardware (FPGA) o una mezcla de software y hardware, al mismo tiempo que su interacción con la planta de potencia. Dicha plataforma puede trabajar en múltiples niveles de abstracción e incluye soporte para realizar co-simulación mixtas en distintos lenguajes como C o VHDL. A lo largo de la tesis se hace hincapié en mejorar una de las limitaciones de SIL, su baja velocidad de simulación. Se proponen diferentes soluciones como el uso de emuladores software, distintos niveles de abstracción del software y hardware, o relojes locales en los módulos de la FPGA. En especial se aporta un mecanismo de sincronizaron externa para el emulador software QEMU habilitando su emulación multi-core. Esta aportación habilita el uso de QEMU en plataformas virtuales de co-simulacion como COSIL. Toda la plataforma COSIL, incluido el uso de QEMU, se ha analizado bajo diferentes tipos de aplicaciones y bajo un proyecto industrial real. Su uso ha sido crítico para desarrollar y verificar el software y hardware del sistema de control de un convertidor de 400 kVA

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    Support des communications dans des architectures multicœurs par l’intermédiaire de mécanismes matériels et d’interfaces de programmation standardisées

    Get PDF
    The application constraints driving the design of embedded systems are constantly demanding higher performance and power efficiency. To meet these constraints, current SoC platforms rely on replicating several processing cores while adding dedicated hardware accelerators to handle specific tasks. However, developing embedded applications is becoming a key challenge, since applications workload will continue to grow and the software technologies are not evolving as fast as hardware architectures, leaving a gap in the full system design. Indeed, the increased programming complexity can be associated to the lack of software standards that supports heterogeneity, frequently leading to custom solutions. On the other hand, implementing a standard software solution for embedded systems might induce significant performance and memory usage overheads. Therefore, this Thesis focus on decreasing this gap by implementing hardware mechanisms in co-design with a standard programming interface for embedded systems. The main objectives are to increase programmability through the implementation of a standardized communication application programming interface (MCAPI), and decrease the overheads imposed by the software implementation through the use of the developed hardware mechanisms.The contributions of the Thesis comprise the implementation of MCAPI for a generic multi-core platform and dedicated hardware mechanisms to improve communication connection phase and overall performance of data transfer phase. It is demonstrated that the proposed mechanisms can be exploited by the software implementation without increasing software complexity. Furthermore, performance estimations obtained using a SystemC/TLM simulation model for the reference multi-core architecture show that the proposed mechanisms provide significant gains in terms of latency (up to 97%), throughput (40x increase) and network traffic (up to 68%) while reducing processor workload for both characterization test-cases and real application benchmarks.L’évolution des contraintes applicatives imposent des améliorations continues sur les performances et l’efficacité énergétique des systèmes embarqués. Pour répondre à ces contraintes, les plateformes « SoC » actuelles s’appuient sur la multiplication des cœurs de calcul, tout en ajoutant des accélérateurs matériels dédiés pour gérer des tâches spécifiques. Dans ce contexte, développer des applications embarquées devient un défi complexe, en effet la charge de travail des applications continue à croître alors que les technologies logicielles n’évoluent pas aussi vite que les architectures matérielles, laissant un écart dans la conception complète du système. De fait, la complexité accrue de programmation peut être associée à l’absence de standards logiciels qui prennent en charge l’hétérogénéité des architectures, menant souvent à des solutions ad hoc. A l’opposé, l’utilisation d’une solution logicielle standardisée pour les systèmes embarqués peut induire des surcoûts importants concernant les performances et l’occupation de la mémoire si elle n’est pas adaptée à l’architecture. Par conséquent, le travail de cette thèse se concentre sur la réduction de cet écart en mettant en œuvre des mécanismes matériels dont la conception prend en compte une interface de programmation standard pour systèmes embarqués. Les principaux objectifs sont ainsi d’accroître la programmabilité par la mise en œuvre d’une interface de programmation : MCAPI, et de diminuer la charge logiciel des cœurs grâce à l’utilisation des mécanismes matériels développés.Les contributions de la thèse comprennent la mise en œuvre de MCAPI pour une plate-forme multicœur générique et des mécanismes matériels pour améliorer la performance globale de la configuration de la communication et des transferts de données. Il est démontré que les mécanismes peuvent être pris en charge par les interfaces logicielles sans augmenter leur complexité. En outre, les résultats de performance obtenus en utilisant un modèle SystemC/TLM de l’architecture multicœurs de référence montrent que les mécanismes proposés apportent des gains significatifs en termes de latence, débit, trafic réseau, temps de charge processeur et temps de communication sur des cas d’étude et des applications complètes

    SystemC Transaction-Level Modeling of an MPSoC Platform Based on an Open Source ISS by Using Interprocess Communication

    Get PDF
    Transaction-level modeling (TLM) is a promising technique to deal with the increasing complexity of modern embedded systems. This model allows a system designer to model a complete application, composed of hardware and software parts, at several levels of abstraction. For this purpose, we use systemC, which is proposed as a standardized modeling language. This paper presents a transaction-level modeling cosimulation methodology for modeling, validating, and verifying our embedded open architecture platform. The proposed platform is an open source multiprocessor system-on-chip (MPSoC) platform, integrated under the synthesis tool for adaptive and reconfigurable system-on-chip (STARSoC) environment. It relies on the integration between an open source instruction set simulators (ISSs), OR1Ksim platform, and the systemC simulation environment which contains other components (wishbone bus, memories, …, etc.). The aim of this work is to provide designers with the possibility of faster and efficient architecture exploration at a higher level of abstractions, starting from an algorithmic description to implementation details
    corecore