Search CORE

18 research outputs found

Challenges for the Parallelization of Loosely Timed SystemC Programs

Author: Becker Denis
Cornet Jérôme
Moy Matthieu
Publication venue: HAL CCSD
Publication date: 08/10/2015
Field of study

International audienceSystemC/TLM models are commonly used in the industry to provide an early SoC simulation environment. The open source implementation of the SystemC simulator is sequential. The standard doesn't impose sequential executions, but makes this choice the easiest by imposing coroutine semantics. With the increasing size and complexity of models, and the multiplication of computation cores on recent machines, the parallelization of SystemC simulations is a major research concern. There have been several proposals for SystemC parallelization, but most of them are limited to cycle-accurate models. In this paper we give an overview of the practices in one industrial context. We explain why loosely timed models are the only viable option in this context. We also show that unfortunately, most of the existing approaches for SystemC parallelization can fundamentally not apply to these models. We support this claim with a set of measurements performed on a platform used in production at STMicroelectronics. This paper both surveys existing techniques and identifies unsolved challenges in the parallelization of SystemC/TLM models

Hal - Université Grenoble Alpes

Fast and Accurate TLM Simulations using Temporal Decoupling for FIFO-based Communications

Author: Cornet Jérôme
Galilée Bruno
Helmstetter Claude
Moy Matthieu
Vivet Pascal
Publication venue: HAL CCSD
Publication date: 01/01/2013
Field of study

International audienceUntimed models of large embedded systems, generally written using SystemC/TLM, allow the software team to start simulations before the RTL description is available, and then provide a golden reference model to the verification team. For those two purposes, only a correct functional behavior is required, but users are asking more and more for timing estimations early in the design flow. Because companies cannot afford to maintain two simulators for the same chip, only local modifications of the untimed model are considered. A known approach is to add timing annotations into the code and to reduce the number of costly context switches using temporal decoupling, meaning that a process can go ahead of the simulation time before synchronizing again. Our current goal is to apply temporal decoupling to the TLM platform of a many-core SoC dedicated to high performance computing. Part of this SoC communicates using classic memory-mapped buses, but it can be extended with hardware accelerators communicating using FIFOs. Whereas temporal decoupling for memory-based transactions has been widely studied, FIFO-based communications raise issues that have not been addressed before. In this paper, we provide an efficient solution to combine temporal decoupling and FIFO-based communications

Crossref

Hal - Université Grenoble Alpes

HAL-CEA

GPU Based Acceleration of SystemC and Transaction Level Models for MPSOC Simulation

Author: Gangrade Rohit
Publication venue
Publication date: 08/07/2016
Field of study

With increasing number of cores on a chip, the complexity of modeling hardware using virtual prototype is increasing rapidly. Typical SOCs today have multipro-cessors connected through a bus or NOC architecture which can be modeled using SystemC framework. SystemC is a popular language used for early design exploration and performance analysis of complex embedded systems. TLM2.0, an extension of SystemC, is increasingly used in MPSOC designs for simulating loosely and approxi-mately timed transaction level models. The OSCI reference kernel which implements SystemC library runs on a single thread, slowing up the simulation speed to a large extent. Previous works have used the computational power of multi-core systems and GPUs which can run multiple threads simultaneously, speeding up the simu-lation. Multi-core simulations are not as eﬀective in cases where thread runtime is low, because synchronization overhead becomes comparable to thread runtime. Modern GPUs can run thousands of threads at a time and have shown good results for synthesizable designs in recent eﬀorts. However, development in these works are limited to synthesizable subsets of SystemC models, not supporting timed events for process communication. In this research work, a methodology is proposed for accelerating timed event based SystemC TLM2.0 model to GPU based kernel, which maps SystemC processes to CUDA threads in GPU, providing high data level par-allelism. This work aims to provide a scalable solution for simulating large MPSOC designs, facilitating early design exploration and performance analysis. Experiments have shown that the proposed technique provides a speed-up of the order of 100x for typical MPSOC designs

Texas A&M Repository

High Speed Cycle-Approximate Simulation of Embedded Cache-Incoherent and Coherent Chip-Multiprocessors

Author: Christopher Thompson
D Burger
D Chiou
Daniel Sanchez
E Argollo
ES Chung
J Chen
K Skadron
K Wang
M Monchiero
Miles Gould
MMK Martin
N Binkert
N Hardavellas
Nigel Topham
P Ren
PS Magnusson
SS Mukherjee
X Zhu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2018
Field of study

Crossref

Edinburgh Research Explorer

Recommended from our members

Dynamic time management for improved accuracy and speed in host-compiled multi-core platform models

Author: Razaghi Parisa
Publication venue
Publication date: 07/07/2014
Field of study

textWith increasing complexity and software content, modern embedded platforms employ a heterogeneous mix of multi-core processors along with hardware accelerators in order to provide high performance in limited power budgets. Due to complex interactions and highly dynamic behavior, static analysis of real-time performance and other constraints is challenging. As an alternative, full-system simulations have been widely accepted by designers. With traditional approaches being either slow or inaccurate, so-called host-compiled simulators have recently emerged as a solution for rapid evaluation of complete systems at early design stages. In such approaches, a faster simulation is achieved by natively executing application code at the source level, abstracting execution behavior of target platforms, and thus increasing simulation granularity. However, most existing host-compiled simulators often focus on application behavior only while neglecting effects of hardware/software interactions and associated speed and accuracy tradeoffs in platform modeling. In this dissertation, we focus on host-compiled operating system (OS) and processor modeling techniques, and we introduce novel dynamic timing model management approaches that efficiently improve both accuracy and speed of such models via automatically calibrating the simulation granularity. The contributions of this dissertation are twofold: We first establish an infrastructure for efficient host-compiled multi-core platform simulation by developing (a) abstract models of both real-time OSs and processors that replicate timing-accurate hardware/software interactions and enable full-system co-simulation, and (b) quantitative and analytical studies of host-compiled simulation principles to analyze error bounds and investigate possible improvements. Building on this infrastructure, we further propose specific techniques for improving accuracy and speed tradeoffs in host-compiled simulation by developing (c) an automatic timing granularity adjustment technique based on dynamically observing system state to control the simulation, (d) an out-of-order cache hierarchy modeling approach to efficiently reorder memory access behavior in the presence of temporal decoupling, and (e) a synchronized timing model to align platform threads to run efficiently in parallel simulation. Results as applied to industrial-strength platforms confirm that by providing careful abstractions and dynamic timing management, our models can achieve full-system simulations at equivalent speeds of more than a thousand MIPS with less than 3% timing error. Coupled with the capability to easily adjust simulation parameters and configurations, this demonstrates the benefits of our platform models for early application development and exploration.Electrical and Computer Engineerin

Texas ScholarWorks

Transaction-level power analysis of VLSI digital systems

Author: Conti M.
Orcioni S.
Vece G.B.
Publication venue
Publication date: 01/01/2015
Field of study

IRIS UniversitÃ Politecnica delle Marche

Open Access Repository

Simulation Native des Systèmes Multiprocesseurs sur Puce à l'aide de la Virtualisation Assistée par le Matériel

Author: HAMAYUN Mian Muhammad
PETROT Frédéric
Publication venue
Publication date: 01/01/2013
Field of study

L'intégration de plusieurs processeurs hétérogènes en un seul système sur puce (SoC) est une tendance claire dans les systèmes embarqués. La conception et la vérification de ces systèmes nécessitent des plateformes rapides de simulation, et faciles à construire. Parmi les approches de simulation de logiciels, la simulation native est un bon candidat grâce à l'exécution native de logiciel embarqué sur la machine hôte, ce qui permet des simulations à haute vitesse, sans nécessiter le développement de simulateurs d'instructions. Toutefois, les techniques de simulation natives existantes exécutent le logiciel de simulation dans l'espace de mémoire partagée entre le matériel modélisé et le système d'exploitation hôte. Il en résulte de nombreux problèmes, par exemple les conflits l'espace d'adressage et les chevauchements de mémoire ainsi que l'utilisation des adresses de la machine hôte plutôt des celles des plates-formes matérielles cibles. Cela rend pratiquement impossible la simulation native du code existant fonctionnant sur la plate-forme cible. Pour surmonter ces problèmes, nous proposons l'ajout d'une couche transparente de traduction de l'espace adressage pour séparer l'espace d'adresse cible de celui du simulateur de hôte. Nous exploitons la technologie de virtualisation assistée par matériel (HAV pour Hardware-Assisted Virtualization) à cet effet. Cette technologie est maintenant disponibles sur plupart de processeurs grande public à usage général. Les expériences montrent que cette solution ne dégrade pas la vitesse de simulation native, tout en gardant la possibilité de réaliser l'évaluation des performances du logiciel simulé. La solution proposée est évolutive et flexible et nous fournit les preuves nécessaires pour appuyer nos revendications avec des solutions de simulation multiprocesseurs et hybrides. Nous abordons également la simulation d'exécutables cross- compilés pour les processeurs VLIW (Very Long Instruction Word) en utilisant une technique de traduction binaire statique (SBT) pour généré le code natif. Ainsi il n'est pas nécessaire de faire de traduction à la volée ou d'interprétation des instructions. Cette approche est intéressante dans les situations où le code source n'est pas disponible ou que la plate-forme cible n'est pas supporté par les compilateurs reciblable, ce qui est généralement le cas pour les processeurs VLIW. Les simulateurs générés s'exécutent au-dessus de notre plate-forme basée sur le HAV et modélisent les processeurs de la série C6x de Texas Instruments (TI). Les résultats de simulation des binaires pour VLIW montrent une accélération de deux ordres de grandeur par rapport aux simulateurs précis au cycle près.Integration of multiple heterogeneous processors into a single System-on-Chip (SoC) is a clear trend in embedded systems. Designing and verifying these systems require high-speed and easy-to-build simulation platforms. Among the software simulation approaches, native simulation is a good candidate since the embedded software is executed natively on the host machine, resulting in high speed simulations and without requiring instruction set simulator development effort. However, existing native simulation techniques execute the simulated software in memory space shared between the modeled hardware and the host operating system. This results in many problems, including address space conflicts and overlaps as well as the use of host machine addresses instead of the target hardware platform ones. This makes it practically impossible to natively simulate legacy code running on the target platform. To overcome these issues, we propose the addition of a transparent address space translation layer to separate the target address space from that of the host simulator. We exploit the Hardware-Assisted Virtualization (HAV) technology for this purpose, which is now readily available on almost all general purpose processors. Experiments show that this solution does not degrade the native simulation speed, while keeping the ability to accomplish software performance evaluation. The proposed solution is scalable as well as flexible and we provide necessary evidence to support our claims with multiprocessor and hybrid simulation solutions. We also address the simulation of cross-compiled Very Long Instruction Word (VLIW) executables, using a Static Binary Translation (SBT) technique to generated native code that does not require run-time translation or interpretation support. This approach is interesting in situations where either the source code is not available or the target platform is not supported by any retargetable compilation framework, which is usually the case for VLIW processors. The generated simulators execute on top of our HAV based platform and model the Texas Instruments (TI) C6x series processors. Simulation results for VLIW binaries show a speed-up of around two orders of magnitude compared to the cycle accurate simulators.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

OpenGrey Repository

Parallele und kooperative Simulation für eingebettete Multiprozessorsysteme

Author: Roth Christoph
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014
Field of study

Die Entwicklung von eingebetteten Systemen wird durch die stetig steigende Anzahl und Integrationsdichte neuer Funktionen in Kombination mit einem erhöhten Interaktionsgrad zunehmend zur Herausforderung. Vor diesem Hintergrund werden in dieser Arbeit Methoden zur SystemC-basierten parallelen Simulation von Multiprozessorsystemen auf Manycore Architekturen sowie zur Verbesserung der Interoperabilität zwischen heterogenen Simulationswerkzeugen entwickelt, experimentell untersucht und bewertet

KITopen

Enabling the multi-threaded simulation for models written in SystemC

Author: Faveri Rodrigo Richard Cantos
Publication venue: [s.n.]
Publication date: 17/08/2018
Field of study

Orientadores: Sandro Rigo, Rodolfo Jardim de AzevedoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: SystemC é uma linguagem de desenvolvimento de sistemas de hardware como, por exemplo, os modelos arquiteturais SoC (Systems-on-Chip) e, em conjunto com a biblioteca e metodologia TLM (Transacüon Levei Modeling), oferece a infraestrutura de simulação necessária capaz de realizar a simulação de software e hardware rapidamente em um alto nível de abstração. O seu núcleo de simulação foi construído como uma cadeia de threads, que são executadas uma por vez. Sendo assim, essa modelagem do núcleo de simulação do SystemC não é capaz de se beneficiar dos recursos oferecidos pelos novos processadores com mais de um núcleo de processamento, para obter ganhos de desempenho de simulação. Com o aumento da complexidade dos projetos de circuitos eletrônicos e a diminuição dos prazos para que um produto de SoC se torne comercial, o desempenho das simulações se tornou essencial. No presente trabalho, apresenta uma nova versão do SystemC capaz de executar em processadores multinúcleos com ganhos de desempenho de 2,üx à 22,029x à versão original em máquinas de 4 e 12 núcleos de processamento simulando plataformas contendo de 4 a 64 threads. Além disso, também foram realizadas mudanças nas interfaces TLM, para que a sincronização dos processos paralelos seja independente dos eventos hoje presentes no SystemC e, devido às alterações no núcleo de simulação do SystemC, a linguagem de descrição de arquitetura ArchC também foi adaptada para conseguir executar em um ambiente paralelo de simulaçãoAbstract: SystemC is a modeling language for development of hardware systems, such SoCs (Systems-on-Chip) architectural models, and integrated with the methodology and library TLM (Transaction Level Modeling), it offers the required simulation platform infrastructure capable to simulate software and hardware in a fast way at different abstration levels. However, its single thread simulation kernel prevents it from utilizing the potential computing power of multi-core machines to speed up the simulation. With the complexity and the functionality of new circuits and applications size increasing and the time-to-market becoming shorter, the simulation speed-up is essential. In the present work, we introduce a new SystemC version, able to perform in multi-core machines and, consequently, with performance gains of 2.Ox to 22.029x to the original version on machines with 4 and 12 cores simulating platforms with 4 to 64 threads. Furthermore, changes were made on the TLM interfaces for parallel process can synchronize independently of SystemC events, and because the changes in the SystemC simulation kernel, Archc also had to be adapted for execute in a parallel simulation environmentMestradoMestre em Ciência da Computaçã

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

SoCRocket - A flexible and extensible Virtual Platform for the development of robust Embedded Systems

Author: Schuster Thomas
Publication venue
Publication date: 08/04/2015
Field of study

Der Schwerpunkt dieser Arbeit liegt in der Erhöhung des Abstraktionsniveaus im Entwurfsprozess, speziell dem Entwurf von Systemen auf Basis von Virtuellen Plattformen (VPs), Transaction-Level-Modellierung (TLM) und SystemC. Es wird eine ganzheitliche Methode vorgestellt, mit der komplexe eingebettete Systeme effizient modelliert werden können. Ergebnis ist eine der RTL-Synthese nahezu gleichgestellte Genauigkeit bei wesentlich höherer Flexibilität und Simulationsgeschwindigkeit. Das SoCRocket-System orientiert sich dazu an existierenden Standards und stellt Methoden zu deren effizientem Einsatz zur Verbesserung von Simulationsgeschwindigkeit und Simulationsgenauigkeit vor. So wird unter anderem gezeigt, wie moderne Multi-Kanal-Protokolle mit Split-Transfers durch Ausgleich des Intertransaktions-Timings ohne die Einführung zusätzlicher Protokollphasen zeitlich genau modelliert werden können. Standardisierungslücken in den Bereichen Speichermodellierung und Systemkonfiguration werden durch standardoffene Lösungen geschlossen. Darüber hinaus wird neue Infrastruktur zur Modellierung von Signalkommunikation auf Transaktionsebene, der Verifikation von Komponenten und der Modellierung des Energieverbrauchs vorgestellt. Zur Demonstration wurden die Kernkomponenten einer im europäischen Raumfahrtsektor maßgeblichen Hardwarebibliothek modelliert. Alle Komponenten wurden zunächst in Unit-Tests verifiziert und anschließend in einem Systemprototypen integriert. Zur Verifikation der Funktion, sowie Bestimmung von Simulationsgeschwindigkeit und zeitlicher Genauigkeit, wurde dieser für unterschiedliche Abstraktionsstufen konfiguriert und mit einem in VHDL beschriebenen RISC-Referenzentwurf (LEON3MP) verglichen. Das System mit losem Timing (LT) und blockierender Kommunikation ist im Durchschnitt 561-mal schneller als die RTL-Referenz und weist eine durchschnittliche Timing-Abweichung von 7,04% auf. Das System mit näherungsweise akkuratem Timing (AT) und nicht-blockierender Kommunikation ist 335-mal schneller. Die durchschnittliche Timing-Abweichung beträgt hier nur noch 3,03%, was einer Standardabweichung von 0.033 und damit einer sehr hohen statistischen Sicherheit entspricht. Die verschiedenen Abstraktionsniveaus können zur Realisierung mehrstufiger Architekturexplorationen eingesetzt werden. Dies wird am Beispiel einer hyperspektralen Bildkompression verdeutlicht.The focus of this work is raising the abstraction level in the development process, especially for the design of systems based on Virtual Platforms (VPs), Transaction Level Modeling (TLM), and SystemC. A holistic method for efficient modeling of complex embedded systems is presented. Results are accuracies close to RTL synthesis but at much higher flexibility, and simulation performance. The SoCRocket system integrates existing standards and introduces new methods for improvement of simulation performance and accuracy. It is shown, amongst others, how modern multi-channel protocols with split transfers can be accurately modeled by compensating inter-transaction timing without introducing additional protocol phases. Standardization gaps in the area of memory modeling and system configuration are closed by standard-open solutions. Furthermore, new infrastructure for modeling signal communication on transaction level, verification of components, and estimating power consumption are presented. All components have been verified in unit tests and were subsequently integrated in a system prototype. For functional verification, as well as measurement of simulation performance and accuracy, the prototype was configured for different abstractions and compared to a VHDL-based RISC reference design (LEON3MP). The loosely-timed platform prototype with blocking communication (LT) is in average 561 times faster than the RTL reference and shows an average timing deviation of 7,04%. The approximately-timed system (AT) with non-blocking communication is 335 times faster. Here, the timing deviation is only 3,03 %, corresponding to a standard deviation of 0.033, proving a very high statistic certainty. The system’s various abstraction levels can be exploited by a multi-stage architecture exploration. This is demonstrated by the example of a hyperspectral image compression

Digitale Bibliothek Braunschweig