48 research outputs found

    Co-simulation techniques based on virtual platforms for SoC design and verification in power electronics applications

    Over the last few decades, investment in the energy sector has grown considerably. Numerous companies are currently developing equipment such as power converters and electrical machines with state-of-the-art control systems. The current trend is to use System-on-Chips and Field Programmable Gate Arrays to implement the entire control system. These devices enable more complex and efficient control algorithms, improving equipment efficiency and facilitating the integration of renewable systems into the electrical grid. However, the complexity of control systems has also grown considerably, and with it the difficulty of verifying them. Hardware-in-the-loop (HIL) systems have emerged as a solution for the non-destructive verification of power equipment, avoiding accidents and costly tests on test benches. HIL systems simulate, in real time, the behavior of the power plant and its interface so that the control board can be tested in a safe environment. This thesis focuses on improving the verification process of control systems in power electronics applications. Its overall contribution is an alternative to the use of HIL for verifying the hardware/software of the control board. The alternative is based on the Software-in-the-loop (SIL) technique and aims to overcome or address the limitations found in SIL to date. To improve the qualities of SIL, a software tool called COSIL has been developed that allows co-simulating the final implementation and integration of the control system, whether software (CPU), hardware (FPGA), or a mixture of both, together with its interaction with the power plant. The platform can operate at multiple levels of abstraction and includes support for mixed co-simulation in different languages such as C or VHDL. Throughout the thesis, emphasis is placed on improving one of the main limitations of SIL, its low simulation speed. Several solutions are proposed, such as the use of software emulators, different levels of abstraction for the software and hardware, or local clocks in the FPGA modules. In particular, an external synchronization mechanism is contributed for the QEMU software emulator, enabling its multi-core emulation. This contribution enables the use of QEMU in co-simulation virtual platforms such as COSIL. The whole COSIL platform, including the use of QEMU, has been evaluated with different types of applications and in a real industrial project. Its use was critical in developing and verifying the software and hardware of the control system of a 400 kVA converter.
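As a rough illustration of the software-in-the-loop idea described above (not COSIL's actual API), the sketch below runs a control routine and a simple plant model in lock-step, exchanging one sample per simulation step; the plant parameters, step size, and function names are all hypothetical.

```c
/* Minimal sketch of a lock-step software-in-the-loop (SIL) co-simulation:
 * a plant model and the control routine advance in alternation, exchanging
 * samples each step. All names and constants are illustrative only and do
 * not reflect the COSIL API. */
#include <stdio.h>

/* Simple first-order plant: inductor current driven by the applied voltage. */
static double plant_step(double i_prev, double v_applied, double dt)
{
    const double R = 0.5, L = 1e-3;                 /* ohms, henries */
    return i_prev + dt * (v_applied - R * i_prev) / L;
}

/* PI current controller, as it would run on the target CPU/FPGA. */
static double controller_step(double i_ref, double i_meas, double dt)
{
    static double integ = 0.0;
    const double kp = 2.0, ki = 400.0;
    double err = i_ref - i_meas;
    integ += ki * err * dt;
    return kp * err + integ;                        /* voltage command */
}

int main(void)
{
    const double dt = 1e-6;                         /* 1 us co-simulation step */
    double i = 0.0, v = 0.0;

    for (long step = 0; step < 5000; ++step) {
        v = controller_step(10.0, i, dt);           /* control side of the loop */
        i = plant_step(i, v, dt);                   /* plant side of the loop   */
    }
    printf("final current: %.3f A\n", i);
    return 0;
}
```

In a tool such as COSIL, the controller side of this loop would instead be the compiled target software or HDL running inside an emulator or simulator, synchronized with the plant model at each step.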

    Fuzzing Binaries for Memory Safety Errors with QASan

    Fuzz testing techniques are becoming pervasive thanks to their ever-improving ability to generate crashing test cases for programs. Memory safety violations, however, can lead to silent corruptions and errors, and a fuzzer may recognize them only in the presence of sanitization machinery. For closed-source software, combining sanitization with fuzzing incurs practical obstacles that we try to tackle with an architecture-independent proposal called QASan for detecting heap memory violations. In our tests QASan is competitive with standalone sanitizers and adds a moderate 1.61x average slowdown to the AFL++ fuzzer while enabling it to reveal more heap-related bugs.
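The sketch below shows the red-zone idea behind heap sanitization, to make the kind of violation QASan detects concrete; QASan itself tracks heap operations through QEMU instrumentation rather than a recompiled allocator, so every name and constant here is illustrative only.

```c
/* Minimal sketch of red-zone based heap checking, the idea behind sanitizers
 * such as ASan/QASan: allocations are padded with guard bytes and a check
 * routine flags accesses that land in them. Illustrative only. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE 16
#define MAX_CHUNKS 1024

struct chunk { uint8_t *base; size_t size; };      /* user region bookkeeping */
static struct chunk chunks[MAX_CHUNKS];
static int nchunks;

static void *checked_malloc(size_t size)
{
    uint8_t *base = malloc(size + 2 * REDZONE);
    if (!base) return NULL;
    memset(base, 0xAA, REDZONE);                   /* left red zone  */
    memset(base + REDZONE + size, 0xAA, REDZONE);  /* right red zone */
    chunks[nchunks++] = (struct chunk){ base + REDZONE, size };
    return base + REDZONE;
}

/* Report an error if addr touches a red zone of a tracked allocation. */
static void check_access(const void *addr)
{
    const uint8_t *p = addr;
    for (int i = 0; i < nchunks; ++i) {
        const uint8_t *lo = chunks[i].base - REDZONE;
        const uint8_t *hi = chunks[i].base + chunks[i].size + REDZONE;
        if (p >= lo && p < hi &&
            (p < chunks[i].base || p >= chunks[i].base + chunks[i].size))
            fprintf(stderr, "heap-buffer-overflow at %p\n", (void *)p);
    }
}

int main(void)
{
    char *buf = checked_malloc(8);
    check_access(buf + 3);                         /* in bounds: silent          */
    check_access(buf + 8);                         /* one past the end: flagged  */
    free(buf - REDZONE);
    return 0;
}
```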

    A Retargetable System-Level DBT Hypervisor

    System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different from that of the host machine. Due to their performance-critical nature, system-level DBT frameworks are typically hand-coded and heavily optimized, both for their guest and host architectures. While this results in good performance of the DBT system, engineering costs for supporting a new architecture, or extending an existing one, are high. In this paper we develop a novel, retargetable DBT hypervisor, which includes guest-specific modules generated from high-level guest machine specifications. Our system not only simplifies retargeting of the DBT, but also delivers performance levels in excess of existing manually created DBT solutions. We achieve this by combining offline and online optimizations, and by exploiting the freedom of a Just-in-Time (JIT) compiler operating in the bare-metal environment provided by a Virtual Machine (VM) hypervisor. We evaluate our DBT using both targeted micro-benchmarks and standard application benchmarks, and we demonstrate its ability to outperform the de-facto standard QEMU DBT system. Our system delivers an average speedup of 2.21× over QEMU across SPEC CPU2006 integer benchmarks running in a full-system Linux OS environment, compiled for the 64-bit ARMv8-A ISA and hosted on an x86-64 platform. For floating-point applications the speedup is even higher, reaching 6.49× on average. We demonstrate that our system-level DBT system significantly reduces the effort required to support a new ISA, while delivering outstanding performance.
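For readers unfamiliar with DBT internals, the sketch below shows the generic fetch-translate-cache-execute loop that any such system builds on; the types and the translate_block hook are hypothetical simplifications, not the paper's generated guest modules or QEMU's implementation.

```c
/* Minimal sketch of the core dynamic binary translation (DBT) loop: look up
 * the current guest PC in a translation cache, JIT-translate the basic block
 * on a miss, then run the translated code. Greatly simplified; illustrative
 * names only. */
#include <stdint.h>
#include <stddef.h>

typedef uint64_t guest_addr_t;
typedef guest_addr_t (*host_block_fn)(void *cpu_state);

struct tc_entry { guest_addr_t pc; host_block_fn code; };

#define TC_SIZE 4096
static struct tc_entry tcache[TC_SIZE];

/* Assumed to be provided by a guest-specific frontend and a JIT backend. */
extern host_block_fn translate_block(guest_addr_t pc);

static host_block_fn lookup_or_translate(guest_addr_t pc)
{
    struct tc_entry *e = &tcache[pc % TC_SIZE];    /* direct-mapped cache */
    if (e->pc != pc || e->code == NULL) {
        e->pc = pc;
        e->code = translate_block(pc);             /* translate on a miss */
    }
    return e->code;
}

void dbt_run(void *cpu_state, guest_addr_t entry_pc)
{
    guest_addr_t pc = entry_pc;
    for (;;) {
        host_block_fn block = lookup_or_translate(pc);
        pc = block(cpu_state);                     /* block returns next guest PC */
    }
}
```

The paper's key point is that the guest-specific part (here, translate_block) can be generated from a machine specification instead of being hand-written, while offline and online optimizations recover the performance usually obtained by hand-coding.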

    Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

    During the last few decades, an unprecedented technological growth has been at the center of the embedded systems design landscape, with Moore's Law being the leading factor of this trend. Today an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space that must be explored to find the best design has exploded, and hardware designers face the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they must cope with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first is intended for the hardware developer, to easily allow complex cycle-accurate simulations of many-core SoCs; the second exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of increasing simulation speed. The term virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer achieve the maximum possible performance of an application by hiding the complexity of the underlying hardware; 2) to efficiently exploit the highly parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis focuses on virtualization techniques with the goal of mitigating, and where possible overcoming, some of the challenges introduced by the many-core design paradigm.

    Efficient runtime management for enabling sustainable performance in real-world mobile applications

    Mobile devices have become integral parts of our society. They handle our diverse computing needs, from simple daily tasks (e.g., text messaging, e-mail) to complex graphics and media processing, under a limited battery budget. Mobile system-on-chip (SoC) designs have become increasingly sophisticated to handle the performance needs of diverse workloads and to improve user experience. Unfortunately, power and thermal constraints have also emerged as major concerns. Increased power densities and temperatures substantially impair user experience due to frequent throttling, as well as diminishing device reliability and battery life. Addressing these concerns becomes increasingly challenging due to increased complexities at both the hardware (e.g., heterogeneous CPUs, accelerators) and software (e.g., vast number of applications, multi-threading) levels. Enabling sustained user experience in the face of these challenges requires (1) practical runtime management solutions that can reason about the performance needs of users and applications while optimizing power and temperature; (2) tools for analyzing real-world mobile application behavior and performance. This thesis aims at improving sustained user experience under thermal limitations by incorporating insights from real-world mobile applications into runtime management. The thesis first proposes thermally-efficient and Quality-of-Service (QoS) aware runtime management techniques to enable sustained performance. Our work leverages the inherent QoS tolerance of users in real-world applications and introduces the QoS-temperature tradeoff as a viable control knob to improve user experience under thermal constraints. We present a runtime control framework, QScale, which manages CPU power and scheduling decisions to optimize temperature while strictly adhering to given QoS targets. We also design a framework, Maestro, which provides autonomous and application-aware management of QoS-temperature tradeoffs; Maestro uses our thermally-efficient QoS control framework, QScale, as its foundation. This thesis also presents tools to facilitate studies of real-world mobile applications. We design a practical record-and-replay system, RandR, to generate repeatable executions of mobile applications. RandR provides this capability by automatically reproducing non-deterministic input sources in mobile applications, such as user inputs and network events. Finally, we focus on the non-deterministic executions in Android malware which seek to evade analysis environments, and propose the Proteus system to identify the instruction-level inputs that reveal analysis environments.
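As a hedged illustration of the QoS-temperature control knob described above (not QScale's actual controller), the sketch below picks, each control period, the lowest CPU frequency level whose predicted frame time still meets a QoS target, which in turn lowers power and temperature. The frequency table, the linear performance model, and the sensor/actuator hooks are all placeholders.

```c
/* Minimal sketch of a QoS-aware thermal governor: choose the lowest CPU
 * frequency level that is predicted to still meet the QoS target.
 * Hypothetical names and constants; a real governor would read sensors and
 * write frequency settings through the platform's interfaces. */
#define NLEVELS 5
static const double freq_mhz[NLEVELS] = { 600, 900, 1200, 1500, 1800 };

/* Placeholder sensor/actuator hooks, assumed to be provided elsewhere. */
extern double read_frame_time_ms(void);
extern void   set_cpu_freq_mhz(double f);

static int choose_level(double measured_ms, int cur_level, double target_ms)
{
    /* Assume frame time scales inversely with frequency (simple linear model). */
    for (int lvl = 0; lvl < NLEVELS; ++lvl) {
        double predicted = measured_ms * freq_mhz[cur_level] / freq_mhz[lvl];
        if (predicted <= target_ms)
            return lvl;                            /* lowest level meeting QoS   */
    }
    return NLEVELS - 1;                            /* QoS unreachable: run fastest */
}

/* Called once per control period (e.g., per rendered frame). */
void governor_tick(double target_ms)
{
    static int level = NLEVELS - 1;
    double t = read_frame_time_ms();
    level = choose_level(t, level, target_ms);
    set_cpu_freq_mhz(freq_mhz[level]);
}
```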

    Cycle-accurate multicore performance models on FPGAs

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 159-165). The goal of this project is to improve computer architecture by accelerating cycle-accurate performance modeling of multicore processors using FPGAs. Contributions include a distributed technique for controlling simulation on a highly parallel substrate, hardware design techniques to reduce development effort, and a specific framework for modeling shared-memory multicore processors paired with realistic on-chip networks. By Michael Pellauer, Ph.D.

    Cache Predictability and Performance Improvement in ARINC-653 Compliant Systems

    Due to their energy efficiency and their capability to be miniaturized, multi-core processors have been developed to replace the less cost-effective single-core architectures, and since around the year 2000 processor manufacturers have slowly stopped producing single-core processors. This raises an issue for avionic system designers. In these critical systems, designers use processors that have proven their reliability over time; however, none of those processors are multi-core. To keep their systems up to date, aerospace system designers will have to make the transition to multi-core architectures. This transition brings many challenges for system designers and integrators. Current single-core applications may not be fully portable to multi-core systems, so developers will have to make sure the transition is possible for such applications. Multi-core CPUs offer the ability to execute multiple tasks in parallel, and from this ability new behaviors may induce delays, bugs, and undefined behaviors that may result in general system failure. In critical systems, where safety is crucial, such behavior is unacceptable. Safety-critical systems have to comply with multiple standards and guidance documents to ensure their reliability. One of the standards Real-Time Operating System developers may rely on is ARINC-653. This standard introduces the concept of partitioned systems, in which each partition runs independently and should never be able to modify or impact the behavior of another partition. This concept ensures that if one partition misbehaves, the system's integrity is not affected. In multi-core systems, multiple applications can run in parallel and access hardware and software resources at the same time. This concurrency, if not correctly managed, introduces delays in execution time, loss of performance, and unwanted behaviors. We call interference any behavior introduced by this concurrence on resources shared by different cores or partitions. When concurrent accesses to shared components occur, arbitration has to be done to ensure data integrity, and in most cases this arbitration is the cause of interference. Other sources of interference exist as well: for instance, if two partitions share the same cache, one partition may evict the other partition's data from the cache, leading to unpredictable delays when the next partition needs to access the evicted data. In this thesis, we explore methods to prevent cache line evictions in private processor caches, a method we call cache locking. This reduces the number of cache misses occurring at the private level, which in turn reduces the number of accesses to the lower memory levels and the interference related to them. We introduce a framework capable of profiling the memory accesses made by applications and propose a cache-content selection algorithm that selects the cache lines to be locked to reduce cache misses in private caches (a minimal sketch of such a selection pass is given below). We present the tool and the associated processing pipeline, from memory profiling to the generation of the cache-locking configuration tables.
We validated our approach on simulated and actual hardware, using a proprietary ARINC-653 compliant real-time operating system to conduct our experiments. The results obtained are encouraging and help to understand the impact of private caches and cache-locking methods on reducing multi-core interference in safety-critical systems.
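The sketch below is a minimal greedy cache-content selection pass in the spirit of the selection algorithm mentioned above: given profiled per-line access counts, lock the most frequently accessed lines up to the lockable capacity. The data layout, capacity figure, and output format are illustrative, not the thesis' actual trace or configuration format.

```c
/* Minimal sketch of greedy cache-content selection for cache locking:
 * sort profiled lines by access count and lock the hottest ones up to the
 * available lockable capacity. Illustrative only. */
#include <stdio.h>
#include <stdlib.h>

struct line_stat { unsigned long addr; unsigned long accesses; };

static int by_accesses_desc(const void *a, const void *b)
{
    const struct line_stat *x = a, *y = b;
    return (y->accesses > x->accesses) - (y->accesses < x->accesses);
}

/* Select up to 'capacity' lines to lock, most frequently accessed first. */
static size_t select_locked_lines(struct line_stat *stats, size_t n,
                                  size_t capacity, unsigned long *out)
{
    qsort(stats, n, sizeof *stats, by_accesses_desc);
    size_t k = n < capacity ? n : capacity;
    for (size_t i = 0; i < k; ++i)
        out[i] = stats[i].addr;
    return k;
}

int main(void)
{
    /* Hypothetical profile: (cache line address, access count). */
    struct line_stat profile[] = {
        { 0x1000, 920 }, { 0x1040, 15 }, { 0x2080, 4301 }, { 0x30c0, 788 },
    };
    unsigned long locked[2];
    size_t n = select_locked_lines(profile, 4, 2, locked);
    for (size_t i = 0; i < n; ++i)
        printf("lock line 0x%lx\n", locked[i]);
    return 0;
}
```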