238 research outputs found
Aging-Aware Request Scheduling for Non-Volatile Main Memory
Modern computing systems are embracing non-volatile memory (NVM) to implement
high-capacity and low-cost main memory. Elevated operating voltages of NVM
accelerate the aging of CMOS transistors in the peripheral circuitry of each
memory bank. Aggressive device scaling increases power density and temperature,
which further accelerates aging, challenging the reliable operation of
NVM-based main memory. We propose HEBE, an architectural technique to mitigate
the circuit aging-related problems of NVM-based main memory. HEBE is built on
three contributions. First, we propose a new analytical model that can
dynamically track the aging in the peripheral circuitry of each memory bank
based on the bank's utilization. Second, we develop an intelligent memory
request scheduler that exploits this aging model at run time to de-stress the
peripheral circuitry of a memory bank only when its aging exceeds a critical
threshold. Third, we introduce an isolation transistor to decouple parts of a
peripheral circuit operating at different voltages, allowing the decoupled
logic blocks to undergo long-latency de-stress operations independently and off
the critical path of memory read and write accesses, improving performance. We
evaluate HEBE with workloads from the SPEC CPU2017 Benchmark suite. Our results
show that HEBE significantly improves both performance and lifetime of
NVM-based main memory.Comment: To appear in ASP-DAC 202
Piattaforme multicore e integrazione tri-dimensionale: analisi architetturale e ottimizzazione
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.I sistemi integrati moderni sono architetture many-core, in cui spesso lo spazio di memoria è condiviso fra i processori. Per ridurre i consumi, molte di queste architetture sostituiscono le cache dati con memorie scratchpad gestite in software, per massimizzarne la località alle CPU e aumentare le performance. Questo significa che i dati devono essere spostati manualmente da parte del programmatore. Inoltre, tradurre in perfomance l’enorme parallelismo potenziale delle piattaforme many-core non è semplice. Per supportare la programmazione, diversi programming model sono stati proposti, e siccome lavorano ad un alto livello di astrazione, sfruttano delle librerie di runtime che forniscono servizi di base quali sincronizzazione, allocazione della memoria, threading. Queste librerie hanno un costo, che nei sistemi integrati è troppo elevato e ostacola il raggiungimento delle piene performance. Questa tesi analizza come un programming model ad alto livello di astrazione – OpenMP – possa essere efficientemente supportato, se il suo stack software viene adattato per sfruttare al meglio la piattaforma sottostante. In una prima parte, studio diversi meccanismi di sincronizzazione e comunicazione fra thread paralleli, portati sulle piattaforme many-core. In seguito, li utilizzo per scrivere un runtime di supporto a OpenMP che sia il più possibile efficente e “leggero” e che supporti paradigmi di parallelismo multi-livello e irregolare, spesso presenti nelle applicazioni moderne. Una seconda parte della tesi esplora le architetture eterogenee, ossia con acceleratori hardware. Queste architetture soffrono di problematiche sia i) per il processo di design della piattaforma, che ii) di scalabilità della piattaforma stessa (aumento del numero degli acceleratori e dei processori), che iii) di programmabilità . La tesi propone delle soluzioni a tutti e tre i problemi. Il linguaggio di programmazione usato è OpenMP, sia per la sua grande espressività a livello semantico, sia perché è lo standard de-facto per programmare sistemi a memoria condivisa
Self-Test Mechanisms for Automotive Multi-Processor System-on-Chips
L'abstract è presente nell'allegato / the abstract is in the attachmen
Optimisation des mémoires dans le flot de conception des systèmes multiprocesseurs sur puces pour des applications de type multimédia
RÉSUMÉ
Les systèmes multiprocesseurs sur puce (MPSoC) constituent l'un des principaux moteurs de
la révolution industrielle des semi-conducteurs. Les MPSoCs jouissent d’une popularité
grandissante dans le domaine des systèmes embarquĂ©s. Leur grande capacitĂ© de parallĂ©lisation Ă
un très haut niveau d'intégration, en font de bons candidats pour les systèmes et les applications
telles que les applications multimédia. La consommation d’énergie, la capacité de calcul et
l’espace de conception sont les éléments dont dépendent les performances de ce type
d’applications. La mémoire est le facteur clé permettant d’améliorer de façon substantielle leurs
performances. Avec l’arrivée des applications multimédias embarquées dans l’industrie, le
problème des gains de performances est vital. La masse de données traitées par ces applications
requiert une grande capacité de calcul et de mémoire. Dernièrement, de nouveaux modèles de
programmation ont fait leur apparition. Ces modèles offrent une programmation de plus haut
niveau pour répondre aux besoins croissants des MPSoCs, d’où la nécessité de nouvelles
approches d'optimisation et de placement pour les systèmes embarqués et leurs modèles de
programmation.
La conception niveau système des architectures MPSoCs pour les applications de type
multimédia constitue un véritable défi technique. L’objectif général de cette thèse est de relever
ce défi en trouvant des solutions. Plus spécifiquement, cette thèse se propose d’introduire le
concept d’optimisation mémoire dans le flot de conception niveau système et d’observer leur
impact sur différents modèles de programmation utilisés lors de la conception de MPSoCs. Il
s’agit, autrement dit, de réaliser l’unification du domaine de la compilation avec celui de la
conception niveau système pour une meilleure conception globale.
La contribution de cette thèse est de proposer de nouvelles approches pour les techniques
d'optimisation mémoire pour la conception MPSoCs avec différents modèles de programmation.
Nos travaux de recherche concernent l'intégration des techniques d’optimisation mémoire dans le
flot de conception de MPSoCs pour différents types de modèle de programmation. Ces travaux
ont été exécutés en collaboration avec STMicroelectronics.----------ABSTRACT
Multiprocessor systems-on-chip (MPSoC) are defined as one of the main drivers of the
industrial semiconductors revolution. MPSoCs are gaining popularity in the field of embedded
systems. Pursuant to their great ability to parallelize at a very high integration level, they are
good candidates for systems and applications such as multimedia. Memory is becoming a key
player for significant improvements in these applications (i.e. power, performance and area).
With the emergence of more embedded multimedia applications in the industry, this issue
becomes increasingly vital. The large amount of data manipulated by these applications requires
high-capacity calculation and memory. Lately, new programming models have been introduced.
These programming models offer a higher programming level to answer the increasing needs of
MPSoCs. This leads to the need of new optimization and mapping approaches suitable for
embedded systems and their programming models.
The overall objective of this research is to find solutions to the challenges of system level
design of applications such as multimedia. This entails the development of new approaches and
new optimization techniques. The specific objective of this research is to introduce the concept
of memory optimization in the system level conception flow and study its impact on different
programming models used for MPSoCs’ design. In other words, it is the unification of the
compilation and system level design domains.
The contribution of this research is to propose new approaches for memory optimization
techniques for MPSoCs’ design in different programming models. This thesis relates to the
integration of memory optimization to varying programming model types in the MPSoCs
conception flow. Our research was done in collaboration with STMicroelectronics
Building a RTOS for MPSoC Dataflow Programming
International audienceMultiprocessor Systems-on-Chip (MPSoC) are becoming the standard high performance Digital Signal Processing (DSP) systems. Hardware complexity abstraction is needed to enable efficient MPSoC programming. A major challenge of MPSoC programming is efficiently handling the combination of new features necessary in a MPSoC operating system: load balancing and efficient use of the parallel resources, with the more traditional features of Real-Time Operating Systems (RTOS): resource sharing between applications, task priorities and reactivity to events. This paper presents a method to combine dataflow methods and RTOS features. The resulting system prototypes an RTOS for symmetric multiprocessing MPSoCs whose inputs are dataflow graphs of applications. The prototype is built on the uC/OS-II RTOS. Experimental results are given on a 3GPP Long Term Evolution algorithm executed on a 4-core MPSoC
State of the art baseband DSP platforms for Software Defined Radio: A survey
Software Defined Radio (SDR) is an innovative approach which is becoming a more and more promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities and industries to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, as well as potential future trends.Peer reviewe
- …