Search CORE

381 research outputs found

SARA: Self-Aware Resource Allocation for Heterogeneous MPSoCs

Author: Alavoine Olivier
Lin Bill
Song Yang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2018
Field of study

In modern heterogeneous MPSoCs, the management of shared memory resources is crucial in delivering end-to-end QoS. Previous frameworks have either focused on singular QoS targets or the allocation of partitionable resources among CPU applications at relatively slow timescales. However, heterogeneous MPSoCs typically require instant response from the memory system where most resources cannot be partitioned. Moreover, the health of different cores in a heterogeneous MPSoC is often measured by diverse performance objectives. In this work, we propose a Self-Aware Resource Allocation (SARA) framework for heterogeneous MPSoCs. Priority-based adaptation allows cores to use different target performance and self-monitor their own intrinsic health. In response, the system allocates non-partitionable resources based on priorities. The proposed framework meets a diverse range of QoS demands from heterogeneous cores.Comment: Accepted by the 55th annual Design Automation Conference 2018 (DAC'18

arXiv.org e-Print Archive

Crossref

Dynamic Power Management for Neuromorphic Many-Core Systems

Author: Cederstroem Love
Dixius Andreas
Ellguth Georg
Furber Steve
Garside Jim
Hartmann Stephan
Hoeppner Sebastian
Mayr Christian
Neumaerker Felix
Partzsch Johannes
Plana Luis
Schiefer Stefan
Scholze Stefan
Vogginger Bernhard
Yan Yexin
Publication venue
Publication date: 01/01/2019
Field of study

This work presents a dynamic power management architecture for neuromorphic many core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PE) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of the PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. By measurement of three neuromorphic benchmarks it is shown that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management model is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second generation SpiNNaker neuromorphic many core system

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Leveraging Hardware QoS to Control Contention in the Xilinx Zynq UltraScale+ MPSoC

Author: Abella Jaume
Cazorla Francisco J.
Mezzetti Enrico
Reina Juan M.
Serrano-Cases Alejandro
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)
Publication date: 01/01/2021
Field of study

The interference co-running tasks generate on each other’s timing behavior continues to be one of the main challenges to be addressed before Multi-Processor System-on-Chip (MPSoCs) are fully embraced in critical systems like those deployed in avionics and automotive domains. Modern MPSoCs like the Xilinx Zynq UltraScale+ incorporate hardware Quality of Service (QoS) mechanisms that can help controlling contention among tasks. Given the distributed nature of modern MPSoCs, the route a request follows from its source (usually a compute element like a CPU) to its target (usually a memory) crosses several QoS points, each one potentially implementing a different QoS mechanism. Mastering QoS mechanisms individually, as well as their combined operation, is pivotal to obtain the expected benefits from the QoS support. In this work, we perform, to our knowledge, the first qualitative and quantitative analysis of the distributed QoS mechanisms in the Xilinx UltraScale+ MPSoC. We empirically derive QoS information not covered by the technical documentation, and show limitations and benefits of the available QoS support. To that end, we use a case study building on neural network kernels commonly used in autonomous systems in different real-time domains.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GB; the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 878752 (MASTECS) and the European Research Council (ERC) grant agreement No. 772773 (SuPerCom).Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Dagstuhl Research Online Publication Server

IRON-MAN: An Approach To Perform Temporal Motionless Analysis of Video using CNN in MPSoC

Author: Dey Somdip
McDonald-Maier Klaus
Prasad Dilip Kumar
Singh Amit Kumar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

This paper proposes a novel human-inspired methodology called IRON-MAN ( Integrated RatiONal prediction and Motionless ANalysis ) for mobile multi-processor systems-on-chips (MPSoCs). The methodology integrates analysis of the previous image frames of the video to represent the analysis of the current frame in order to perform Temporal Motionless Analysis of the Video ( TMAV ). This is the first work on TMAV using Convolutional Neural Network (CNN) for scene prediction in MPSoCs. Experimental results show that our methodology outperforms state-of-the-art. We also introduce a metric named, Energy Consumption per Training Image ( ECTI ) to assess the suitability of using a CNN model in mobile MPSoCs with a focus on energy consumption and lifespan reliability of the device

University of Essex Research Repository

Crossref

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

York St John University Institutional Repository

Laitteistokiihdytetyn vuoronnuksen suorituskykyanalyysi

Author: Hanhirova Jussi
Publication venue
Publication date: 01/12/2014
Field of study

Performance analysis of heterogeneous MPSoCs (Multiprocessor System-on-Chip) is difficult. The non-determinism of parallel computation, communication delays and memory accesses force the system components into complex interaction. Hardware acceleration is used both to speed up the computations and the scheduling on MPSoCs. Finding an accompanying software structuring and efficient scheduling algorithms is not a straightforward task. In this thesis we investigate the use of simulation, measurement and modeling methods for analyzing the performance of heterogeneous MPSoCs. The viewpoint of this thesis is in simulation and modeling: How a high abstraction level simulation methodology can be used in modeling and analyzing of parallel systems based on MPSoCs. In particular we are interested in efficient use of hardware accelerated scheduling mechanisms and how they can be analyzed. Both parallel simulation and simulation of parallel systems contains many different methods, tools and approaches that attempt to balance between competing goals and cope with a specific subset of the problem space. Challenge is that in all approaches most of the simulation and modeling related problems remain and new challenges emerge. This thesis shows that the resource network methodology and dynamic scheduling models are a viable approach in modeling heterogeneous MPSoCs with accelerators. Concrete contributions are based on upgrading an existing simulation framework to support parallelism. Main contribution is on one hand that modeling concepts have been widened, and on the other hand that the supporting mechanisms have been implemented. The thesis work in progress was published in a peer reviewed international scientific workshop and the final results in a peer reviewed international scientific conference. The toolset has also been used in multiuniversity organized teaching and by the industry.Heterogeenisten moniydinjärjestelmien suorituskykyanalyysi on haasteellista. Laskennan epä-deterministisyys, kommunikaatioviiveet ja lukuisat muistioperaatiot saattavat järjestelmän komponentit monimutkaisiin vuorovaikutussuhteisiin. Laitteistokiihdytettyjä ajoitusmenetelmiä käytetään nopeuttamaan ajoituspäätöksiä. Sopivan ohjelmarakenteen ja tehokkaiden ajoitusalgoritmien löytäminen ei ole helppoa. Tässä työssä tutkitaan miten simulointi-, mittaus- ja mallinnusmenetelmiä voi käyttää laitteistokiihdytettyjen moniydinjärjestelmien suorituskykyanalyysiin. Työn näkökulma on simuloinnissa ja mallinnuksessa: Miten korkean abstraktiotason simulointimenetelmät soveltuvat moniydinjärjestelmiin pohjautuvien rinnakkaisten järjestelmien mallinnukseen ja suorituskykyanalyysiin. Erityisen kiinnostuksen kohteena on laitteistokiihdytteisten ajoitusmenetelmien tehokas käyttö sekä analysointi. Rinnakkaissimulointi pitää sisällään erilaisia menetelmiä, työkaluja ja lähestymistapoja jotka pyrkivät tasapainottelemaan ristiriitaisten tavoitteiden välillä. Haasteena on se, että kaikissa lähestymistavoissa simulaation ja mallinnuksen useimmat ongelmat säilyvät ja uusia ongelmia ilmaantuu. Työn tulokset viittaavat siihen että resurssiverkkopohjainen menetelmä dynaamisen ajoituksen kanssa on toimiva lähestymistapa rinnakkaisten järjestelmien suorituskykyanalyysiin. Työn konkreettiset tulokset pitävät sisällään olemassa olevan simulointiympäristön päivittämisen rinnakkaisuutta tukevaksi. Keskeinen tulos on toisaalta se että mallinnusmenetelmiä on laajennettu ja toisaalta se että näitä tukevat mekanismit on toteutettu. Keskeneräisen työn tulokset on julkaistu vertaisarvioidussa tieteellisessä seminaarissa ja valmiin työn tulokset vertaisarvioidussa tieteellisessä konferenssissa. Simulointiympäristöä on käytetty usean yliopiston järjestämässä yhteisopetuksessa sekä teollisuudessa

Aaltodoc Publication Archive

MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip

Author: Garanhani Liana Dessandre Duenha, 1977-
Publication venue: [s.n.]
Publication date: 30/08/2018
Field of study

Orientador: Rodolfo Jardim de AzevedoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Recentes metodologias e ferramentas de projetos de sistemas multiprocessados em chip (MPSoC) aumentam a produtividade por meio da utilização de plataformas baseadas em simuladores, antes de definir os últimos detalhes da arquitetura. No entanto, a simulação só é eficiente quando utiliza ferramentas de modelagem que suportem a descrição do comportamento do sistema em um elevado nível de abstração. A escassez de plataformas virtuais de MPSoCs que integrem hardware e software escaláveis nos motivou a desenvolver o MPSoCBench, que consiste de um conjunto escalável de MPSoCs incluindo quatro modelos de processadores (PowerPC, MIPS, SPARC e ARM), organizado em plataformas com 1, 2, 4, 8, 16, 32 e 64 núcleos, cross-compiladores, IPs, interconexões, 17 aplicações paralelas e estimativa de consumo de energia para os principais componentes (processadores, roteadores, memória principal e caches). Uma importante demanda em projetos MPSoC é atender às restrições de consumo de energia o mais cedo possível. Considerando que o desempenho do processador está diretamente relacionado ao consumo, há um crescente interesse em explorar o trade-off entre consumo de energia e desempenho, tendo em conta o domínio da aplicação alvo. Técnicas de escalabilidade dinâmica de freqüência e voltagem fundamentam-se em gerenciar o nível de tensão e frequência da CPU, permitindo que o sistema alcance apenas o desempenho suficiente para processar a carga de trabalho, reduzindo, consequentemente, o consumo de energia. Para explorar a eficiência energética e desempenho, foram adicionados recursos ao MPSoCBench, visando explorar escalabilidade dinâmica de voltaegem e frequência (DVFS) e foram validados três mecanismos com base na estimativa dinâmica de energia e taxa de uso de CPUAbstract: Recent design methodologies and tools aim at enhancing the design productivity by providing a software development platform before the definition of the final Multiprocessor System on Chip (MPSoC) architecture details. However, simulation can only be efficiently performed when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping integrating both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop the MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel version of software from well-known benchmarks, and power consumption estimation for main components (processors, routers, memory, and caches). An important demand in MPSoC designs is the addressing of energy consumption constraints as early as possible. Whereas processor performance comes with a high power cost, there is an increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU allowing it to reach just enough performance to process the system workload while meeting throughput constraints, and thereby, reducing the energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we provided MPSoCBench features to explore dynamic voltage and frequency scalability (DVFS) and evaluated three mechanisms based on energy estimation and CPU usage rateDoutoradoCiência da ComputaçãoDoutora em Ciência da Computaçã

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

The Road towards Predictable Automotive High-Performance Platforms

Author: Arne Hamann
Dirk Ziegenbein
Falk Rehm
Giovanni Stea
Jan-Peter Larsson
Jorg Seitter
Matteo Andreozzi
Raffaele Zippo
Selma Saidi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Due to the trends of centralizing the E/E architecture and new computing-intensive applications, high-performance hardware platforms are currently finding their way into automotive systems. However, the SoCs currently available on the market have significant weaknesses when it comes to providing predictable performance for time-critical applications. The main reason for this is that these platforms are optimized for averagecase performance. This shortcoming represents one major risk in the development of current and future automotive systems. In this paper we describe how high-performance and predictability could (and should) be reconciled in future HW/SW platforms. We believe that this goal can only be reached in a close collaboration between system suppliers, IP providers, semiconductor companies, and OS/hypervisor vendors. Furthermore, academic input will be needed to solve remaining challenges and to further improve initial solutions

Archivio della Ricerca - Università di Pisa

Energy and reliability challenges in next generation devices: integrated software solutions

Author
Publication venue: Università degli Studi di Cagliari
Publication date: 24/02/2011
Field of study

Archivio istituzionale della ricerca - Università di Cagliari

Multiprocessor System-on-Chips based Wireless Sensor Network Energy Optimization

Author: Ali Haider
Publication venue: Department of Electronics, Computing and Mathematics
Publication date: 01/01/2020
Field of study

Wireless Sensor Network (WSN) is an integrated part of the Internet-of-Things (IoT) used to monitor the physical or environmental conditions without human intervention. In WSN one of the major challenges is energy consumption reduction both at the sensor nodes and network levels. High energy consumption not only causes an increased carbon footprint but also limits the lifetime (LT) of the network. Network-on-Chip (NoC) based Multiprocessor System-on-Chips (MPSoCs) are becoming the de-facto computing platform for computationally extensive real-time applications in IoT due to their high performance and exceptional quality-of-service. In this thesis a task scheduling problem is investigated using MPSoCs architecture for tasks with precedence and deadline constraints in order to minimize the processing energy consumption while guaranteeing the timing constraints. Moreover, energy-aware nodes clustering is also performed to reduce the transmission energy consumption of the sensor nodes. Three distinct problems for energy optimization are investigated given as follows: First, a contention-aware energy-efficient static scheduling using NoC based heterogeneous MPSoC is performed for real-time tasks with an individual deadline and precedence constraints. An offline meta-heuristic based contention-aware energy-efficient task scheduling is developed that performs task ordering, mapping, and voltage assignment in an integrated manner. Compared to state-of-the-art scheduling our proposed algorithm significantly improves the energy-efficiency. Second, an energy-aware scheduling is investigated for a set of tasks with precedence constraints deploying Voltage Frequency Island (VFI) based heterogeneous NoC-MPSoCs. A novel population based algorithm called ARSH-FATI is developed that can dynamically switch between explorative and exploitative search modes at run-time. ARSH-FATI performance is superior to the existing task schedulers developed for homogeneous VFI-NoC-MPSoCs. Third, the transmission energy consumption of the sensor nodes in WSN is reduced by developing ARSH-FATI based Cluster Head Selection (ARSH-FATI-CHS) algorithm integrated with a heuristic called Novel Ranked Based Clustering (NRC). In cluster formation parameters such as residual energy, distance parameters, and workload on CHs are considered to improve LT of the network. The results prove that ARSH-FATI-CHS outperforms other state-of-the-art clustering algorithms in terms of LT.University of Derby, Derby, U

UDORA - University of Derby Online Research Archive

AdaMD: Adaptive Mapping and DVFS for Energy-efficient Heterogeneous Multi-cores

Author: Al-Hashimi Bashir M
Basireddy Karunakar R
Merrett Geoff V
Singh Amit Kumar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2020
Field of study

Modern heterogeneous multi-core systems, containing various types of cores, are increasingly dealing with concurrent execution of dynamic application workloads. Moreover, the performance constraints of each application vary, and applications enter/exit the system at any time. Existing approaches are not efficient in such dynamic scenarios, especially if applications are unknown, as they require extensive offline application analysis and do not consider the runtime execution scenarios (application arrival/completion, and workload and performance variations) for runtime management. To address this, we present AdaMD, an adaptive mapping and dynamic voltage and frequency scaling (DVFS) approach for improving energy consumption and performance. The key feature of the proposed approach is the elimination of dependency on offline profiled results while making runtime decisions. This is achieved through a performance prediction model having a maximum error of 7.9% lower than the previously reported model and a mapping approach that allocates processing cores to applications while respecting performance constraints. Furthermore, AdaMD adapts to runtime execution scenarios efficiently by monitoring the application status, and performance/workload variations to adjust the previous DVFS settings and thread-to-core mappings. The proposed approach is experimentally validated on the Odroid-XU3, with various combinations of diverse multi-threaded applications from PARSEC and SPLASH benchmarks. Results show energy savings of up to 28% compared to the recently proposed approach while meeting performance constraints

University of Essex Research Repository

Southampton (e-Prints Soton)