301 research outputs found

    Mapping parallelism to heterogeneous processors

    Get PDF
    Most embedded devices are based on heterogeneous Multiprocessor System on Chips (MPSoCs). These contain a variety of processors like CPUs, micro-controllers, DSPs, GPUs and specialised accelerators. The heterogeneity of these systems helps in achieving good performance and energy efficiency but makes programming inherently difficult. There is no single programming language or runtime to program such platforms. This thesis makes three contributions to these problems. First, it presents a framework that allows code in Single Program Multiple Data (SPMD) form to be mapped to a heterogeneous platform. The mapping space is explored, and it is shown that the best mapping depends on the metric used. Next, a compiler framework is presented which bridges the gap between the high -level programming model of OpenMP and the heterogeneous resources of MPSoCs. It takes OpenMP programs and generates code which runs on all processors. It delivers programming ease while exploiting heterogeneous resources. Finally, a compiler-based approach to runtime power management for heterogeneous cores is presented. Given an externally provided budget, the approach generates heterogeneous, partitioned code that attempts to give the best performance within that budget

    Low power architectures for streaming applications

    Get PDF

    Performance Estimation of Task Graphs Based on Path Profiling

    Get PDF
    Correctly estimating the speed-up of a parallel embedded application is crucial to efficiently compare different parallelization techniques, task graph transformations or mapping and scheduling solutions. Unfortunately, especially in case of control-dominated applications, task correlations may heavily affect the execution time of the solutions and usually this is not properly taken into account during performance analysis. We propose a methodology that combines a single profiling of the initial sequential specification with different decisions in terms of partitioning, mapping, and scheduling in order to better estimate the actual speed-up of these solutions. We validated our approach on a multi-processor simulation platform: experimental results show that our methodology, effectively identifying the correlations among tasks, significantly outperforms existing approaches for speed-up estimation. Indeed, we obtained an absolute error less than 5 % in average, even when compiling the code with different optimization levels

    Power, Performance, and Energy Management of Heterogeneous Architectures

    Get PDF
    abstract: Many core modern multiprocessor systems-on-chip offers tremendous power and performance optimization opportunities by tuning thousands of potential voltage, frequency and core configurations. Applications running on these architectures are becoming increasingly complex. As the basic building blocks, which make up the application, change during runtime, different configurations may become optimal with respect to power, performance or other metrics. Identifying the optimal configuration at runtime is a daunting task due to a large number of workloads and configurations. Therefore, there is a strong need to evaluate the metrics of interest as a function of the supported configurations. This thesis focuses on two different types of modern multiprocessor systems-on-chip (SoC): Mobile heterogeneous systems and tile based Intel Xeon Phi architecture. For mobile heterogeneous systems, this thesis presents a novel methodology that can accurately instrument different types of applications with specific performance monitoring calls. These calls provide a rich set of performance statistics at a basic block level while the application runs on the target platform. The target architecture used for this work (Odroid XU3) is capable of running at 4940 different frequency and core combinations. With the help of instrumented application vast amount of characterization data is collected that provides details about performance, power and CPU state at every instrumented basic block across 19 different types of applications. The vast amount of data collected has enabled two runtime schemes. The first work provides a methodology to find optimal configurations in heterogeneous architecture using classifiers and demonstrates an average increase of 93%, 81% and 6% in performance per watt compared to the interactive, ondemand and powersave governors, respectively. The second work using same data shows a novel imitation learning framework for dynamically controlling the type, number, and the frequencies of active cores to achieve an average of 109% PPW improvement compared to the default governors. This work also presents how to accurately profile tile based Intel Xeon Phi architecture while training different types of neural networks using open image dataset on deep learning framework. The data collected allows deep exploratory analysis. It also showcases how different hardware parameters affect performance of Xeon Phi.Dissertation/ThesisMasters Thesis Engineering 201

    Laitteistokiihdytetyn vuoronnuksen suorituskykyanalyysi

    Get PDF
    Performance analysis of heterogeneous MPSoCs (Multiprocessor System-on-Chip) is difficult. The non-determinism of parallel computation, communication delays and memory accesses force the system components into complex interaction. Hardware acceleration is used both to speed up the computations and the scheduling on MPSoCs. Finding an accompanying software structuring and efficient scheduling algorithms is not a straightforward task. In this thesis we investigate the use of simulation, measurement and modeling methods for analyzing the performance of heterogeneous MPSoCs. The viewpoint of this thesis is in simulation and modeling: How a high abstraction level simulation methodology can be used in modeling and analyzing of parallel systems based on MPSoCs. In particular we are interested in efficient use of hardware accelerated scheduling mechanisms and how they can be analyzed. Both parallel simulation and simulation of parallel systems contains many different methods, tools and approaches that attempt to balance between competing goals and cope with a specific subset of the problem space. Challenge is that in all approaches most of the simulation and modeling related problems remain and new challenges emerge. This thesis shows that the resource network methodology and dynamic scheduling models are a viable approach in modeling heterogeneous MPSoCs with accelerators. Concrete contributions are based on upgrading an existing simulation framework to support parallelism. Main contribution is on one hand that modeling concepts have been widened, and on the other hand that the supporting mechanisms have been implemented. The thesis work in progress was published in a peer reviewed international scientific workshop and the final results in a peer reviewed international scientific conference. The toolset has also been used in multiuniversity organized teaching and by the industry.Heterogeenisten moniydinjärjestelmien suorituskykyanalyysi on haasteellista. Laskennan epä-deterministisyys, kommunikaatioviiveet ja lukuisat muistioperaatiot saattavat järjestelmän komponentit monimutkaisiin vuorovaikutussuhteisiin. Laitteistokiihdytettyjä ajoitusmenetelmiä käytetään nopeuttamaan ajoituspäätöksiä. Sopivan ohjelmarakenteen ja tehokkaiden ajoitusalgoritmien löytäminen ei ole helppoa. Tässä työssä tutkitaan miten simulointi-, mittaus- ja mallinnusmenetelmiä voi käyttää laitteistokiihdytettyjen moniydinjärjestelmien suorituskykyanalyysiin. Työn näkökulma on simuloinnissa ja mallinnuksessa: Miten korkean abstraktiotason simulointimenetelmät soveltuvat moniydinjärjestelmiin pohjautuvien rinnakkaisten järjestelmien mallinnukseen ja suorituskykyanalyysiin. Erityisen kiinnostuksen kohteena on laitteistokiihdytteisten ajoitusmenetelmien tehokas käyttö sekä analysointi. Rinnakkaissimulointi pitää sisällään erilaisia menetelmiä, työkaluja ja lähestymistapoja jotka pyrkivät tasapainottelemaan ristiriitaisten tavoitteiden välillä. Haasteena on se, että kaikissa lähestymistavoissa simulaation ja mallinnuksen useimmat ongelmat säilyvät ja uusia ongelmia ilmaantuu. Työn tulokset viittaavat siihen että resurssiverkkopohjainen menetelmä dynaamisen ajoituksen kanssa on toimiva lähestymistapa rinnakkaisten järjestelmien suorituskykyanalyysiin. Työn konkreettiset tulokset pitävät sisällään olemassa olevan simulointiympäristön päivittämisen rinnakkaisuutta tukevaksi. Keskeinen tulos on toisaalta se että mallinnusmenetelmiä on laajennettu ja toisaalta se että näitä tukevat mekanismit on toteutettu. Keskeneräisen työn tulokset on julkaistu vertaisarvioidussa tieteellisessä seminaarissa ja valmiin työn tulokset vertaisarvioidussa tieteellisessä konferenssissa. Simulointiympäristöä on käytetty usean yliopiston järjestämässä yhteisopetuksessa sekä teollisuudessa

    Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

    Get PDF
    Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis. We focus on Kahn Process Networks and dataflow applications as abstractions, both for programming and for deriving an efficient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general. This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataflow applications. We starting with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking. We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a first-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataflow programming, we also focus on programmability. The focus of the thesis is in the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures

    A Model-based Design Framework for Application-specific Heterogeneous Systems

    Get PDF
    The increasing heterogeneity of computing systems enables higher performance and power efficiency. However, these improvements come at the cost of increasing the overall complexity of designing such systems. These complexities include constructing implementations for various types of processors, setting up and configuring communication protocols, and efficiently scheduling the computational work. The process for developing such systems is iterative and time consuming, with no well-defined performance goal. Current performance estimation approaches use source code implementations that require experienced developers and time to produce. We present a framework to aid in the design of heterogeneous systems and the performance tuning of applications. Our framework supports system construction: integrating custom hardware accelerators with existing cores into processors, integrating processors into cohesive systems, and mapping computations to processors to achieve overall application performance and efficient hardware usage. It also facilitates effective design space exploration using processor models (for both existing and future processors) that do not require source code implementations to estimate performance. We evaluate our framework using a variety of applications and implement them in systems ranging from low power embedded systems-on-chip (SoC) to high performance systems consisting of commercial-off-the-shelf (COTS) components. We show how the design process is improved, reducing the number of design iterations and unnecessary source code development ultimately leading to higher performing efficient systems

    MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip

    Get PDF
    Orientador: Rodolfo Jardim de AzevedoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Recentes metodologias e ferramentas de projetos de sistemas multiprocessados em chip (MPSoC) aumentam a produtividade por meio da utilização de plataformas baseadas em simuladores, antes de definir os últimos detalhes da arquitetura. No entanto, a simulação só é eficiente quando utiliza ferramentas de modelagem que suportem a descrição do comportamento do sistema em um elevado nível de abstração. A escassez de plataformas virtuais de MPSoCs que integrem hardware e software escaláveis nos motivou a desenvolver o MPSoCBench, que consiste de um conjunto escalável de MPSoCs incluindo quatro modelos de processadores (PowerPC, MIPS, SPARC e ARM), organizado em plataformas com 1, 2, 4, 8, 16, 32 e 64 núcleos, cross-compiladores, IPs, interconexões, 17 aplicações paralelas e estimativa de consumo de energia para os principais componentes (processadores, roteadores, memória principal e caches). Uma importante demanda em projetos MPSoC é atender às restrições de consumo de energia o mais cedo possível. Considerando que o desempenho do processador está diretamente relacionado ao consumo, há um crescente interesse em explorar o trade-off entre consumo de energia e desempenho, tendo em conta o domínio da aplicação alvo. Técnicas de escalabilidade dinâmica de freqüência e voltagem fundamentam-se em gerenciar o nível de tensão e frequência da CPU, permitindo que o sistema alcance apenas o desempenho suficiente para processar a carga de trabalho, reduzindo, consequentemente, o consumo de energia. Para explorar a eficiência energética e desempenho, foram adicionados recursos ao MPSoCBench, visando explorar escalabilidade dinâmica de voltaegem e frequência (DVFS) e foram validados três mecanismos com base na estimativa dinâmica de energia e taxa de uso de CPUAbstract: Recent design methodologies and tools aim at enhancing the design productivity by providing a software development platform before the definition of the final Multiprocessor System on Chip (MPSoC) architecture details. However, simulation can only be efficiently performed when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping integrating both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop the MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel version of software from well-known benchmarks, and power consumption estimation for main components (processors, routers, memory, and caches). An important demand in MPSoC designs is the addressing of energy consumption constraints as early as possible. Whereas processor performance comes with a high power cost, there is an increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU allowing it to reach just enough performance to process the system workload while meeting throughput constraints, and thereby, reducing the energy consumption. To explore this wide design space for energy efficiency and performance, both for hardware and software components, we provided MPSoCBench features to explore dynamic voltage and frequency scalability (DVFS) and evaluated three mechanisms based on energy estimation and CPU usage rateDoutoradoCiência da ComputaçãoDoutora em Ciência da Computaçã

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
    corecore