11 research outputs found

    High-Performance Simultaneous Multiprocessing for Heterogeneous System-on-Chip

    Get PDF
    This paper presents a methodology for simultaneous heterogeneous computing, named ENEAC, where a quad core ARM Cortex-A53 CPU works in tandem with a preprogrammed on-board FPGA accelerator. A heterogeneous scheduler distributes the tasks optimally among all the resources and all compute units run asynchronously, which allows for improved performance for irregular workloads. ENEAC achieves up to 17\% performance improvement \ignore{and 14\% energy usage reduction,} when using all platform resources compared to just using the FPGA accelerators and up to 865\% performance increase \ignore{and up to 89\% energy usage decrease} when using just the CPU. The workflow uses existing commercial tools and C/C++ as a single programming language for both accelerator design and CPU programming for improved productivity and ease of verification.Comment: 7 pages, 5 figures, 1 table Presented at the 13th International Workshop on Programmability and Architectures for Heterogeneous Multicores, 2020 (arXiv:2005.07619

    Thermal and Performance Efficient On-Chip Surface-Wave Communication for Many-Core Systems in Dark Silicon Era

    Get PDF
    Due to the exceedingly high integration density of VLSI circuits and the resulting high power density, thermal integrity became a major challenge. One way to tackle this problem is Dark silicon. Dark silicon is the amount of circuitry in a chip that is forced to switch off to insure thermal integrity of the system and prevent permanent thermal-related faults. In many-core systems, the presence of Dark Silicon adds new design constraints, in general, and on the communication fabric of such systems, in particular. This is due to the fact that system-level thermal-management systems tend to increase the distance between high activity cores to insure better thermal balancing and integrity. Consequently, a designing dilemma is created where a compromise has to be made between interconnect performance and power consumption. This study proposes a hybrid wire and surface-wave interconnect (SWI) based Network-on-Chip (NoC) to address the dark silicon challenge. Through efficient utilization of one-hop cross the chip communication SWI links, the proposed architecture is able to offer an efficient and scalable communication platform in terms of performance, power, and thermal impact. As a result, evaluations of the proposed architecture compared to baseline architecture under dark silicon scenarios show reduction in maximum temperature by 15°C, average delay up to 73.1%, and energy-saving up to ~3X. This study explores the promising potential of the proposed architecture in extending the utilization wall for current and future many-core systems in dark silicon era

    Energy Proportionality in Near-Threshold Computing Servers and Cloud Data Centers: Consolidating or Not?

    Get PDF
    Cloud Computing aims to efficiently tackle the increasing demand of computing resources, and its popularity has led to a dramatic increase in the number of computing servers and data centers worldwide. However, as effect of post-Dennard scaling, computing servers have become power-limited, and new system-level approaches must be used to improve their energy efficiency. This paper first presents an accurate power modelling characterization for a new server architecture based on the FD-SOI process technology for near-threshold computing (NTC). Then, we explore the existing energy vs. performance trade-offs when virtualized applications with different CPU utilization and memory footprint characteristics are executed. Finally, based on this analysis, we propose a novel dynamic virtual machine (VM) allocation method that exploits the knowledge of VMs characteristics together with our accurate server power model for next-generation NTC-based data centers, while guaranteeing quality of service (QoS) requirements. Our results demonstrate the inefficiency of current workload consolidation techniques for new NTC-based data center designs, and how our proposed method provides up to 45% energy savings when compared to state-of-the-art consolidation-based approaches

    Adaptive Knobs for Resource Efficient Computing

    Get PDF
    Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

    Uma ferramenta para modelagem e simulação de computação aproximada em hardware

    Get PDF
    Orientador: Lucas Francisco WannerDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Pesquisas recentes têm introduzido unidades de hardware que produzem resultados incorretos de maneira determinística ou probabilística para um pequeno conjunto de entradas. Por outro lado, permitem um maior desempenho ou um consumo de energia significativamente menor em comparação com versões precisas das mesmas unidades. Como integrar, validar e avaliar essas alternativas em uma arquitetura ou processador, porém, permanece um desafio. A falta de ferramentas para representar e avaliar hardware aproximado leva desenvolvedores a verificar suas soluções de maneira independente, sem considerar interações com outros componentes, exigindo um grande esforço em modelagem e simulação. Neste trabalho, introduzimos ADeLe, uma linguagem de alto nível para descrever, configurar e integrar unidades de hardware aproximado em um processador. ADeLe reduz o esforço de desenvolvimento de hardware aproximado por modelar aproximações em um alto nível de abstração e injetá-las automaticamente em um modelo de processador para simulação arquitetural. Na ferramenta relacionada a ADeLe, aproximações podem modificar ou substituir completamente o comportamento de instruções de hardware através de políticas definidas pelo usuário. As instruções podem ser modificadas deterministicamente ou probabilisticamente (por exemplo, baseado em tensão de operação e frequência). Para proporcionar um ambiente de teste controlado, as aproximações podem ser ligadas e desligadas a partir do software em execução. O consumo de energia é automaticamente computado com base em modelos customizáveis no sistema. Assim, a ferramenta proporciona um método de verificação genérico e flexível, permitindo uma fácil avaliação da troca entre energia e qualidade de aplicações sujeitadas a hardware aproximado. Demonstramos a ferramenta pela introdução de variadas técnicas de aproximação em um modelo de processador, com o qual aplicações selecionadas foram executadas. Ao modelar módulos de hardware aproximado dedicados, mostramos como ADeLe representa unidades aritméticas aproximadas e unidades funcionais de precisão reduzida executando 4 aplicações de processamento de imagens e 2 de computação de ponto flutuante. Com outro método de aproximação, também mostramos como a ferramenta é utilizada para estudar o impacto de memórias alimentadas por tensão ajustável sobre 9 aplicações. Nossos experimentos demonstram as capacidades da ferramenta e como ela pode ser utilizada para gerar processadores virtuais aproximados e avaliar o equilíbrio entre energia e qualidade para diferentes aplicações com esforço reduzidoAbstract: Recent research has introduced approximate hardware units that produce incorrect outputs deterministically or probabilistically for some small subset of inputs. On the other hand, they allow significantly higher throughput or lower power than their error-free counterparts. The integration, validation, and evaluation of these approximate units in architectures and processors, however, remains challenging. The lack of tools to represent and evaluate approximate hardware leads designers to verify their solutions independently, not considering interactions with other components, demanding high-effort modeling and simulation. In this work, we introduce ADeLe, a high-level language for the description, configuration, and integration of approximate hardware units into processors. ADeLe reduces the design effort for approximate hardware by modeling approximations at a high level of abstraction and automatically injecting them into a processor model for architectural simulation. In the ADeLe framework, approximations may modify or completely replace the functional behavior of instructions according to user-defined policies. Instructions may be approximated deterministically or probabilistically (e.g., based on operating voltage and frequency). To allow for controlled testing, approximations may be enabled and disabled from software. Energy is automatically accounted for based on customizable models that consider the potential power savings of the approximations that are enabled in the system. Thus, the framework provides a generic and flexible verification method, allowing for easy evaluation of the energy-quality trade-off of applications subjected to approximate hardware. We demonstrate the framework by introducing different approximation techniques into a processor model, on top of which we run selected applications. Modeling dedicated hardware modules, we show how ADeLe can represent approximate arithmetic and reduced precision computation units executing 4 image processing and 2 floating point applications. Using a different method of approximation, we also show how the framework is used to study the impact of voltage-overscaled memories over 9 applications. Our experiments show the framework capabilities and how it may be used to generate approximate virtual CPUs and to evaluate energy-quality trade-offs for different applications with reduced effortMestradoCiência da ComputaçãoMestre em Ciência da Computação2017/08015-8  FAPES

    Mapping parallelism to heterogeneous processors

    Get PDF
    Most embedded devices are based on heterogeneous Multiprocessor System on Chips (MPSoCs). These contain a variety of processors like CPUs, micro-controllers, DSPs, GPUs and specialised accelerators. The heterogeneity of these systems helps in achieving good performance and energy efficiency but makes programming inherently difficult. There is no single programming language or runtime to program such platforms. This thesis makes three contributions to these problems. First, it presents a framework that allows code in Single Program Multiple Data (SPMD) form to be mapped to a heterogeneous platform. The mapping space is explored, and it is shown that the best mapping depends on the metric used. Next, a compiler framework is presented which bridges the gap between the high -level programming model of OpenMP and the heterogeneous resources of MPSoCs. It takes OpenMP programs and generates code which runs on all processors. It delivers programming ease while exploiting heterogeneous resources. Finally, a compiler-based approach to runtime power management for heterogeneous cores is presented. Given an externally provided budget, the approach generates heterogeneous, partitioned code that attempts to give the best performance within that budget

    Dark silicon as a challenge for hardware/software co-design

    No full text

    Scalable Task Schedulers for Many-Core Architectures

    Get PDF
    This thesis develops schedulers for many-cores with different optimization objectives. The proposed schedulers are designed to be scale up as the number of cores in many-cores increase while continuing to provide guarantees on the quality of the schedule

    Methoden zur applikationsspezifischen Effizienzsteigerung adaptiver Prozessorplattformen

    Get PDF
    General-Purpose Prozessoren sind für den durchschnittlichen Anwendungsfall optimiert, wodurch vorhandene Ressourcen nicht effizient genutzt werden. In der vorliegenden Arbeit wird untersucht, in wie weit es möglich ist, einen General-Purpose Prozessor an einzelne Anwendungen anzupassen und so die Effizienz zu steigern. Die Adaption kann zur Laufzeit durch das Prozessor- oder Laufzeitsystem anhand der jeweiligen Systemparameter erfolgen, um eine Effizienzsteigerung zu erzielen
    corecore