81 research outputs found

    Enabling the use of embedded and mobile technologies for high-performance computing

    Get PDF
    In the late 1990s, powerful economic forces led to the adoption of commodity desktop processors in High-Performance Computing(HPC). This transformation has been so effective that the November 2016 TOP500 list is still dominated by x86 architecture. In 2016, the largest commodity market in computing is not PCs or servers, but mobile computing, comprising smartphones andtablets, most of which are built with ARM-based Systems on Chips (SoC). This suggests that once mobile SoCs deliver sufficient performance, mobile SoCs can help reduce the cost of HPC. This thesis addresses this question in detail.We analyze the trend in mobile SoC performance, comparing it with the similar trend in the 1990s. Through development of real system prototypes and their performance analysis we assess the feasibility of building an HPCsystem based on mobile SoCs. Through simulation of the future mobile SoC, we identify the missing features and suggest improvements that would enable theuse of future mobile SoCs in HPC environment. Thus, we present design guidelines for future generations mobile SoCs, and HPC systems built around them, enabling the newclass of cheap supercomputers.A finales de la década de los 90, razones económicas llevaron a la adopción de procesadores de uso general en sistemas de Computación de Altas Prestaciones (HPC). Esta transformación ha sido tan efectiva que la lista TOP500 de noviembre de 2016 sigue aun dominada por la arquitectura x86. En 2016, el mayor mercado de productos básicos en computación no son los ordenadores de sobremesa o los servidores, sino la computación móvil, que incluye teléfonos inteligentes y tabletas, la mayoría de los cuales están construidos con sistemas en chip(SoC) de arquitectura ARM. Esto sugiere que una vez que los SoC móviles ofrezcan un rendimiento suficiente, podrán utilizarse para reducir el costo desistemas HPC. Esta tesis aborda esta cuestión en detalle. Analizamos la tendencia del rendimiento de los SoC para móvil, comparándola con la tendencia similar ocurrida en los añosnoventa. A través del desarrollo de prototipos de sistemas reales y su análisis de rendimiento, evaluamos la factibilidad de construir unsistema HPC basado en SoCs móviles. A través de la simulación de SoCs móviles futuros, identificamos las características que faltan y sugerimos mejoras quepermitirían su uso en entornos HPC. Por lo tanto, presentamos directrices de diseño para futuras generaciones de SoCs móviles y sistemas HPC construidos a sualrededor, para permitir la construcción de una nueva clase de supercomputadores de coste reducido

    TEEM: Online Thermal- and Energy-Efficiency Management on CPU-GPU MPSoCs

    Get PDF
    Heterogeneous Multiprocessor System-on-Chip (MPSoC) are progressively becoming predominant in most modern mobile devices. These devices are required to perform processing of applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy-efficiency and temperature reduction on the device. Although this inefficient technique can reduce temporal thermal gradient, but at the same time hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which achieves reduction in thermal gradient as well as energy-efficiency through resource mapping and thread-partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from Polybench benchmark suite on Odroid-XU4 developmental platform. Results show 28% performance improvement, 28.32% energy saving and reduced thermal variance of over 76% when compared to the existing approaches. Additionally, the method is able to free more than 90% in memory storage on the MPSoC, which would have been previously utilized to store several task-to-thread mapping configurations

    Exploiting memory allocations in clusterized many-core architectures

    Get PDF
    Power-efficient architectures have become the most important feature required for future embedded systems. Modern designs, like those released on mobile devices, reveal that clusterization is the way to improve energy efficiency. However, such architectures are still limited by the memory subsystem (i.e., memory latency problems). This work investigates an alternative approach that exploits on-chip data locality to a large extent, through distributed shared memory systems that permit efficient reuse of on-chip mapped data in clusterized many-core architectures. First, this work reviews the current literature on memory allocations and explore the limitations of cluster-based many-core architectures. Then, several memory allocations are introduced and benchmarked scalability, performance and energy-wise, compared to the conventional centralized shared memory solution to reveal which memory allocation is the most appropriate for future mobile architectures. Our results show that distributed shared memory allocations bring performance gains and opportunities to reduce energy consumption

    Intelligent systems for efficiency and security

    Get PDF
    As computing becomes ubiquitous and personalized, resources like energy, storage and time are becoming increasingly scarce and, at the same time, computing systems must deliver in multiple dimensions, such as high performance, quality of service, reliability, security and low power. Building such computers is hard, particularly when the operating environment is becoming more dynamic, and systems are becoming heterogeneous and distributed. Unfortunately, computers today manage resources with many ad hoc heuristics that are suboptimal, unsafe, and cannot be composed across the computer’s subsystems. Continuing this approach has severe consequences: underperforming systems, resource waste, information loss, and even life endangerment. This dissertation research develops computing systems which, through intelligent adaptation, deliver efficiency along multiple dimensions. The key idea is to manage computers with principled methods from formal control. It is with these methods that the multiple subsystems of a computer sense their environment and configure themselves to meet system-wide goals. To achieve the goal of intelligent systems, this dissertation makes a series of contributions, each building on the previous. First, it introduces the use of formal MIMO (Multiple Input Multiple Output) control for processors, to simultaneously optimize many goals like performance, power, and temperature. Second, it develops the Yukta control system, which uses coordinated formal controllers in different layers of the stack (hardware and operating system). Third, it uses robust control to develop a fast, globally coordinated and decentralized control framework called Tangram, for heterogeneous computers. Finally, it presents Maya, a defense against power side-channel attacks that uses formal control to reshape the power dissipated by a computer, confusing the attacker. The ideas in the dissertation have been demonstrated successfully with several prototypes, including one built along with AMD (Advanced Micro Devices, Inc.) engineers. These designs significantly outperformed the state of the art. The research in this dissertation brought formal control closer to computer architecture and has been well-received in both domains. It has the first application of full-fledged MIMO control for processors, the first use of robust control in computer systems, and the first application of formal control for side-channel defense. It makes a significant stride towards intelligent systems that are efficient, secure and reliable

    It is too hot in here! A performance, energy and heat aware scheduler for Asymmetric multiprocessing processors in embedded systems.

    Get PDF
    Modern architecture present in self-power devices such as mobiles or tablet computers proposes the use of asymmetric processors that allow either energy-efficient or performant computation on the same SoC. For energy efficiency and performance consideration, the asymmetry resides in differences in CPU micro-architecture design and results in diverging raw computing capability. Other components such as the processor memory subsystem also show differences resulting in different memory transaction timing. Moreover, based on a bus-snoop protocol, cache coherency between processors comes with a peculiarity in memory latency depending on the processors operating frequencies. All these differences come with challenging decisions on both application schedulability and processor operating frequencies. In addition, because of the small form factor of such embedded systems, these devices generally cannot afford active cooling systems. Therefore thermal mitigation relies on dynamic software solutions. Current operating systems for embedded systems such as Linux or Android do not consider all these particularities. As such, they often fail to satisfy user expectations of a powerful device with long battery life. To remedy this situation, this thesis proposes a unified approach to deliver high-performance and energy-efficiency computation in each of its flavours, considering the memory subsystem and all computation units available in the system. Performance is maximized even when the device is under heavy thermal constraints. The proposed unified solution is based on accurate models targeting both performance and thermal behaviour and resides at the operating systems kernel level to manage all running applications in a global manner. Particularly, the performance model considers both the computation part and also the memory subsystem of symmetric or asymmetric processors present in embedded devices. The thermal model relies on the accurate physical thermal properties of the device. Using these models, application schedulability and processor frequency scaling decisions to either maximize performance or energy efficiency within a thermal budget are extensively studied. To cover a large range of application behaviour, both models are built and designed using a generative workload that considers fine-grain details of the underlying microarchitecture of the SoC. Therefore, this approach can be derived and applied to multiple devices with little effort. Extended evaluation on real-world benchmarks for high performance and general computing, as well as common applications targeting the mobile and tablet market, show the accuracy and completeness of models used in this unified approach to deliver high performance and energy efficiency under high thermal constraints for embedded devices

    Rebasing Microarchitectural Research with Industry Traces

    Get PDF
    Microarchitecture research relies on performance models with various degrees of accuracy and speed. In the past few years, one such model, ChampSim, has started to gain significant traction by coupling ease of use with a reasonable level of detail and simulation speed. At the same time, datacenter class workloads, which are not trivial to set up and benchmark, have become easier to study via the release of hundreds of industry traces following the first Championship Value Prediction (CVP-1) in 2018. A tool was quickly created to port the CVP-1 traces to the ChampSim format, which, as a result, have been used in many recent works. In this paper, we revisit this conversion tool and find that several key aspects of the CVP-1 traces are not preserved by the conversion. We therefore propose an improved converter that addresses most conversion issues as well as patches known limitations of the CVP-1 traces themselves. We evaluate the impact of our changes on two commits of ChampSim, with one used for the first Instruction Championship Prefetching (IPC-1) in 2020. We find that the performance variation stemming from higher accuracy conversion is significant

    A Storage-Effective BTB Organization for Servers

    Get PDF
    • …
    corecore